Why Postgres FDW Made My Queries Slow (and How I Fixed It)

By Mendy

Svix is the enterprise-ready webhooks sending service. With Svix, you can build a secure, reliable, and scalable webhook platform in minutes. Looking to send webhooks? Give it a try!

I recently needed to run some complex queries using Postgres Foreign Data Wrappers (FDWs) and got hit by quite a few gotchas. The biggest one: I assumed my queries would run on the remote (FDW) Postgres, but they actually ran on our Postgres instead. The planner couldn't push the work to remote Postgres, so my "remote" query pulled massive amounts of data across the network and did all the heavy lifting on our Postgres.

I'm writing this post to document the pitfalls I hit and help you build a mental model for what can go wrong. I'll focus on understanding the problems and share some solutions that worked for me.

The Core Mental Model: Pushdown Is Conditional

First, let me explain "pushdown". When we query through an FDW, the planner decides which operations (WHERE filters, JOINs, ORDER BY clauses) can execute on the remote Postgres versus on our Postgres. When operations push down, only the final results travel across the network. When they don't, our Postgres ends up pulling back raw data for processing, which means loading far more data than we actually need.

The planner is conservative about pushdown. It only ships work to the remote Postgres when it can guarantee you'll get exactly the same results as if it ran on your Postgres. If the planner has any doubt about behavior differences, the operation runs on your Postgres and pulls data (from remote Postgres) as needed.

What determines whether something can ship? Functions, operators, types, collations, and plan structure all matter. Your WHERE clause might push down, but your ORDER BY might not. The only way to know for sure is to run EXPLAIN and look for your operations inside the "Remote SQL" section.

Common Pushdown Misses

Here are the most common cases where pushdown fails. I hit all of these at one point or another.

Time-dependent expressions are the most frequent problem. I use filters like WHERE created_at >= now() or WHERE created_at >= date_trunc('day', now()) constantly. The planner won't push these down because it can't guarantee the remote Postgres clock matches the clock on our Postgres.

ORDER BY and LIMIT seem basic, but they don't always push down. If your WHERE clause can't push down, then LIMIT can't push down either. This is because the remote Postgres can't know what to limit until after filtering. Even when WHERE does push down, sorts and limits can still stay on our Postgres due to non-shippable expressions in the ORDER BY, collation mismatches, or plan structures that place those operations above a step executed on our Postgres.

Joins get even trickier. If your tables live on different foreign servers, the join runs on our Postgres—no way around it. Even when both tables are on the same server, joins can stay on our Postgres if your WHERE conditions use volatile functions, custom functions, or if there are collation or type mismatches.

Aggregates and DISTINCT are hit or miss. Simple aggregates like COUNT(*) usually push down fine. But DISTINCT or DISTINCT ON with complex expressions or collation differences often don't.
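To check which camp a given aggregate falls into, I look for it inside Remote SQL. A sketch, assuming PostgreSQL 10 or later (where postgres_fdw gained aggregate pushdown) and the same foreign table used later in this post:

```sql
-- If the aggregate ships, the whole count is computed remotely
-- and only a single row crosses the network.
EXPLAIN (VERBOSE, COSTS OFF)
SELECT count(*) FROM global_fdw."user";

-- When pushed down, the plan looks something like:
--   Foreign Scan
--     Remote SQL: SELECT count(*) FROM public."user"
```

If instead you see an Aggregate node on your Postgres sitting above a Foreign Scan, the count is being computed locally over every fetched row.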

Extension and custom functions won't ship by default, even if you mark them IMMUTABLE. The FDW has no way to know whether that function exists on remote Postgres with identical behavior, so it refuses to push it.
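There is one escape hatch worth knowing about for extension functions: postgres_fdw has an extensions server option. Extensions you list there are declared to exist on the remote with identical behavior, which makes their immutable functions and operators shippable. A sketch, with remote_server standing in for your foreign server's name (the actual server name isn't shown in this post):

```sql
-- Declare that pg_trgm is installed identically on the remote,
-- making its immutable functions and operators eligible for pushdown.
-- 'remote_server' is a placeholder for your foreign server's name.
ALTER SERVER remote_server OPTIONS (ADD extensions 'pg_trgm');
```

Note that this only covers objects belonging to the listed extensions; ordinary user-defined functions still won't ship.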

The "Remote Join Becomes Our Postgres Join" Trap

This one scared me the most when I hit it. When a join can't push down, our Postgres pulls both tables in their entirety and joins them on our Postgres. I expected to get back a few thousand rows. Instead, I fetched millions of rows across the network.

Poor or missing statistics make this worse. Without accurate stats on foreign tables, the planner picks terrible join orders and badly misestimates selectivity. A query that looks harmless can turn into a massive operation on our Postgres that often times out.
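The statistics gap is at least partly fixable: you can run ANALYZE directly on a foreign table, and postgres_fdw will sample rows from the remote to build local statistics. A sketch using the foreign table from later in this post:

```sql
-- Samples rows from the remote table and stores statistics locally,
-- giving the planner real selectivity estimates for this foreign table.
ANALYZE global_fdw."user";
```

Keep in mind that autovacuum never analyzes foreign tables, so these statistics go stale silently; you have to re-run ANALYZE yourself as the remote data changes.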

Poor Query Planning Is Always a Risk

This problem never goes away. You need to stay vigilant about query planning with FDWs because our Postgres planner doesn't know your remote data. Estimates for foreign tables are rough guesses at best. These misestimates lead to bad join orders, over-fetching, and missed pushdown opportunities.

A small query change can suddenly start pulling gigabytes of data. What worked yesterday might grind to a halt today.

Network and Operational Surprises

Even when you get pushdown working, you'll run into operational issues.

Round trips and fetch size matter a lot. The fetch_size option controls how many rows FDW fetches from remote Postgres in each batch (the default is 100 rows). When your filters or joins don't push down, you move tons of data over the wire. A small fetch_size means more network round trips and slower queries; a very large value means fewer round trips but higher memory usage on both servers. You need to find the right balance for your workload.

What to Look for in EXPLAIN

I learned to always check where the work actually happens. This became my most important debugging habit.

Here's a simplified example of pushdown failure I ran into. I wrote this query expecting it to filter on remote Postgres:

EXPLAIN (VERBOSE, COSTS OFF)
SELECT id, created_at
FROM global_fdw."user"
WHERE created_at >= date_trunc('day', now())
ORDER BY created_at DESC
LIMIT 100;

Here's what the planner did:

Limit
  Output: "user".id, "user".created_at
  -> Foreign Scan on global_fdw."user"
       Output: "user".id, "user".created_at
       Filter: ("user".created_at >= date_trunc('day'::text, now()))
       Remote SQL: SELECT id, created_at FROM public."user" ORDER BY created_at DESC NULLS FIRST

See the problem? The Filter line shows the WHERE clause happens on our Postgres. The Remote SQL doesn't include any WHERE clause at all—it just fetches and sorts all the rows. Our Postgres pulled back the entire table, sorted remotely, then filtered on our Postgres and limited to 100 rows. This query should have returned 100 rows but instead transferred potentially millions.

The issue is that now() isn't immutable (it's only stable), and postgres_fdw only ships expressions built entirely from immutable functions and operators. The planner won't push the filter down because it can't guarantee the remote Postgres clock matches the clock on our Postgres.
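The fix that worked for me was to stop asking the remote to evaluate the clock at all: compute the day boundary in the application (or in a separate query) and inline it as a constant. A plain timestamp literal is shippable. A sketch (the literal below is an example value, not from the original query):

```sql
EXPLAIN (VERBOSE, COSTS OFF)
SELECT id, created_at
FROM global_fdw."user"
WHERE created_at >= '2024-05-01 00:00:00+00'  -- boundary computed client-side
ORDER BY created_at DESC
LIMIT 100;
```

With the constant in place, the WHERE (and typically the ORDER BY and LIMIT as well) should show up inside Remote SQL, so only the final 100 rows cross the network. Confirm with EXPLAIN on your own setup before trusting it.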

Here are the red flags I watch for in EXPLAIN output:

A Sort or Limit on our Postgres above a Foreign Scan means you're sorting or limiting on our Postgres after pulling the data.

A Hash Join, Merge Join, or Nested Loop on our Postgres combining two Foreign Scan nodes where the "Remote SQL" shows no join—that means the join happens on our Postgres.

A Filter line outside the "Remote SQL" means you're pulling unfiltered data and filtering on our Postgres.

If the Remote SQL doesn't show your WHERE, JOIN, ORDER BY, or LIMIT, assume our Postgres does that work.

A Few Non-Prescriptive Tips

Keep your queries simple and composable. Use EXPLAIN (VERBOSE, COSTS OFF) to confirm pushdown at each step.

If your estimates are consistently off, try setting use_remote_estimate=true on foreign tables. This makes the planner query remote Postgres for statistics. It adds overhead but can improve plan quality.
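The option can be set at the server level or per table; scoping it to a few query-planning hot spots is a reasonable middle ground. A sketch, with remote_server as a placeholder server name:

```sql
-- Server-wide: every foreign table on this server gets remote estimates.
ALTER SERVER remote_server OPTIONS (ADD use_remote_estimate 'true');

-- Or scoped to a single foreign table:
ALTER FOREIGN TABLE global_fdw."user" OPTIONS (ADD use_remote_estimate 'true');
```

If the option was already set earlier, use SET instead of ADD in the OPTIONS clause.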

For large data transfers, consider setting fetch_size=10000 on the foreign server. The default is conservative. Tune it based on your network and memory constraints.
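Setting it looks like this (remote_server is again a placeholder; fetch_size can also be set on an individual foreign table, and the per-table value wins):

```sql
-- Fetch 10,000 rows per network round trip instead of the default 100.
ALTER SERVER remote_server OPTIONS (ADD fetch_size '10000');
```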

The Mental Model to Take Away

My mental model now: FDW only helps when the planner can ship the heavy work to remote Postgres. I always read the plan, look for "Remote SQL", and treat work on our Postgres as expensive until I prove otherwise.


If you have questions or want to share your own FDW war stories, come find us on X (formerly Twitter) or join our community Slack.