PostgreSQL Join Strategies: Nested Loop, Hash Join, and Merge Join

PostgreSQL has exactly three physical join algorithms: Nested Loop, Hash Join, and Merge Join. The planner picks one per join node based on estimated cost, using table statistics, available indexes, the join condition, and work_mem. Each algorithm has different time and memory characteristics, so the same logical query can run in milliseconds or minutes depending on which one the planner selects. Knowing how each works tells you which index, sort, or statistics fix will steer the planner toward the cheaper plan.

How the Three Join Algorithms Work

A Nested Loop scans the outer relation once and, for each outer row, probes the inner relation for matches. With no index on the inner side it is a quadratic O(N * M) operation. When the inner side has an index on the join column, the per-row probe becomes a cheap index lookup, and the loop is fast for a small outer relation. Nested Loop uses O(1) extra memory and is the only algorithm that supports non-equality join conditions such as <, >, and range overlaps.

A Hash Join runs in two phases. It builds an in-memory hash table from the smaller input (the "build" or inner side), then scans the larger input (the "probe" side) and looks each row up in the hash table. With a good hash function this is O(N + M) time. Memory is O(size of build side), which is why the planner hashes the smaller relation. Hash Join only works for equality conditions (=). When the build side exceeds the memory budget, PostgreSQL partitions it into multiple batches and spills the overflow to temporary files, processing one batch at a time.

A Merge Join requires both inputs sorted on the join key. It walks the two sorted streams in lockstep, advancing whichever side is behind, like a merge step in merge sort. If a sort is needed it costs O(N log N) per side; if both inputs already arrive sorted - typically from a B-tree index scan on the join column - the merge itself is linear, O(N + M), with negligible extra memory. Merge Join handles equality and is well suited to large, already-sorted inputs.

Comparison: Which Join for Which Workload

Algorithm Best when Time Memory Needs index / sort
Nested Loop Small outer relation, indexed inner side, or non-equi joins O(N) with inner index; O(N * M) without O(1) Index on inner join column for speed; not required
Hash Join Large unsorted inputs, equality join, no useful sort order O(N + M) O(build side); spills to disk past work_mem No index or sort needed
Merge Join Large inputs already sorted on the join key (equi-join) O(N + M) if pre-sorted; O(N log N) per side if sorting O(1) if pre-sorted; O(N) sort buffer otherwise Sorted input (often a B-tree index)

The build-side memory budget for Hash Join is work_mem multiplied by hash_mem_multiplier, which defaults to 2.0 since PostgreSQL 15. So with the default work_mem of 4MB, a hash table can grow to roughly 8MB before batching kicks in. Raising hash_mem_multiplier lets hash joins and hash aggregates use more memory before spilling, without inflating the budget for sorts.

Reading Join Plans in EXPLAIN

Run EXPLAIN (or EXPLAIN ANALYZE to get real timings and row counts) to see which join the planner chose. A Hash Join on an equality condition looks like this:

Hash Join  (cost=9767.51..302691.07 rows=8391852 width=95)
  Hash Cond: (tf.flight_id = f.flight_id)
  ->  Seq Scan on ticket_flights tf  (cost=0.00..153851.52 rows=8391852 width=32)
  ->  Hash  (cost=4772.67..4772.67 rows=214867 width=63)
        ->  Seq Scan on flights f  (cost=0.00..4772.67 rows=214867 width=63)

The smaller flights table is scanned and loaded into the Hash node; the larger ticket_flights table is the probe side. Swapping the join order in the SQL produces the same plan - SQL is declarative, so the planner is free to choose build and probe sides itself.

A non-equality condition forces a Nested Loop, since neither Hash nor Merge Join can evaluate <:

Nested Loop  (cost=0.42..16532324955.82 rows=601044021228 width=95)
  ->  Seq Scan on ticket_flights tf  (cost=0.00..153851.52 rows=8391852 width=32)
  ->  Index Scan using flights_pkey on flights f  (cost=0.42..1253.81 rows=71622 width=63)
        Index Cond: (flight_id < tf.flight_id)

A Merge Join shows a Merge Cond and a sort on whichever side is not already ordered. Here flights arrives pre-sorted from its primary-key index, while ticket_flights needs an explicit Sort:

Merge Join  (cost=1520511.52..1676140.76 rows=8391852 width=95)
  Merge Cond: (f.flight_id = tf.flight_id)
  ->  Index Scan using flights_pkey on flights f  (cost=0.42..8245.57 rows=214867 width=63)
  ->  Materialize  (cost=1520506.91..1562466.17 rows=8391852 width=32)
        ->  Sort  (cost=1520506.91..1541486.54 rows=8391852 width=32)
              Sort Key: tf.flight_id
              ->  Seq Scan on ticket_flights tf  (cost=0.00..153851.52 rows=8391852 width=32)

When a plan misbehaves, the first thing to check with EXPLAIN ANALYZE is whether estimated rows matches actual rows. A large gap means stale or missing statistics, which is the most common reason the planner picks a Nested Loop over a billion-row product or a Hash Join that spills to disk.

Influencing the Planner's Join Choice

The planner is cost-based, so the durable way to change a join is to change the underlying costs rather than override the algorithm. Adding a B-tree index on the join column gives the planner the option of an indexed Nested Loop or a Merge Join with no sort. Running ANALYZE (or letting autovacuum do it) refreshes statistics so estimates track reality. An ORDER BY on the join key, or a query that already needs sorted output, can tip the scales toward Merge Join because the sort is no longer wasted work.

For diagnosis, PostgreSQL exposes per-algorithm toggles: enable_nestloop, enable_hashjoin, and enable_mergejoin. Setting one off in a session does not hard-disable the join - it adds a large cost penalty so the planner avoids it when any alternative exists. Use these to confirm a hypothesis ("would a Merge Join be faster here?"), then fix the root cause and reset them. Leaving them off in production is a trap, because the right choice shifts as data grows.

When a join plan regresses, the symptom usually surfaces as a query that was fast yesterday and slow today, or one that spikes only under load. Catching that shift means watching plan changes and per-query latency over time, not just reading a single EXPLAIN after the fact. Pulse monitors PostgreSQL query and plan behavior continuously, flags when a query's execution plan or join strategy degrades, and points to the statistics or index gap behind it - so a bad join choice gets caught before it becomes an incident.

Frequently Asked Questions

Q: When does PostgreSQL use a Nested Loop join instead of a Hash Join?
A: PostgreSQL picks a Nested Loop when the outer relation is small and the inner side has an index on the join column, making each probe a cheap lookup. It is also the only option for non-equality conditions like < or range overlaps, which Hash Join and Merge Join cannot evaluate.

Q: Why is my Hash Join spilling to disk and how do I stop it?
A: A Hash Join spills when the build (smaller) side exceeds work_mem * hash_mem_multiplier, so PostgreSQL splits it into batches written to temp files. Raising work_mem for the session, or increasing hash_mem_multiplier (default 2.0), lets the hash table stay in memory. Confirm spilling with EXPLAIN ANALYZE, which reports batches and temp-file usage.

Q: What is the difference between Hash Join and Merge Join in PostgreSQL?
A: Hash Join builds an in-memory hash table from the smaller input and probes it, needing no sorted input but using memory proportional to that input. Merge Join walks two inputs already sorted on the join key in lockstep, using almost no memory when the sort order comes from an index. Hash Join wins on unsorted data; Merge Join wins when the inputs are large and already sorted.

Q: How do I force PostgreSQL to use a specific join algorithm?
A: PostgreSQL has no per-query join hints. You can set enable_nestloop, enable_hashjoin, or enable_mergejoin to off in a session to penalize an algorithm, but this is a diagnostic tool, not a production fix. The durable approach is adding an index, refreshing statistics with ANALYZE, or tuning work_mem so the planner chooses the right join on its own.

Q: Why did the planner choose a slow join after my data grew?
A: Join cost estimates depend on row-count statistics. If statistics are stale, the planner can underestimate a relation and pick a Nested Loop that becomes quadratic at scale, or overestimate one and spill a Hash Join to disk. Run ANALYZE and compare estimated versus actual rows in EXPLAIN ANALYZE to find the mismatch.

Q: Does the order of tables in my JOIN clause change which algorithm PostgreSQL uses?
A: No. SQL is declarative, so the planner is free to reorder joins and to choose which relation is the build, probe, outer, or inner side regardless of the written order. Two queries that differ only in join order typically produce the identical plan.

Subscribe to the Pulse Newsletter

Get early access to new Pulse features, insightful blogs & exclusive events , webinars, and workshops.

We use cookies to provide an optimized user experience and understand our traffic. To learn more, read our use of cookies; otherwise, please choose 'Accept Cookies' to continue using our website.