EXPLAIN ANALYZE is PostgreSQL's primary tool for understanding why a query is slow. It executes the query and shows the actual runtime statistics alongside the planner's estimates, revealing where time was spent, how many rows were processed, and how much I/O was involved. Reading the output accurately is a prerequisite for diagnosing and fixing query performance.
Syntax and Options
-- Show the query plan without executing (estimated only)
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
-- Execute and show actual runtime statistics
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;
-- Most useful form for performance diagnosis
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM orders WHERE customer_id = 42;
-- Full diagnostic: timing, buffers, verbose column output, WAL stats, planner settings
EXPLAIN (ANALYZE, BUFFERS, VERBOSE, SETTINGS, WAL) SELECT * FROM orders WHERE customer_id = 42;
-- Machine-readable JSON output for visualizer tools
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) SELECT * FROM orders WHERE customer_id = 42;
EXPLAIN ANALYZE actually executes the statement. For DML, wrap it in a transaction you roll back:
BEGIN;
EXPLAIN ANALYZE UPDATE orders SET status = 'shipped' WHERE id = 1;
ROLLBACK;
Options Reference
| Option | Default | Description |
|---|---|---|
| ANALYZE | off | Execute the query; show actual rows, actual time, loops |
| BUFFERS | off | Show block-level I/O: cache hits, disk reads, dirtied, written, temp file usage |
| VERBOSE | off | Show column lists, schema-qualified names, function detail in trigger nodes |
| SETTINGS | off | Show planner GUCs that differ from their built-in defaults (e.g., work_mem, enable_seqscan) |
| WAL | off | Show WAL record generation: record count, full page images, bytes (requires ANALYZE) |
| TIMING | on | Per-node actual startup and execution time; disable with TIMING OFF to reduce overhead when only row counts are needed |
| SUMMARY | on with ANALYZE | Append planning time and execution time totals at the bottom; defaults to on when ANALYZE is used, off otherwise |
| MEMORY | off | Show memory consumed during planning (PostgreSQL 17+) |
| GENERIC_PLAN | off | Show a generic plan with parameter placeholders; cannot combine with ANALYZE (PostgreSQL 16+) |
| FORMAT | TEXT | Output format: TEXT, JSON, XML, YAML |
JSON format is the richest and what visualizer tools consume. Use FORMAT JSON when capturing plans for tooling.
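GENERIC_PLAN is useful for parameterized statements, such as queries captured from pg_stat_statements where constants have been replaced with $1, $2, and so on. A minimal sketch, reusing the orders table from the examples above:

```sql
-- Plan a parameterized query without binding a value (PostgreSQL 16+).
-- Cannot be combined with ANALYZE, since nothing is executed.
EXPLAIN (GENERIC_PLAN)
SELECT * FROM orders WHERE customer_id = $1;
```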
Reading the Output
The output is a tree of plan nodes, indented with -> arrows. Child nodes execute before their parents — the innermost (deepest-indented) nodes run first, and their output flows up to parent nodes.
Nested Loop  (cost=4.65..118.50 rows=10 width=488) (actual time=0.017..0.051 rows=10 loops=1)
  ->  Bitmap Heap Scan on orders t1  (cost=4.36..39.38 rows=10 width=244) (actual time=0.009..0.017 rows=10 loops=1)
        Recheck Cond: (customer_id < 10)
        ->  Bitmap Index Scan on orders_customer_id_idx  (cost=0.00..4.36 rows=10 width=0) (actual time=0.004..0.004 rows=10 loops=1)
              Index Cond: (customer_id < 10)
  ->  Index Scan using products_pkey on products  (cost=0.29..7.90 rows=1 width=244) (actual time=0.003..0.003 rows=1 loops=10)
        Index Cond: (id = t1.product_id)
Cost Notation
(cost=X..Y rows=Z width=W)
- X (startup cost): Estimated cost before the node can produce its first row. A Sort node has a high startup cost because it must read all input before returning anything. A Seq Scan has 0.00 because rows start flowing immediately.
- Y (total cost): Estimated cost to produce all output rows. Costs are in abstract internal units — one unit roughly equals one sequential disk page read (seq_page_cost = 1.0). They are not milliseconds.
- Z (rows): Estimated number of output rows.
- W (width): Estimated average byte width per output row.
Actual Statistics
(actual time=A..B rows=C loops=D) — shown when ANALYZE is used.
- A: Actual startup time in milliseconds (time to first row).
- B: Actual total time in milliseconds to produce all rows from this node.
- C: Average rows returned per loop.
- D: Number of times this node was executed.
When loops > 1, actual time and rows are per-loop averages. To get totals:
Total actual time = actual_time_per_loop × loops
Total rows from node = actual_rows × loops
Example: actual time=0.003..0.003 rows=1 loops=10 means the node ran 10 times, total time is 0.03 ms, total rows is 10. This is critical when diagnosing Nested Loop joins — a fast per-execution time becomes catastrophically slow at 100,000 loops.
Scan Node Types
Seq Scan
Reads the entire table sequentially, page by page.
Seq Scan on orders  (cost=0.00..9821.00 rows=49215 width=44) (actual time=0.012..892.430 rows=492150 loops=1)
  Filter: (status = 'pending')
  Rows Removed by Filter: 2008000
Rows Removed by Filter is the key signal: this query read 2.5 million rows to return 492,150. A high ratio here (many rows removed for each row returned) is a candidate for indexing.
Sequential scans are appropriate for large fractions of the table or small tables. They indicate a problem when selecting a small percentage of a large table.
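If the pending-status query above runs frequently, one candidate fix is a partial index covering only the filtered rows. This is a sketch: the key column is an assumption about what the query sorts or filters on, and the payoff depends on the actual workload.

```sql
-- Hypothetical partial index: only rows with status = 'pending' are indexed,
-- so the index stays small and scans skip the other ~2M rows entirely.
-- The key column (created_at) is an illustrative assumption.
CREATE INDEX CONCURRENTLY orders_pending_created_idx
    ON orders (created_at)
    WHERE status = 'pending';
ANALYZE orders;
```

The planner can use this index only for queries whose WHERE clause implies status = 'pending'.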
Index Scan
Traverses the B-tree index to find matching row pointers, then fetches each row from the heap via random I/O.
Index Scan using orders_customer_id_idx on orders  (cost=0.43..8.45 rows=10 width=44)
  Index Cond: (customer_id = 42)
Index Cond means the condition is evaluated within the index itself — only matching rows are fetched from the heap. Best for selective queries returning a small fraction of the table (typically under 5–10%).
Index Only Scan
All needed columns are in the index; no heap fetch is required (unless the visibility map indicates pages need visibility checks).
Index Only Scan using orders_customer_status_idx on orders
  Index Cond: (customer_id = 42)
  Heap Fetches: 0
Heap Fetches: 0 means the visibility map was clean — no heap access at all. This is the most I/O-efficient scan type. Achievable by adding all query columns to the index using INCLUDE:
CREATE INDEX orders_customer_status_idx ON orders (customer_id) INCLUDE (status, amount);
Bitmap Index Scan + Bitmap Heap Scan
A two-phase operation for medium-selectivity queries. Phase 1 (Bitmap Index Scan) builds an in-memory bitmap of matching page/row locations. Phase 2 (Bitmap Heap Scan) reads matching pages in physical order to minimize random I/O.
Bitmap Heap Scan on orders  (cost=214.23..8932.10 rows=9821 width=44)
  Recheck Cond: (status = 'pending')
  Heap Blocks: exact=4821
  ->  Bitmap Index Scan on orders_status_idx  (cost=0.00..211.78 rows=9821 width=0)
        Index Cond: (status = 'pending')
Recheck Cond is always displayed but is only actually re-evaluated when the bitmap becomes "lossy" — when the number of matching pages exceeds work_mem capacity, the bitmap degrades from per-row to per-page granularity. If you see Heap Blocks: lossy=N, increasing work_mem may help. Multiple bitmap scans can be combined via BitmapAnd / BitmapOr to use multiple indexes for a single query.
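If a plan reports Heap Blocks: lossy=N, a quick session-level experiment shows whether more memory restores an exact bitmap. This sketch reuses the status query from the example above:

```sql
-- Session-level experiment only; does not change the server default.
SET work_mem = '64MB';
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM orders WHERE status = 'pending';
-- Look for "Heap Blocks: exact=..." with no "lossy=..." component.
RESET work_mem;
```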
Join Node Types
Nested Loop
For each row in the outer (left) input, scans the inner (right) input for matches. Fast startup, efficient when the outer side produces few rows and the inner side has an index. Extremely slow when the outer side is large.
Nested Loop  (cost=0.43..24.89 rows=10 width=88) (actual time=0.031..0.089 rows=10 loops=1)
  ->  Seq Scan on customers  (actual time=0.012..0.013 rows=1 loops=1)
        Filter: (id = 42)
  ->  Index Scan using orders_customer_id_idx on orders  (actual time=0.018..0.073 rows=10 loops=1)
        Index Cond: (customer_id = 42)
Watch for the inner side with high loops. If the inner node shows actual time=0.09..0.09 rows=1 loops=50000, total inner time is 4,500 ms.
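A useful diagnostic (never a production setting) is to temporarily discourage Nested Loop and compare what the planner picks instead. The join below is an illustrative query over the document's example tables:

```sql
-- Diagnostic only: enable_nestloop is a standard planner GUC.
SET enable_nestloop = off;
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders o JOIN customers c ON c.id = o.customer_id;
RESET enable_nestloop;
```

If the alternative plan is much faster, the row estimates feeding the join are probably wrong; fix the statistics rather than leaving the setting off.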
Hash Join
Builds an in-memory hash table from the smaller input, then probes it for each row of the larger input. High startup cost (building the hash table). Excellent for large unsorted inputs with equality join conditions.
Hash Join  (cost=1.04..312.39 rows=9821 width=88) (actual time=0.102..4.832 rows=9821 loops=1)
  Hash Cond: (orders.customer_id = customers.id)
  ->  Seq Scan on orders ...
  ->  Hash  (actual time=0.045..0.046 rows=200 loops=1)
        Buckets: 1024  Batches: 1  Memory Usage: 26kB
Batches: 1 — hash table fit in memory, optimal. Batches: N where N > 1 — the hash table spilled to disk in N batches, a strong indicator that work_mem is too low for this query. Increasing work_mem is usually the fix.
Merge Join
Both inputs must be sorted on the join key; PostgreSQL merges them in a single pass. Low execution overhead once sorted, but high startup if sorting is required. Good for large inputs that are already sorted, for example by index scans on the join key. (Of the three methods, only Nested Loop can evaluate arbitrary non-equality join conditions; Merge Join and Hash Join need mergejoinable or hashable operators, which in practice means equality.)
Merge Join  (cost=234.15..567.34 rows=9821 width=88)
  Merge Cond: (orders.customer_id = customers.id)
  ->  Sort  (cost=117.08..119.54 rows=983 width=44)
        Sort Key: orders.customer_id
  ->  Index Scan using customers_pkey on customers
Other Important Nodes
Sort
Sort  (cost=713.05..713.30 rows=100 width=44) (actual time=2.995..3.002 rows=100 loops=1)
  Sort Key: created_at DESC
  Sort Method: quicksort  Memory: 74kB
The sort method tells you whether the sort happened in memory or on disk:
- quicksort  Memory: NkB — in-memory, fast.
- external merge  Disk: NkB — spilled to disk because work_mem was insufficient. This is a performance problem. Increase work_mem and re-test.
- top-N heapsort — used with a LIMIT clause; only tracks the N smallest/largest values without sorting everything.
- incremental sort (PostgreSQL 13+) — input is partially sorted; only groups sharing a common prefix key are re-sorted.
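To see a spill firsthand, you can shrink work_mem for one session (64kB is the minimum allowed value) and sort a large table; orders is the example table assumed throughout this document:

```sql
SET work_mem = '64kB';   -- minimum allowed value; deliberately provokes a spill
EXPLAIN (ANALYZE) SELECT * FROM orders ORDER BY created_at;
-- Typically reports: Sort Method: external merge  Disk: ...kB
RESET work_mem;
```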
Hash
The inner side of a Hash Join. Shows memory usage and batch count:
Hash  (actual time=245.3..245.3 rows=500000 loops=1)
  Buckets: 131072  Batches: 8  Memory Usage: 4096kB
Batches: 8 means work_mem was too small to hold the whole hash table at once, so the input was split into 8 batches, with most of the data written to and re-read from temporary disk files. The fix is the same as for Hash Join: increase work_mem.
Aggregate
Groups and aggregates rows. In parallel plans, appears as Partial Aggregate (in workers) and Finalize Aggregate (in the leader).
Gather / Gather Merge
Gather collects results from parallel worker processes in arbitrary order. Gather Merge does the same but preserves sort order, enabling ORDER BY without a final sort step.
Gather  (actual time=892.123..892.130 rows=1 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  ->  Partial Aggregate  (actual time=888.232..888.233 rows=1 loops=3)
If Workers Launched < Workers Planned, workers were unavailable — check max_parallel_workers and max_worker_processes.
Materialize
Stores the output of a subplan so it can be re-scanned without re-executing. Used by Nested Loop when the inner side would otherwise be re-evaluated on every outer row.
CTE Scan
Scans a previously-materialized CTE result. Since PostgreSQL 12, CTEs are not always materialized — the optimizer may inline them. Use WITH name AS MATERIALIZED (...) to force materialization.
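A sketch of both directions, using the orders table from the earlier examples:

```sql
-- Force materialization: the CTE is computed once and read via a CTE Scan.
WITH pending AS MATERIALIZED (
    SELECT * FROM orders WHERE status = 'pending'
)
SELECT count(*) FROM pending;

-- Force inlining (PostgreSQL 12+): the CTE body is folded into the outer
-- query, letting predicates and indexes apply across the boundary.
WITH pending AS NOT MATERIALIZED (
    SELECT * FROM orders WHERE status = 'pending'
)
SELECT count(*) FROM pending;
```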
Row Estimation Errors
Row estimation errors — a large discrepancy between rows=Z (estimated) and rows=C (actual) — are a primary cause of poor plan selection. Under-estimates are usually more dangerous: the planner chooses Nested Loop (designed for small inputs) when a Hash Join is needed.
A 10x or greater mismatch between estimated and actual rows warrants investigation.
Causes
Stale statistics. Autovacuum runs ANALYZE on a schedule, but it may lag on heavily-written tables. Manual fix:
ANALYZE orders;
Low statistics target. By default (default_statistics_target = 100), PostgreSQL keeps up to 100 histogram buckets and 100 most-common values per column. For skewed distributions, increase it:
ALTER TABLE orders ALTER COLUMN status SET STATISTICS 500;
ANALYZE orders;
Multi-column correlation. PostgreSQL assumes column conditions are statistically independent. A query on WHERE region = 'EU' AND city = 'Berlin' will under-estimate if city implies region. Fix with extended statistics:
CREATE STATISTICS orders_region_city ON region, city FROM orders;
ANALYZE orders;
Function expressions. The planner cannot use column statistics for WHERE lower(email) = 'foo@example.com'. A functional (expression) index fixes both problems: the predicate becomes indexable, and ANALYZE then gathers statistics on the indexed expression.
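A minimal sketch, assuming a hypothetical users table with an email column:

```sql
-- Expression index: makes lower(email) both indexable and analyzable.
CREATE INDEX users_lower_email_idx ON users (lower(email));
ANALYZE users;  -- collects statistics on the lower(email) expression

-- The query must use the exact same expression to match the index:
SELECT * FROM users WHERE lower(email) = 'foo@example.com';
```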
Buffer Statistics
With BUFFERS, each node shows block-level I/O:
Buffers: shared hit=36 read=6 dirtied=2 written=1
- shared hit=N — blocks found in PostgreSQL's shared buffer cache. Fast, in-memory.
- read=N — blocks read from disk (or OS page cache). High values indicate I/O pressure.
- dirtied=N — blocks dirtied by this query (rare for plain SELECT).
- written=N — dirty blocks evicted from the cache to make room. Indicates buffer pressure.
- temp read/written — temporary files used by Sort, Hash, or other nodes that spilled to disk. Directly tied to work_mem.
A high read with low hit on the first execution is normal (cold cache). After the first run, a warm cache should show mostly hit. Persistent high read values indicate that the working set does not fit in shared_buffers.
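To see how much time those reads cost, not just how many there were, enable track_io_timing (a standard GUC, though changing it may require superuser privileges); EXPLAIN then adds I/O timing lines next to the buffer counts:

```sql
SET track_io_timing = on;  -- small overhead from extra clock reads
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM orders WHERE customer_id = 42;
-- Buffer lines gain a companion such as: I/O Timings: shared read=12.345
```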
work_mem and Its Impact
work_mem sets the maximum memory for each sort or hash operation within a query. A single complex query may have multiple concurrent sort/hash operations, each with its own work_mem budget. The default is 4 MB — appropriate for many OLTP queries but too small for analytical workloads.
Test changes safely at the session level:
SET work_mem = '256MB';
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM orders ORDER BY created_at;
RESET work_mem;
Setting work_mem globally requires care: with 100 connections and multiple operations per query, max_connections × work_mem × operations can exhaust RAM. A common pattern is to keep the global default conservative and increase it per-session for analytical queries.
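The per-session pattern can be made sticky per role or per database; the role and database names below are illustrative:

```sql
-- Hypothetical role: analytical users get a larger budget automatically,
-- while the server-wide default stays conservative.
ALTER ROLE analyst SET work_mem = '256MB';

-- Or scope it to a dedicated reporting database:
ALTER DATABASE reporting SET work_mem = '128MB';
```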
Practical Example: Before and After Indexing
Before
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE customer_id = 42;
Seq Scan on orders  (cost=0.00..98210.00 rows=1 width=88) (actual time=14.832..892.341 rows=234 loops=1)
  Filter: (customer_id = 42)
  Rows Removed by Filter: 4999766
  Buffers: shared hit=2841 read=42469
Planning Time: 0.134 ms
Execution Time: 892.398 ms
5 million rows scanned, 42,469 blocks read from disk, 892 ms to find 234 rows. The row estimate (1) vs. actual (234) is also a large mismatch.
Add the Index
CREATE INDEX CONCURRENTLY orders_customer_id_idx ON orders (customer_id);
ANALYZE orders;
After
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE customer_id = 42;
Index Scan using orders_customer_id_idx on orders  (cost=0.56..24.89 rows=234 width=88) (actual time=0.041..0.312 rows=234 loops=1)
  Index Cond: (customer_id = 42)
  Buffers: shared hit=7 read=4
Planning Time: 0.189 ms
Execution Time: 0.341 ms
11 buffer accesses instead of 45,310. Execution time dropped from 892 ms to 0.3 ms.
JIT Compilation
PostgreSQL 11+ compiles query execution code to native machine code using LLVM when estimated cost exceeds jit_above_cost (default: 100,000). When JIT fires, a JIT: section appears at the bottom of the plan:
JIT:
  Functions: 3
  Options: Inlining true, Optimization true, Expressions true, Deforming true
  Timing: Generation 1.259 ms, Inlining 4.201 ms, Optimization 12.384 ms,
          Emission 15.048 ms, Total 32.892 ms
If the JIT Total time is a large fraction of execution time, JIT is adding overhead without benefit — common for short OLTP queries that happen to have a high estimated cost. Raise jit_above_cost or disable JIT for that workload.
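Both knobs can be tried per session before touching the server config:

```sql
-- Option 1: raise the threshold so only genuinely expensive plans get compiled.
SET jit_above_cost = 500000;

-- Option 2: turn JIT off entirely for this session (or per role via ALTER ROLE).
SET jit = off;
```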
auto_explain: Automatic Plan Capture in Production
The auto_explain extension automatically logs EXPLAIN plans for slow queries in production without requiring manual intervention.
# postgresql.conf (requires restart)
shared_preload_libraries = 'auto_explain'
auto_explain.log_min_duration = 1000 # Log plans for queries over 1 second
auto_explain.log_analyze = true # Include actual timing (adds measurement overhead)
auto_explain.log_buffers = true # Include buffer statistics
auto_explain.log_format = json # JSON format for tooling
Warning: log_analyze = true instruments every query with per-node timing, even queries below the threshold. This can add meaningful overhead on high-throughput systems. Use log_timing = false to get row counts without timing overhead, or use sample_rate (a value between 0.0 and 1.0) to capture only a fraction of qualifying queries.
For testing in a single session without restarting:
LOAD 'auto_explain';
SET auto_explain.log_min_duration = 0;
SET auto_explain.log_analyze = true;
-- Run your queries; plans appear in the PostgreSQL log
Visualizer Tools
Pasting raw EXPLAIN output into a visualizer makes it much easier to spot expensive nodes.
| Tool | Notes |
|---|---|
| explain.dalibo.com | The most widely used free web visualizer. Accepts text or JSON. |
| explain.tensor.ru | Very detailed; supports multiple view modes (flowchart, piechart, tilemap). Works with Greenplum, Citus, TimescaleDB. |
| pgMustard | Commercial (free trial). Scores performance issues 1–5 stars and explains them in plain language. Supports all PostgreSQL 17 EXPLAIN options. |
| pganalyze | Commercial SaaS. Integrates plan visualization with pg_stat_statements, index advisor, and alerting. |
For the most complete data in these tools, capture plans as JSON: EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON).
Quick Diagnostic Checklist
When looking at an EXPLAIN ANALYZE output:
- Find the most expensive nodes — look for the highest actual times near the leaves of the tree.
- Check loops — multiply actual time and rows by loops to get true totals.
- Compare estimated vs. actual rows — a 10x+ mismatch points to stale statistics or missing extended statistics.
- Look for Seq Scan with high Rows Removed by Filter — an indexing candidate.
- Look for Sort Method: external merge Disk: — work_mem is too low for this sort.
- Look for Hash Batches: N where N > 1 — work_mem is too low for this hash join.
- Check the Nested Loop inner side — verify that actual time × loops is not unexpectedly large.
- Review Buffers: read=N — high disk reads indicate the working set is not cached.