PostgreSQL Performance Tuning: A Practical Guide

PostgreSQL performance tuning is the practice of matching server configuration, indexing, and query design to your hardware and workload so the database serves reads and writes from memory and indexes instead of scanning disk. It works on three levels: the cluster configuration in postgresql.conf, the schema and its indexes, and individual queries. The biggest wins usually come from finding the few queries that dominate total execution time, not from blanket parameter changes. This guide is the hub - each section links to a deeper page.

Start With the Query Planner

PostgreSQL is declarative: a SELECT states what data you want, not how to fetch it. The planner decides how, using a cost-based model. It estimates the cost of each candidate plan from table and index statistics collected by ANALYZE, then picks the cheapest. Cost is an abstract number combining estimated disk page reads and CPU work, weighted by parameters like seq_page_cost (default 1.0) and random_page_cost (default 4.0).

Two inputs decide plan quality: statistics and cost parameters. Stale statistics produce bad row-count estimates, which produce bad plans - a planner that thinks a filter returns 10 rows when it returns 10 million will pick a nested loop that runs for minutes. Keep autovacuum running so ANALYZE fires after significant writes, and raise default_statistics_target (default 100) on columns with skewed distributions. The cost parameters tell the planner how your storage behaves; the defaults assume spinning disks where random reads cost 4x sequential reads.

When a query is slow, read its plan before changing anything. EXPLAIN shows the estimated plan; EXPLAIN (ANALYZE, BUFFERS) runs the query and shows actual rows, timing, and how many blocks came from cache versus disk. The signal to look for is a large gap between estimated and actual rows, or a sequential scan over a big table where an index scan was possible. See reading EXPLAIN execution plans and EXPLAIN ANALYZE for how to interpret each node, and join strategies for why the planner picks nested loop, hash, or merge joins.

Find the Slow Queries First

You cannot tune what you have not measured. The pg_stat_statements extension is the starting point: it aggregates every executed query by normalized form and records call count, total time, mean time, and rows. Add it to shared_preload_libraries and restart, then rank by total time, which surfaces queries that are individually fast but run so often they dominate the workload.

-- Queries consuming the most cumulative execution time
SELECT query, calls, total_exec_time, mean_exec_time, rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;

For queries you have not aggregated yet, set log_min_duration_statement to log anything slower than a threshold (for example 250ms). The auto_explain module goes further and logs the actual plan of slow statements, so you capture the plan that ran in production rather than reconstructing it later. Sort pg_stat_statements by total_exec_time to find what to optimize, then pull the offending statement into EXPLAIN (ANALYZE, BUFFERS). See debugging slow PostgreSQL queries for the full workflow.

Build an Indexing Strategy

Indexes are the highest-leverage tool for read performance, but each one adds write cost and consumes cache space, so add them to serve real query patterns rather than speculatively. PostgreSQL offers several index types, each suited to a different access pattern.

Index type Best for Notes
B-tree (default) Equality and range on scalar columns, sorting Covers =, <, >, BETWEEN, ORDER BY; the default CREATE INDEX
BRIN Very large tables with naturally ordered data Stores min/max per block range; tiny on disk; only useful when physical order tracks the column (e.g. append-only timestamps)
GIN JSONB, arrays, full-text search Indexes elements inside composite values; supports @>, ?, @@ operators
GiST Geometric data, ranges, nearest-neighbor Used by PostGIS and range types
Hash Single-column equality only Rarely worth it over B-tree

A few patterns matter more than the type choice. Multi-column (composite) indexes serve queries that filter on a leading prefix of their columns, so column order follows your WHERE clauses. Covering indexes with INCLUDE let an index satisfy a query without touching the heap, which helps when a query reads only a handful of columns. Partial indexes (WHERE status = 'active') shrink the index to the rows you actually query. And indexes are skipped when a function wraps the column - WHERE date_trunc('day', ts) = ... cannot use a plain index on ts; rewrite the predicate or build an expression index. See creating indexes, how PostgreSQL indexes work, and why an index is not used.

Tune Memory, I/O, and Autovacuum

Configuration tuning is about three resources: memory, I/O, and background maintenance. The table below lists the parameters that move the needle most, with defaults and typical production starting points for a dedicated server. Treat the production column as a starting point to measure against, not a fixed prescription.

Parameter Default Typical production What it affects
shared_buffers 128MB 25-40% of RAM Size of PostgreSQL's buffer cache for table and index pages
effective_cache_size 4GB 50-75% of RAM Planner hint for total cache (shared + OS); biases toward index scans. Allocates no memory
work_mem 4MB 16MB-64MB Memory per sort/hash operation; too low spills to disk, too high risks OOM under concurrency
maintenance_work_mem 64MB 512MB-1GB Memory for VACUUM, CREATE INDEX, ALTER TABLE
random_page_cost 4.0 1.1 on SSD/NVMe Planner's cost of a random page read; lower makes index scans more attractive
effective_io_concurrency 16 200 on SSD Concurrent I/O requests the planner assumes for bitmap heap scans
max_connections 100 100-200 + pooler Hard cap on backends; high values waste memory, use a pooler instead

work_mem is the parameter most often set wrong. It is allocated per operation, not per query or connection, so a query with three sorts can use three times the value, multiplied across concurrent backends. Size it from available memory and expected concurrency rather than maximizing it. shared_buffers past roughly 40% of RAM tends to hurt, because PostgreSQL relies on the OS page cache too and double-buffering wastes memory; a falling cache hit ratio is the signal to revisit it (debugging a low cache hit ratio).

Autovacuum is not optional maintenance you can defer. MVCC leaves dead tuples behind on every UPDATE and DELETE, and autovacuum reclaims them, refreshes statistics, and prevents transaction ID wraparound. On write-heavy tables the default autovacuum_vacuum_scale_factor of 0.2 (vacuum after 20% of rows change) is too lax - lower it per table so vacuum runs more often on smaller deltas. PostgreSQL 17 rewrote vacuum's memory management to use dead-tuple storage that is far more compact, so large tables vacuum faster and respect maintenance_work_mem better. See VACUUM and autovacuum tuning.

Scale With Partitioning, Pooling, and Hardware

When a single table grows past the point where indexes and vacuum keep up, declarative partitioning splits it by range, list, or hash. The payoff is partition pruning: the planner skips partitions that cannot match the query's predicate, so a query scoped to last month reads one partition instead of years of data. Partitioning also makes maintenance cheaper - dropping an old partition is instant compared to a bulk DELETE plus vacuum. It only helps when queries filter on the partition key; queries that span all partitions get no benefit and pay overhead.

PostgreSQL forks a full OS process per connection, so thousands of short-lived connections exhaust memory and scheduler time long before they exhaust query capacity. Put a pooler such as PgBouncer in transaction mode in front of the database so a small pool of server connections serves many clients. This is usually a larger win than raising max_connections. See connection pooling with PgBouncer.

Hardware scaling comes last, after configuration and queries are tuned. Vertical scaling - more RAM to fit the working set, faster NVMe storage, more cores for parallel query - is the simpler path and often enough. Horizontal scaling adds read replicas to spread read load, or sharding to spread writes, at the cost of application complexity and eventual-consistency trade-offs on replicas. Scale up before you scale out unless write throughput on a single node is the hard limit.

Tracking which of these levers actually matters across a live workload means correlating pg_stat_statements, plans, autovacuum activity, and cache metrics over time, usually by hand and after a regression has already hit users. Pulse watches those signals continuously, flags the query or table behind a regression, and recommends a specific change - add this index, raise work_mem, vacuum this table - for a human to approve.

Frequently Asked Questions

Q: How do I find the slowest queries in PostgreSQL?
A: Enable the pg_stat_statements extension and sort by total_exec_time to find the queries consuming the most cumulative time, which catches both individually slow queries and fast-but-frequent ones. For ad hoc capture, set log_min_duration_statement to log queries over a threshold, and use auto_explain to log their plans. Then run EXPLAIN (ANALYZE, BUFFERS) on the worst offenders.

Q: What should I set shared_buffers to in PostgreSQL?
A: Start at roughly 25% of system RAM on a dedicated database server, and consider up to 40% for large-memory machines. Going higher rarely helps because PostgreSQL also uses the operating system page cache, and double-buffering the same pages wastes memory. Changing shared_buffers requires a restart.

Q: Why is PostgreSQL not using my index?
A: Common reasons are a function or cast wrapping the indexed column (making the predicate non-sargable), stale statistics causing the planner to misjudge selectivity, a random_page_cost set too high so sequential scans look cheaper, or a table small enough that a scan is genuinely faster. Run EXPLAIN ANALYZE, then check whether the predicate matches the index and whether estimated rows are close to actual.

Q: What is the difference between work_mem and shared_buffers?
A: shared_buffers is one shared cache of table and index pages for the whole cluster, allocated once at startup. work_mem is allocated per sort or hash operation, so a single query can use it several times and many concurrent queries multiply it - which is why setting work_mem too high can exhaust memory while shared_buffers cannot.

Q: How often should autovacuum run on a busy table?
A: The default triggers a vacuum after 20% of a table's rows change (autovacuum_vacuum_scale_factor = 0.2), which is too infrequent for large or write-heavy tables. Lower the scale factor per table so vacuum processes smaller, more frequent deltas, which keeps bloat down and statistics fresh. Monitor n_dead_tup and last_autovacuum in pg_stat_user_tables.

Q: Should I scale PostgreSQL vertically or horizontally?
A: Scale vertically first - adding RAM, faster NVMe storage, and CPU cores is simpler and avoids the consistency trade-offs of distributed setups. Move to horizontal scaling (read replicas for read load, sharding for write load) only when a single node cannot meet throughput or you need read scaling beyond what one server provides.

Q: Do I need connection pooling with PostgreSQL?
A: Yes, for any application that opens many short-lived connections. PostgreSQL spawns a process per connection, so high connection counts waste memory and CPU. A pooler like PgBouncer in transaction mode lets a small pool of backend connections serve thousands of clients, which usually beats raising max_connections.

Subscribe to the Pulse Newsletter

Get early access to new Pulse features, insightful blogs & exclusive events , webinars, and workshops.

We use cookies to provide an optimized user experience and understand our traffic. To learn more, read our use of cookies; otherwise, please choose 'Accept Cookies' to continue using our website.