SQL Query Optimization Techniques

Q: How do I tell why a query is slow before changing anything?

Run EXPLAIN (ANALYZE, BUFFERS) on the query and read the plan bottom-up. Look for sequential scans on large tables, a wide gap between estimated and actual rows (stale statistics), high Rows Removed by Filter , and sorts or hashes spilling to disk. These signals tell you whether the fix is an index, a predicate rewrite, or refreshed statistics.

Q: Why is keyset pagination faster than OFFSET?

OFFSET n reads and discards all n preceding rows on every request, so cost grows with page depth. Keyset pagination filters on the last row's sort key, like WHERE (created_at, id) > (...) , so it seeks directly to the next page and reads only the rows returned, given an index on the sort columns.

Q: How do I find which queries to optimize across a busy database?

Enable pg_stat_statements and sort by total execution time to find the queries consuming the most cumulative time, not just the slowest single run. Set log_min_duration_statement to capture individual slow statements. Continuous monitoring tools surface regressions automatically as plans change.

SQL query optimization is the practice of rewriting queries and shaping their access paths so the database reads fewer rows, touches fewer pages, and avoids expensive operations like sorts, hash builds, and disk spills. It works at the query level - how you write predicates, joins, subqueries, and pagination - rather than at the server or hardware level. A slow query usually loses time to a sequential scan that an index could have replaced, a join that processes rows in the wrong order, or a predicate the planner cannot match to an index. The techniques below apply across PostgreSQL, MySQL, and other relational engines, with PostgreSQL syntax in the examples.

This page is the query-level companion to PostgreSQL Performance Tuning, which covers server configuration, memory, and hardware. Slow queries cut across every workload - OLTP checkout paths, analytical dashboards, reporting jobs - so the same techniques pay off regardless of domain.

Read the Execution Plan First

Every optimization decision starts with the execution plan. In PostgreSQL, EXPLAIN shows the planner's chosen access paths and cost estimates; EXPLAIN ANALYZE runs the query and reports actual row counts and timing per node, so you can compare estimated rows against real rows. A large gap between the two - say, an estimate of 10 rows against an actual 4 million - points to stale statistics or a predicate the planner cannot reason about, and that gap is what drives a wrong plan.

-- Run the query and show real timing, buffer reads, and row counts
EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
SELECT id, total FROM orders WHERE customer_id = 42 AND status = 'open';

Read the plan bottom-up: leaf nodes are scans, parent nodes are joins, sorts, and aggregates. Watch for Seq Scan on a large table when a filter should be selective, Rows Removed by Filter numbers in the millions (rows read then discarded), and Sort Method: external merge Disk (the sort spilled because it exceeded work_mem). The cost figure is a unitless estimate for comparing plans, not a time. For column-by-column reading of plans see PostgreSQL EXPLAIN and Execution Plans, and Reading MySQL EXPLAIN Output for MySQL.

Indexing for Queries

An index helps only when the planner can match it to your predicate. The most common reason a query ignores an existing index is a non-sargable predicate - one that wraps the indexed column in a function or expression, which forces the engine to compute the value for every row. Rewrite the predicate so the bare column appears on one side, or build an index on the expression itself.

-- Non-sargable: the function call hides the column from the index
SELECT * FROM events WHERE date_trunc('day', created_at) = '2026-06-01';

-- Sargable rewrite: a range the index on created_at can use directly
SELECT * FROM events
WHERE created_at >= '2026-06-01' AND created_at < '2026-06-02';

-- Or, when you must keep the expression, index it
CREATE INDEX idx_events_day ON events (date_trunc('day', created_at));

Beyond plain B-tree indexes, three variants target query shapes directly. A covering index includes every column the query reads (via INCLUDE in PostgreSQL), so the engine answers from the index alone without visiting the table - an index-only scan. A partial index indexes only the rows matching a WHERE clause, which keeps it small and fast when queries always filter on the same condition, such as WHERE status = 'open'. An expression index stores the result of a function so the matching predicate becomes sargable. See Creating Indexes in PostgreSQL for syntax, and PostgreSQL Index Not Used when a built index is ignored.

Joins and Subqueries

Join order, join type, and predicate placement decide how many rows flow through a query. Filter early: apply selective WHERE conditions before the join so each side carries fewer rows into the join operator. Use INNER JOIN when you only need matching rows, since outer joins must preserve unmatched rows and can block the planner from dropping a table. Match data types on join keys - joining bigint to text, or int to numeric, forces an implicit cast that disables index use on that column. Always state an explicit join predicate; a missing ON condition produces an accidental cross join whose result set is the product of both tables. The choice between nested loop, hash, and merge joins is the planner's, driven by row estimates and work_mem; see PostgreSQL Join Strategies.

Subqueries split into correlated and non-correlated. A correlated subquery references the outer row and can re-execute once per outer row, which is the classic cause of an O(n^2) query. Rewriting an IN or correlated subquery to EXISTS lets the engine stop at the first match and short-circuit, and rewriting to a JOIN lets the planner reorder and use a hash or merge strategy.

-- Correlated IN subquery - the inner query is logically evaluated per outer row
SELECT c.id, c.name FROM customers c
WHERE c.id IN (SELECT o.customer_id FROM orders o WHERE o.total > 1000);

-- EXISTS rewrite - stops at the first matching order, no full materialization
SELECT c.id, c.name FROM customers c
WHERE EXISTS (
  SELECT 1 FROM orders o WHERE o.customer_id = c.id AND o.total > 1000
);

Modern planners flatten many subqueries and push predicates down automatically, but they cannot always do so through DISTINCT, aggregates, or LIMIT. When the plan shows a subquery re-scanning, rewrite it by hand.

Pagination, Projection, and Finding Slow Queries

Avoid SELECT *. Returning unused columns inflates I/O, defeats index-only scans, and forces wide rows across the network. List only the columns the application reads.

For pagination, OFFSET scans and discards every row before the offset, so page 10,000 reads 200,000 rows to return 20 - cost grows linearly with depth. Keyset pagination (the seek method) instead filters on the last value seen, so each page reads only the rows it returns when an index covers the sort columns.

-- OFFSET: scans and throws away the first 100000 rows
SELECT id, created_at FROM events ORDER BY created_at, id LIMIT 20 OFFSET 100000;

-- Keyset: seeks straight to the next page using the last row's sort key
SELECT id, created_at FROM events
WHERE (created_at, id) > ('2026-06-01 12:00:00', 84213)
ORDER BY created_at, id
LIMIT 20;

Normalization removes redundant data and keeps writes consistent, but deep normalization can force many joins on read-heavy paths. Selective denormalization - a materialized aggregate, a cached lookup column - trades write complexity and storage for fewer joins; apply it only where a join is a measured bottleneck. For very large tables, partitioning lets the planner skip whole partitions through partition pruning when the query filters on the partition key; this is a table-design technique covered in the performance tuning hub rather than a query rewrite.

To find which queries to optimize, enable log_min_duration_statement to log statements over a threshold, and use the pg_stat_statements extension (added to shared_preload_libraries) to rank queries by total and mean execution time. Tools like pgMustard annotate a plan with concrete suggestions, and PoWA correlates pg_stat_statements data with index advisors over time. Pulse goes a step further for teams running this at scale: it watches plan changes and query latency continuously, flags when a query regresses or starts ignoring an index, and points to the root cause - the stale statistic or non-sargable predicate behind the regression - rather than leaving you to re-run EXPLAIN by hand.

Frequently Asked Questions

Q: How do I tell why a query is slow before changing anything?
A: Run EXPLAIN (ANALYZE, BUFFERS) on the query and read the plan bottom-up. Look for sequential scans on large tables, a wide gap between estimated and actual rows (stale statistics), high Rows Removed by Filter, and sorts or hashes spilling to disk. These signals tell you whether the fix is an index, a predicate rewrite, or refreshed statistics.

Q: What makes a predicate sargable and why does it matter?
A: A sargable predicate compares a bare indexed column to a value, such as created_at >= '2026-06-01', so the engine can seek the index directly. Wrapping the column in a function, like date_trunc('day', created_at) = ..., makes it non-sargable: the engine must compute the expression for every row and the index goes unused. Rewrite to a range, or build an expression index.

Q: When should I rewrite a subquery as a JOIN or EXISTS?
A: Rewrite when the execution plan shows a correlated subquery re-scanning per outer row. EXISTS short-circuits at the first match and suits existence checks; converting to a JOIN lets the planner choose hash or merge strategies and reorder tables. Many planners flatten subqueries automatically, so always confirm against the plan before and after.

Q: Why is keyset pagination faster than OFFSET?
A: OFFSET n reads and discards all n preceding rows on every request, so cost grows with page depth. Keyset pagination filters on the last row's sort key, like WHERE (created_at, id) > (...), so it seeks directly to the next page and reads only the rows returned, given an index on the sort columns.

Q: Does avoiding SELECT * actually improve performance?
A: Yes, in two ways. Selecting only needed columns reduces I/O and network transfer, and it allows index-only scans when a covering index holds every selected column. SELECT * pulls wide rows from the table heap and prevents that optimization.

Q: What is the difference between a covering index and a partial index?
A: A covering index includes all columns a query reads so the engine answers from the index alone (index-only scan), using INCLUDE in PostgreSQL. A partial index indexes only rows matching a WHERE condition, keeping it small when queries always filter the same way. They solve different problems and can be combined.

Q: How do I find which queries to optimize across a busy database?
A: Enable pg_stat_statements and sort by total execution time to find the queries consuming the most cumulative time, not just the slowest single run. Set log_min_duration_statement to capture individual slow statements. Continuous monitoring tools surface regressions automatically as plans change.

PostgreSQL EXPLAIN and Execution Plans: how to read each node and cost in a Postgres plan.
PostgreSQL Performance Tuning: server-level configuration, memory, and scaling - the hub this page complements.
PostgreSQL Join Strategies: nested loop, hash, and merge joins and when the planner picks each.
PostgreSQL Index Not Used: diagnosing why an existing index is ignored.
Creating Indexes in PostgreSQL: B-tree, partial, covering, and expression index syntax.
Reading MySQL EXPLAIN Output: the equivalent plan-reading guide for MySQL.