PostgreSQL Performance Tuning: Complete Guide (2025)

PostgreSQL is a powerful database, but optimal performance requires proper configuration and tuning. This comprehensive guide covers essential PostgreSQL performance optimization techniques from basic configuration to advanced tuning strategies.

Configuration Tuning
Query Optimization
Indexing Strategies
Connection Management
Maintenance Operations
Monitoring and Diagnostics

Configuration Tuning

Essential postgresql.conf Settings

Memory Settings

# Memory Configuration (for 16GB RAM server)

# Shared buffers - 25% of total RAM (recommended starting point)
shared_buffers = 4GB

# Effective cache size - 50-75% of total RAM
# This tells PostgreSQL how much memory is available for disk caching
effective_cache_size = 12GB

# Work memory - RAM per operation (sort, hash)
# Formula: (Total RAM - shared_buffers) / max_connections / 3
work_mem = 32MB

# Maintenance work memory - for VACUUM, CREATE INDEX, etc.
# Can be set higher than work_mem
maintenance_work_mem = 512MB

# WAL buffers - usually 16MB is sufficient
wal_buffers = 16MB

Tuning Guidelines:

shared_buffers: Start with 25% of RAM, up to 8-16GB for larger systems
effective_cache_size: Set to 50-75% of total RAM
work_mem: Too high causes memory issues; too low causes disk temp files
maintenance_work_mem: Increase for faster VACUUM and index creation

Checkpoint Settings

# Checkpoint configuration - balance between performance and recovery time

# Maximum time between checkpoints
checkpoint_timeout = 15min

# Maximum size between checkpoints (% of max_wal_size)
checkpoint_completion_target = 0.9

# WAL size limits
max_wal_size = 4GB
min_wal_size = 1GB

# Log checkpoints for monitoring
log_checkpoints = on

Impact:

Larger max_wal_size reduces checkpoint frequency but increases recovery time
Higher checkpoint_completion_target spreads checkpoint I/O over longer period
Monitor checkpoint logs to tune appropriately

Query Planner Settings

# Query planner configuration

# Random page cost (lower for SSDs)
random_page_cost = 1.1  # SSD
# random_page_cost = 4.0  # HDD (default)

# Effective I/O concurrency (for SSDs)
effective_io_concurrency = 200  # SSD
# effective_io_concurrency = 2    # HDD

# Default statistics target (higher = better estimates, slower ANALYZE)
default_statistics_target = 100

# Enable parallel query execution
max_parallel_workers_per_gather = 4
max_parallel_workers = 8

SSD vs HDD:

SSDs benefit from lower random_page_cost (1.1-1.5)
SSDs support higher effective_io_concurrency (100-200)

Connection and Concurrency

# Connection settings

# Maximum number of connections
max_connections = 100

# Increase for better parallelism (set to number of CPU cores)
max_worker_processes = 8

# Parallel maintenance operations
max_parallel_maintenance_workers = 4

Logging for Performance Analysis

# Logging configuration

# Log slow queries
log_min_duration_statement = 1000  # Log queries > 1 second

# Log execution plans for slow queries
auto_explain.log_min_duration = 1000
auto_explain.log_analyze = on

# Connection and checkpoint logging
log_connections = on
log_disconnections = on
log_checkpoints = on

# Lock wait logging
log_lock_waits = on
deadlock_timeout = 1s

Query Optimization

Using EXPLAIN ANALYZE

-- Basic EXPLAIN
EXPLAIN SELECT * FROM users WHERE email = 'user@example.com';

-- EXPLAIN ANALYZE - actually runs the query
EXPLAIN ANALYZE
SELECT u.*, o.order_total
FROM users u
JOIN orders o ON u.id = o.user_id
WHERE u.created_at > '2025-01-01';

-- Detailed output with buffers
EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
SELECT * FROM large_table WHERE status = 'active';

-- Format as JSON for easier parsing
EXPLAIN (ANALYZE, FORMAT JSON)
SELECT * FROM products WHERE price > 100;

Reading EXPLAIN Output

Key Metrics:

Seq Scan: Full table scan (usually bad for large tables)
Index Scan: Using an index (usually good)
Nested Loop: Join method for small datasets
Hash Join: Efficient for large datasets
Merge Join: Efficient for sorted datasets
Actual time: Real execution time
Rows: Estimated vs actual rows

Example Analysis:

Seq Scan on users  (cost=0.00..18334.40 rows=1000 width=100)
                   (actual time=0.023..145.234 rows=982 loops=1)
  Filter: (created_at > '2025-01-01'::date)
  Rows Removed by Filter: 999018
Planning Time: 0.156 ms
Execution Time: 145.678 ms

Problems:

Sequential scan on large table
Many rows removed by filter
Missing index on created_at

Query Optimization Techniques

1. Avoid SELECT *

-- Bad - retrieves unnecessary data
SELECT * FROM users;

-- Good - select only needed columns
SELECT id, email, name FROM users;

2. Use WHERE Clause Effectively

-- Bad - non-sargable query (can't use index efficiently)
SELECT * FROM users WHERE EXTRACT(YEAR FROM created_at) = 2025;

-- Good - sargable query (can use index)
SELECT * FROM users
WHERE created_at >= '2025-01-01'
  AND created_at < '2026-01-01';

3. Optimize JOINs

-- Bad - multiple joins without proper indexes
SELECT *
FROM orders o
JOIN users u ON o.user_id = u.id
JOIN products p ON o.product_id = p.id
WHERE u.email LIKE '%@example.com';

-- Good - use indexes and limit early
SELECT o.id, u.email, p.name
FROM users u
JOIN orders o ON o.user_id = u.id
JOIN products p ON o.product_id = p.id
WHERE u.email_domain = 'example.com'  -- Use indexed column
LIMIT 100;

-- Create supporting indexes
CREATE INDEX idx_users_email_domain ON users(email_domain);
CREATE INDEX idx_orders_user_id ON orders(user_id);
CREATE INDEX idx_orders_product_id ON orders(product_id);

4. Use EXISTS Instead of IN for Subqueries

-- Less efficient
SELECT * FROM users
WHERE id IN (SELECT user_id FROM orders WHERE status = 'completed');

-- More efficient
SELECT * FROM users u
WHERE EXISTS (
    SELECT 1 FROM orders o
    WHERE o.user_id = u.id AND o.status = 'completed'
);

5. Leverage CTEs and Window Functions

-- Using CTE for readability and optimization
WITH recent_orders AS (
    SELECT user_id, COUNT(*) as order_count
    FROM orders
    WHERE created_at > CURRENT_DATE - INTERVAL '30 days'
    GROUP BY user_id
    HAVING COUNT(*) > 5
)
SELECT u.email, ro.order_count
FROM users u
JOIN recent_orders ro ON u.id = ro.user_id;

-- Window functions instead of self-joins
SELECT
    product_id,
    sale_date,
    amount,
    SUM(amount) OVER (
        PARTITION BY product_id
        ORDER BY sale_date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) as rolling_7day_total
FROM sales;

6. Batch Operations

-- Bad - multiple single inserts
INSERT INTO logs (message) VALUES ('Log 1');
INSERT INTO logs (message) VALUES ('Log 2');
INSERT INTO logs (message) VALUES ('Log 3');

-- Good - batch insert
INSERT INTO logs (message) VALUES
    ('Log 1'),
    ('Log 2'),
    ('Log 3');

-- Even better - use COPY for bulk loads
COPY logs (message) FROM '/path/to/logs.csv' CSV;

Indexing Strategies

When to Create Indexes

Create indexes for:

Columns in WHERE clauses
JOIN columns
ORDER BY columns
Columns used in GROUP BY
Foreign key columns
Columns with high selectivity

Avoid indexes when:

Table is very small (< 1000 rows)
Column has low cardinality (few distinct values)
Table has frequent writes and few reads
Column is rarely queried

Index Types and Use Cases

B-tree Index (Default)

-- Standard index for equality and range queries
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_orders_created_at ON orders(created_at);

-- Composite index (column order matters!)
CREATE INDEX idx_orders_user_status ON orders(user_id, status);

-- Can use for queries on (user_id) or (user_id, status)
-- Cannot efficiently use for queries only on (status)

Partial Index

-- Index only active users
CREATE INDEX idx_active_users_email ON users(email)
WHERE status = 'active';

-- Index only recent orders
CREATE INDEX idx_recent_orders ON orders(user_id, created_at)
WHERE created_at > '2025-01-01';

Expression Index

-- Index on expression
CREATE INDEX idx_users_lower_email ON users(LOWER(email));

-- Now this query can use the index
SELECT * FROM users WHERE LOWER(email) = 'user@example.com';

-- Index on JSON field
CREATE INDEX idx_metadata_tags ON products((metadata->>'tags'));

GIN Index (for Arrays, JSONB, Full-text)

-- Index for array containment
CREATE INDEX idx_tags ON articles USING GIN(tags);

-- Query using array operators
SELECT * FROM articles WHERE tags @> ARRAY['postgresql', 'performance'];

-- Index for JSONB
CREATE INDEX idx_metadata ON products USING GIN(metadata);

-- Query JSON data
SELECT * FROM products WHERE metadata @> '{"category": "electronics"}';

-- Full-text search
CREATE INDEX idx_content_fts ON documents USING GIN(to_tsvector('english', content));

SELECT * FROM documents
WHERE to_tsvector('english', content) @@ to_tsquery('postgresql & performance');

GiST Index (for Geometric Data, Full-text)

-- For range types
CREATE INDEX idx_availability ON hotel_rooms USING GIST(available_dates);

-- For geometric data (requires PostGIS)
CREATE INDEX idx_location ON stores USING GIST(location);

BRIN Index (for Large, Sequentially Ordered Tables)

-- Efficient for time-series data
CREATE INDEX idx_logs_timestamp ON logs USING BRIN(timestamp);

-- Very small index size, good for append-only tables
-- Works well when data is naturally ordered

Index Maintenance

-- Rebuild index
REINDEX INDEX idx_users_email;

-- Rebuild all indexes on table
REINDEX TABLE users;

-- Rebuild concurrently (doesn't lock table)
CREATE INDEX CONCURRENTLY idx_users_email_new ON users(email);
DROP INDEX CONCURRENTLY idx_users_email;
ALTER INDEX idx_users_email_new RENAME TO idx_users_email;

-- Find unused indexes
SELECT
    schemaname,
    tablename,
    indexname,
    idx_scan,
    pg_size_pretty(pg_relation_size(indexrelid)) as index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
  AND indexrelname NOT LIKE 'pg_toast_%'
ORDER BY pg_relation_size(indexrelid) DESC;

-- Find missing indexes (tables with many seq scans)
SELECT
    schemaname,
    tablename,
    seq_scan,
    seq_tup_read,
    idx_scan,
    seq_tup_read / NULLIF(seq_scan, 0) as avg_tup_per_scan
FROM pg_stat_user_tables
WHERE seq_scan > 1000
ORDER BY seq_tup_read DESC;

Connection Management

Connection Pooling

PostgreSQL creates a new process for each connection, which is expensive. Use connection pooling:

PgBouncer Configuration

[databases]
mydb = host=localhost port=5432 dbname=mydb

[pgbouncer]
listen_addr = *
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt

# Pool mode
pool_mode = transaction

# Connection limits
max_client_conn = 1000
default_pool_size = 25
reserve_pool_size = 5

# Server settings
server_idle_timeout = 600
server_lifetime = 3600

Pool Modes:

session: One server connection per client (default PostgreSQL behavior)
transaction: Most efficient, server released after transaction
statement: Server released after each statement

Application-Level Pooling

# Python with psycopg2
from psycopg2 import pool

connection_pool = pool.SimpleConnectionPool(
    minconn=1,
    maxconn=20,
    host='localhost',
    database='mydb',
    user='myuser',
    password='mypass'
)

# Get connection from pool
conn = connection_pool.getconn()
try:
    cursor = conn.cursor()
    cursor.execute("SELECT * FROM users")
    results = cursor.fetchall()
finally:
    connection_pool.putconn(conn)

Monitor Connections

-- Current connections
SELECT
    count(*),
    state,
    usename,
    application_name
FROM pg_stat_activity
GROUP BY state, usename, application_name
ORDER BY count DESC;

-- Long-running queries
SELECT
    pid,
    now() - query_start as duration,
    usename,
    state,
    query
FROM pg_stat_activity
WHERE state != 'idle'
  AND query_start < now() - interval '5 minutes'
ORDER BY duration DESC;

-- Kill long-running query
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE pid = 12345;

Maintenance Operations

VACUUM

-- Manual vacuum
VACUUM users;

-- Vacuum with analyze
VACUUM ANALYZE users;

-- Verbose output
VACUUM VERBOSE users;

-- Full vacuum (locks table, reclaims more space)
VACUUM FULL users;

-- Autovacuum settings (postgresql.conf)
autovacuum = on
autovacuum_max_workers = 3
autovacuum_naptime = 1min
autovacuum_vacuum_threshold = 50
autovacuum_vacuum_scale_factor = 0.2
autovacuum_analyze_threshold = 50
autovacuum_analyze_scale_factor = 0.1

ANALYZE

-- Update statistics for query planner
ANALYZE users;

-- Analyze entire database
ANALYZE;

-- Check when table was last analyzed
SELECT
    schemaname,
    tablename,
    last_vacuum,
    last_autovacuum,
    last_analyze,
    last_autoanalyze
FROM pg_stat_user_tables;

REINDEX

-- Rebuild index
REINDEX INDEX idx_users_email;

-- Rebuild all indexes on table
REINDEX TABLE users;

-- Rebuild all indexes in database (requires exclusive lock)
REINDEX DATABASE mydb;

-- Rebuild concurrently (PostgreSQL 12+)
REINDEX INDEX CONCURRENTLY idx_users_email;

Monitoring and Diagnostics

Key Metrics to Monitor

-- Database size
SELECT
    pg_database.datname,
    pg_size_pretty(pg_database_size(pg_database.datname)) AS size
FROM pg_database
ORDER BY pg_database_size(pg_database.datname) DESC;

-- Table sizes with indexes
SELECT
    tablename,
    pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS total_size,
    pg_size_pretty(pg_relation_size(schemaname||'.'||tablename)) AS table_size,
    pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename) -
                   pg_relation_size(schemaname||'.'||tablename)) AS index_size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC
LIMIT 10;

-- Cache hit ratio (should be > 99%)
SELECT
    sum(heap_blks_read) as heap_read,
    sum(heap_blks_hit) as heap_hit,
    sum(heap_blks_hit) / (sum(heap_blks_hit) + sum(heap_blks_read)) * 100 AS cache_hit_ratio
FROM pg_statio_user_tables;

-- Index usage
SELECT
    schemaname,
    tablename,
    indexname,
    idx_scan,
    idx_tup_read,
    idx_tup_fetch,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
ORDER BY idx_scan DESC;

-- Bloat estimation
SELECT
    schemaname,
    tablename,
    pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size,
    n_dead_tup,
    n_live_tup,
    ROUND(n_dead_tup * 100.0 / NULLIF(n_live_tup + n_dead_tup, 0), 2) AS dead_tuple_percent
FROM pg_stat_user_tables
WHERE n_dead_tup > 1000
ORDER BY n_dead_tup DESC;

pg_stat_statements Extension

-- Enable extension
CREATE EXTENSION pg_stat_statements;

-- Add to postgresql.conf
shared_preload_libraries = 'pg_stat_statements'
pg_stat_statements.track = all

-- Top 10 slowest queries
SELECT
    query,
    calls,
    total_exec_time,
    mean_exec_time,
    max_exec_time,
    stddev_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;

-- Most frequently called queries
SELECT
    query,
    calls,
    total_exec_time,
    mean_exec_time
FROM pg_stat_statements
ORDER BY calls DESC
LIMIT 10;

-- Reset statistics
SELECT pg_stat_statements_reset();

Performance Tuning Checklist

Configure memory settings based on available RAM
Adjust checkpoint settings for your workload
Set appropriate random_page_cost for storage type (SSD/HDD)
Enable query logging for slow queries
Create indexes on frequently queried columns
Remove unused indexes
Set up connection pooling
Configure autovacuum appropriately
Monitor cache hit ratio (target > 99%)
Analyze query plans with EXPLAIN ANALYZE
Install and monitor pg_stat_statements
Regular VACUUM and ANALYZE operations
Monitor table and index bloat
Review and optimize slow queries
Set appropriate work_mem for complex queries
Use appropriate data types (avoid VARCHAR when TEXT is better)
Partition large tables
Archive old data regularly

Get comprehensive PostgreSQL performance monitoring with Pulse. Track slow queries, monitor resource usage, and optimize your database in real-time.