PostgreSQL is a powerful database, but optimal performance requires proper configuration and tuning. This comprehensive guide covers essential PostgreSQL performance optimization techniques from basic configuration to advanced tuning strategies.
Table of Contents
- Configuration Tuning
- Query Optimization
- Indexing Strategies
- Connection Management
- Maintenance Operations
- Monitoring and Diagnostics
Configuration Tuning
Essential postgresql.conf Settings
Memory Settings
# Memory Configuration (for 16GB RAM server)
# Shared buffers - 25% of total RAM (recommended starting point)
shared_buffers = 4GB
# Effective cache size - 50-75% of total RAM
# This tells PostgreSQL how much memory is available for disk caching
effective_cache_size = 12GB
# Work memory - RAM per operation (sort, hash)
# Formula: (Total RAM - shared_buffers) / max_connections / 3
work_mem = 32MB
# Maintenance work memory - for VACUUM, CREATE INDEX, etc.
# Can be set higher than work_mem
maintenance_work_mem = 512MB
# WAL buffers - usually 16MB is sufficient
wal_buffers = 16MB
Tuning Guidelines:
- shared_buffers: Start with 25% of RAM, up to 8-16GB for larger systems
- effective_cache_size: Set to 50-75% of total RAM
- work_mem: Too high causes memory issues; too low causes disk temp files
- maintenance_work_mem: Increase for faster VACUUM and index creation
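These values can also be applied without hand-editing postgresql.conf, using ALTER SYSTEM; a minimal sketch for the 16GB example above (shared_buffers still requires a restart to take effect):
-- Worked example: work_mem ≈ (16GB - 4GB) / 100 connections / 3 ≈ 40MB, rounded down to 32MB
ALTER SYSTEM SET work_mem = '32MB';
ALTER SYSTEM SET maintenance_work_mem = '512MB';
ALTER SYSTEM SET effective_cache_size = '12GB';
SELECT pg_reload_conf();  -- applies the reloadable settings above
ALTER SYSTEM SET shared_buffers = '4GB';  -- written to postgresql.auto.conf, needs a restart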
Checkpoint Settings
# Checkpoint configuration - balance between performance and recovery time
# Maximum time between checkpoints
checkpoint_timeout = 15min
# Fraction of the checkpoint interval over which to spread checkpoint writes
checkpoint_completion_target = 0.9
# WAL size limits
max_wal_size = 4GB
min_wal_size = 1GB
# Log checkpoints for monitoring
log_checkpoints = on
Impact:
- Larger max_wal_size reduces checkpoint frequency but increases recovery time
- Higher checkpoint_completion_target spreads checkpoint I/O over a longer period
- Monitor checkpoint logs to tune appropriately
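To see whether checkpoints are firing on the timer or being forced by WAL growth, the checkpoint counters can be queried (in pg_stat_bgwriter up to PostgreSQL 16; they moved to pg_stat_checkpointer in 17); a rough sketch:
-- Frequent checkpoints_req relative to checkpoints_timed suggests max_wal_size is too small
SELECT
checkpoints_timed,
checkpoints_req,
checkpoint_write_time,
checkpoint_sync_time
FROM pg_stat_bgwriter;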
Query Planner Settings
# Query planner configuration
# Random page cost (lower for SSDs)
random_page_cost = 1.1 # SSD
# random_page_cost = 4.0 # HDD (default)
# Effective I/O concurrency (for SSDs)
effective_io_concurrency = 200 # SSD
# effective_io_concurrency = 2 # HDD
# Default statistics target (higher = better estimates, slower ANALYZE)
default_statistics_target = 100
# Enable parallel query execution
max_parallel_workers_per_gather = 4
max_parallel_workers = 8
SSD vs HDD:
- SSDs benefit from lower random_page_cost (1.1-1.5)
- SSDs support higher effective_io_concurrency (100-200)
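Cost settings can be tried per session before committing them globally; the sketch below assumes the orders table used elsewhere in this guide:
-- Compare plans under HDD vs SSD cost assumptions (session-local only)
SET random_page_cost = 4.0;
EXPLAIN SELECT * FROM orders WHERE created_at > '2025-01-01';
SET random_page_cost = 1.1;
EXPLAIN SELECT * FROM orders WHERE created_at > '2025-01-01';
RESET random_page_cost;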
Connection and Concurrency
# Connection settings
# Maximum number of connections
max_connections = 100
# Increase for better parallelism (set to number of CPU cores)
max_worker_processes = 8
# Parallel maintenance operations
max_parallel_maintenance_workers = 4
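Before raising max_connections, check how many connections are actually in use (assuming clients connect directly rather than through a pooler):
-- Connections in use versus the configured limit
SELECT
count(*) AS connections_in_use,
current_setting('max_connections')::int AS max_connections
FROM pg_stat_activity;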
Logging for Performance Analysis
# Logging configuration
# Log slow queries
log_min_duration_statement = 1000 # Log queries > 1 second
# Log execution plans for slow queries
# (auto_explain is a loadable module: add it to shared_preload_libraries or session_preload_libraries)
auto_explain.log_min_duration = 1000
auto_explain.log_analyze = on
# Connection and checkpoint logging
log_connections = on
log_disconnections = on
log_checkpoints = on
# Lock wait logging
log_lock_waits = on
deadlock_timeout = 1s
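For ad-hoc testing, auto_explain can also be loaded in a single session without changing shared_preload_libraries (superuser required); a minimal sketch:
-- Load the module for this session and log every plan to the server log
LOAD 'auto_explain';
SET auto_explain.log_min_duration = 0;
SET auto_explain.log_analyze = on;
SELECT count(*) FROM users;  -- the plan appears in the server log, not in psql output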
Query Optimization
Using EXPLAIN ANALYZE
-- Basic EXPLAIN
EXPLAIN SELECT * FROM users WHERE email = 'user@example.com';
-- EXPLAIN ANALYZE - actually runs the query
EXPLAIN ANALYZE
SELECT u.*, o.order_total
FROM users u
JOIN orders o ON u.id = o.user_id
WHERE u.created_at > '2025-01-01';
-- Detailed output with buffers
EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
SELECT * FROM large_table WHERE status = 'active';
-- Format as JSON for easier parsing
EXPLAIN (ANALYZE, FORMAT JSON)
SELECT * FROM products WHERE price > 100;
Reading EXPLAIN Output
Key Metrics:
- Seq Scan: Full table scan (usually bad for large tables)
- Index Scan: Using an index (usually good)
- Nested Loop: Join method for small datasets
- Hash Join: Efficient for large datasets
- Merge Join: Efficient for sorted datasets
- Actual time: Real execution time
- Rows: Estimated vs actual rows
Example Analysis:
Seq Scan on users (cost=0.00..18334.40 rows=1000 width=100)
(actual time=0.023..145.234 rows=982 loops=1)
Filter: (created_at > '2025-01-01'::date)
Rows Removed by Filter: 999018
Planning Time: 0.156 ms
Execution Time: 145.678 ms
Problems:
- Sequential scan on large table
- Many rows removed by filter
- Missing index on created_at
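Adding the missing index and re-running the query should turn the sequential scan into an index (or bitmap) scan; a sketch using the users table from the example:
CREATE INDEX idx_users_created_at ON users(created_at);
-- Re-check the plan: expect an Index Scan or Bitmap Heap Scan instead of Seq Scan
EXPLAIN ANALYZE
SELECT * FROM users WHERE created_at > '2025-01-01';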
Query Optimization Techniques
1. Avoid SELECT *
-- Bad - retrieves unnecessary data
SELECT * FROM users;
-- Good - select only needed columns
SELECT id, email, name FROM users;
2. Use WHERE Clause Effectively
-- Bad - non-sargable query (can't use index efficiently)
SELECT * FROM users WHERE EXTRACT(YEAR FROM created_at) = 2025;
-- Good - sargable query (can use index)
SELECT * FROM users
WHERE created_at >= '2025-01-01'
AND created_at < '2026-01-01';
3. Optimize JOINs
-- Bad - multiple joins without proper indexes
SELECT *
FROM orders o
JOIN users u ON o.user_id = u.id
JOIN products p ON o.product_id = p.id
WHERE u.email LIKE '%@example.com';
-- Good - use indexes and limit early
SELECT o.id, u.email, p.name
FROM users u
JOIN orders o ON o.user_id = u.id
JOIN products p ON o.product_id = p.id
WHERE u.email_domain = 'example.com' -- Use indexed column
LIMIT 100;
-- Create supporting indexes
CREATE INDEX idx_users_email_domain ON users(email_domain);
CREATE INDEX idx_orders_user_id ON orders(user_id);
CREATE INDEX idx_orders_product_id ON orders(product_id);
4. Use EXISTS Instead of IN for Subqueries
-- Less efficient
SELECT * FROM users
WHERE id IN (SELECT user_id FROM orders WHERE status = 'completed');
-- More efficient
SELECT * FROM users u
WHERE EXISTS (
SELECT 1 FROM orders o
WHERE o.user_id = u.id AND o.status = 'completed'
);
5. Leverage CTEs and Window Functions
-- Using CTE for readability and optimization
WITH recent_orders AS (
SELECT user_id, COUNT(*) as order_count
FROM orders
WHERE created_at > CURRENT_DATE - INTERVAL '30 days'
GROUP BY user_id
HAVING COUNT(*) > 5
)
SELECT u.email, ro.order_count
FROM users u
JOIN recent_orders ro ON u.id = ro.user_id;
-- Window functions instead of self-joins
SELECT
product_id,
sale_date,
amount,
SUM(amount) OVER (
PARTITION BY product_id
ORDER BY sale_date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) as rolling_7day_total
FROM sales;
6. Batch Operations
-- Bad - multiple single inserts
INSERT INTO logs (message) VALUES ('Log 1');
INSERT INTO logs (message) VALUES ('Log 2');
INSERT INTO logs (message) VALUES ('Log 3');
-- Good - batch insert
INSERT INTO logs (message) VALUES
('Log 1'),
('Log 2'),
('Log 3');
-- Even better - use COPY for bulk loads
COPY logs (message) FROM '/path/to/logs.csv' CSV;
Indexing Strategies
When to Create Indexes
Create indexes for:
- Columns in WHERE clauses
- JOIN columns
- ORDER BY columns
- Columns used in GROUP BY
- Foreign key columns
- Columns with high selectivity
Avoid indexes when:
- Table is very small (< 1000 rows)
- Column has low cardinality (few distinct values; see the pg_stats check after this list)
- Table has frequent writes and few reads
- Column is rarely queried
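Column cardinality can be checked in pg_stats before creating an index (statistics are populated by ANALYZE): n_distinct near -1 means nearly every value is unique, while a small positive number indicates low cardinality. A sketch, assuming the users table:
SELECT attname, n_distinct, null_frac
FROM pg_stats
WHERE schemaname = 'public'
AND tablename = 'users';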
Index Types and Use Cases
B-tree Index (Default)
-- Standard index for equality and range queries
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_orders_created_at ON orders(created_at);
-- Composite index (column order matters!)
CREATE INDEX idx_orders_user_status ON orders(user_id, status);
-- Can use for queries on (user_id) or (user_id, status)
-- Cannot efficiently use for queries only on (status)
Partial Index
-- Index only active users
CREATE INDEX idx_active_users_email ON users(email)
WHERE status = 'active';
-- Index only recent orders
CREATE INDEX idx_recent_orders ON orders(user_id, created_at)
WHERE created_at > '2025-01-01';
Expression Index
-- Index on expression
CREATE INDEX idx_users_lower_email ON users(LOWER(email));
-- Now this query can use the index
SELECT * FROM users WHERE LOWER(email) = 'user@example.com';
-- Index on JSON field
CREATE INDEX idx_metadata_tags ON products((metadata->>'tags'));
GIN Index (for Arrays, JSONB, Full-text)
-- Index for array containment
CREATE INDEX idx_tags ON articles USING GIN(tags);
-- Query using array operators
SELECT * FROM articles WHERE tags @> ARRAY['postgresql', 'performance'];
-- Index for JSONB
CREATE INDEX idx_metadata ON products USING GIN(metadata);
-- Query JSON data
SELECT * FROM products WHERE metadata @> '{"category": "electronics"}';
-- Full-text search
CREATE INDEX idx_content_fts ON documents USING GIN(to_tsvector('english', content));
SELECT * FROM documents
WHERE to_tsvector('english', content) @@ to_tsquery('postgresql & performance');
GiST Index (for Geometric Data, Full-text)
-- For range types
CREATE INDEX idx_availability ON hotel_rooms USING GIST(available_dates);
-- For spatial data (the geometry/geography types require the PostGIS extension)
CREATE INDEX idx_location ON stores USING GIST(location);
BRIN Index (for Large, Sequentially Ordered Tables)
-- Efficient for time-series data
CREATE INDEX idx_logs_timestamp ON logs USING BRIN(timestamp);
-- Very small index size, good for append-only tables
-- Works well when data is naturally ordered
Index Maintenance
-- Rebuild index
REINDEX INDEX idx_users_email;
-- Rebuild all indexes on table
REINDEX TABLE users;
-- Rebuild concurrently (doesn't lock table)
CREATE INDEX CONCURRENTLY idx_users_email_new ON users(email);
DROP INDEX CONCURRENTLY idx_users_email;
ALTER INDEX idx_users_email_new RENAME TO idx_users_email;
-- Find unused indexes
SELECT
schemaname,
relname AS tablename,
indexrelname AS indexname,
idx_scan,
pg_size_pretty(pg_relation_size(indexrelid)) as index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
AND indexrelname NOT LIKE 'pg_toast_%'
ORDER BY pg_relation_size(indexrelid) DESC;
-- Find missing indexes (tables with many seq scans)
SELECT
schemaname,
relname AS tablename,
seq_scan,
seq_tup_read,
idx_scan,
seq_tup_read / NULLIF(seq_scan, 0) as avg_tup_per_scan
FROM pg_stat_user_tables
WHERE seq_scan > 1000
ORDER BY seq_tup_read DESC;
Connection Management
Connection Pooling
PostgreSQL creates a new process for each connection, which is expensive. Use connection pooling:
PgBouncer Configuration
[databases]
mydb = host=localhost port=5432 dbname=mydb
[pgbouncer]
listen_addr = *
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
# Pool mode
pool_mode = transaction
# Connection limits
max_client_conn = 1000
default_pool_size = 25
reserve_pool_size = 5
# Server settings
server_idle_timeout = 600
server_lifetime = 3600
Pool Modes:
- session: One server connection per client (default PostgreSQL behavior)
- transaction: Most efficient, server released after transaction
- statement: Server released after each statement
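PgBouncer ships an admin console for verifying pool behavior; connect with psql to the virtual pgbouncer database as a user listed in admin_users or stats_users, then run its SHOW commands (these are PgBouncer commands, not PostgreSQL SQL):
-- e.g. psql -p 6432 -U pgbouncer pgbouncer
SHOW POOLS;   -- clients waiting and server connections per pool
SHOW STATS;   -- query/transaction counts and timing per database
SHOW CLIENTS; -- individual client connections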
Application-Level Pooling
# Python with psycopg2
from psycopg2 import pool
connection_pool = pool.SimpleConnectionPool(
    minconn=1,
    maxconn=20,
    host='localhost',
    database='mydb',
    user='myuser',
    password='mypass'
)
# Get a connection from the pool
conn = connection_pool.getconn()
try:
    cursor = conn.cursor()
    cursor.execute("SELECT * FROM users")
    results = cursor.fetchall()
    cursor.close()
finally:
    # Return the connection to the pool instead of closing it
    connection_pool.putconn(conn)
Monitor Connections
-- Current connections
SELECT
count(*),
state,
usename,
application_name
FROM pg_stat_activity
GROUP BY state, usename, application_name
ORDER BY count DESC;
-- Long-running queries
SELECT
pid,
now() - query_start as duration,
usename,
state,
query
FROM pg_stat_activity
WHERE state != 'idle'
AND query_start < now() - interval '5 minutes'
ORDER BY duration DESC;
-- Kill long-running query
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE pid = 12345;
Maintenance Operations
VACUUM
-- Manual vacuum
VACUUM users;
-- Vacuum with analyze
VACUUM ANALYZE users;
-- Verbose output
VACUUM VERBOSE users;
-- Full vacuum (locks table, reclaims more space)
VACUUM FULL users;
-- Autovacuum settings (postgresql.conf)
autovacuum = on
autovacuum_max_workers = 3
autovacuum_naptime = 1min
autovacuum_vacuum_threshold = 50
autovacuum_vacuum_scale_factor = 0.2
autovacuum_analyze_threshold = 50
autovacuum_analyze_scale_factor = 0.1
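For very large tables, the default 20% scale factor means autovacuum rarely runs; per-table storage parameters can tighten it. A sketch for a hypothetical high-churn orders table:
-- Vacuum after ~1% of rows are dead instead of the global 20%
ALTER TABLE orders SET (
autovacuum_vacuum_scale_factor = 0.01,
autovacuum_analyze_scale_factor = 0.02
);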
ANALYZE
-- Update statistics for query planner
ANALYZE users;
-- Analyze entire database
ANALYZE;
-- Check when table was last analyzed
SELECT
schemaname,
relname AS tablename,
last_vacuum,
last_autovacuum,
last_analyze,
last_autoanalyze
FROM pg_stat_user_tables;
REINDEX
-- Rebuild index
REINDEX INDEX idx_users_email;
-- Rebuild all indexes on table
REINDEX TABLE users;
-- Rebuild all indexes in database (requires exclusive lock)
REINDEX DATABASE mydb;
-- Rebuild concurrently (PostgreSQL 12+)
REINDEX INDEX CONCURRENTLY idx_users_email;
Monitoring and Diagnostics
Key Metrics to Monitor
-- Database size
SELECT
pg_database.datname,
pg_size_pretty(pg_database_size(pg_database.datname)) AS size
FROM pg_database
ORDER BY pg_database_size(pg_database.datname) DESC;
-- Table sizes with indexes
SELECT
tablename,
pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS total_size,
pg_size_pretty(pg_relation_size(schemaname||'.'||tablename)) AS table_size,
pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename) -
pg_relation_size(schemaname||'.'||tablename)) AS index_size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC
LIMIT 10;
-- Cache hit ratio (should be > 99%)
SELECT
sum(heap_blks_read) as heap_read,
sum(heap_blks_hit) as heap_hit,
sum(heap_blks_hit) * 100.0 / NULLIF(sum(heap_blks_hit) + sum(heap_blks_read), 0) AS cache_hit_ratio
FROM pg_statio_user_tables;
-- Index usage
SELECT
schemaname,
relname AS tablename,
indexrelname AS indexname,
idx_scan,
idx_tup_read,
idx_tup_fetch,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
ORDER BY idx_scan DESC;
-- Bloat estimation
SELECT
schemaname,
relname AS tablename,
pg_size_pretty(pg_total_relation_size(relid)) AS size,
n_dead_tup,
n_live_tup,
ROUND(n_dead_tup * 100.0 / NULLIF(n_live_tup + n_dead_tup, 0), 2) AS dead_tuple_percent
FROM pg_stat_user_tables
WHERE n_dead_tup > 1000
ORDER BY n_dead_tup DESC;
pg_stat_statements Extension
-- Add to postgresql.conf first and restart the server:
--   shared_preload_libraries = 'pg_stat_statements'
--   pg_stat_statements.track = all
-- Then enable the extension in each database you want to inspect
CREATE EXTENSION pg_stat_statements;
-- Top 10 slowest queries
SELECT
query,
calls,
total_exec_time,
mean_exec_time,
max_exec_time,
stddev_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;
-- Most frequently called queries
SELECT
query,
calls,
total_exec_time,
mean_exec_time
FROM pg_stat_statements
ORDER BY calls DESC
LIMIT 10;
-- Reset statistics
SELECT pg_stat_statements_reset();
Performance Tuning Checklist
- Configure memory settings based on available RAM
- Adjust checkpoint settings for your workload
- Set appropriate random_page_cost for storage type (SSD/HDD)
- Enable query logging for slow queries
- Create indexes on frequently queried columns
- Remove unused indexes
- Set up connection pooling
- Configure autovacuum appropriately
- Monitor cache hit ratio (target > 99%)
- Analyze query plans with EXPLAIN ANALYZE
- Install and monitor pg_stat_statements
- Regular VACUUM and ANALYZE operations
- Monitor table and index bloat
- Review and optimize slow queries
- Set appropriate work_mem for complex queries
- Use appropriate data types (avoid VARCHAR when TEXT is better)
- Partition large tables (see the sketch after this checklist)
- Archive old data regularly
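A minimal declarative partitioning sketch for the last two checklist items, using a hypothetical time-series measurements table; old partitions can then be detached or dropped to archive data without expensive DELETEs:
CREATE TABLE measurements (
recorded_at timestamptz NOT NULL,
device_id int NOT NULL,
value numeric
) PARTITION BY RANGE (recorded_at);
CREATE TABLE measurements_2025_01 PARTITION OF measurements
FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');
-- Archiving: detach (or drop) old partitions instead of deleting rows
ALTER TABLE measurements DETACH PARTITION measurements_2025_01;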
Related Resources
Get comprehensive PostgreSQL performance monitoring with Pulse. Track slow queries, monitor resource usage, and optimize your database in real-time.