PostgreSQL VACUUM Explained: Dead Tuples, Autovacuum, and Transaction Wraparound

Q: What is the difference between VACUUM and VACUUM FULL in PostgreSQL?

Plain VACUUM marks dead tuples reusable in place, runs online without blocking reads or writes, and keeps the table at its current size. VACUUM FULL rewrites the whole table into a new file, returns freed space to the operating system, and rebuilds indexes, but it holds an ACCESS EXCLUSIVE lock for the entire run.

Q: Why does PostgreSQL leave dead tuples instead of deleting rows immediately?

PostgreSQL uses MVCC, so an UPDATE or DELETE keeps the old row version visible to transactions that started earlier. The old version becomes a dead tuple once no running transaction can see it, and VACUUM reclaims that space later rather than at commit time.

Q: How do I tune autovacuum for a large, write-heavy table?

The default autovacuum_vacuum_scale_factor of 0.2 waits for 20% of the table to churn, which is too much on big tables. Set a per-table override such as ALTER TABLE t SET (autovacuum_vacuum_scale_factor = 0.01, autovacuum_vacuum_threshold = 10000); so vacuum triggers on a fixed row count instead.

Q: How do I check how close a database is to transaction wraparound?

Run SELECT datname, age(datfrozenxid) FROM pg_database ORDER BY 2 DESC; . Compare the age against autovacuum_freeze_max_age (200 million default); a value approaching that limit means anti-wraparound autovacuum is falling behind and needs attention.

Q: Does VACUUM block queries while it runs?

Plain VACUUM takes only a SHARE UPDATE EXCLUSIVE lock, so it runs concurrently with reads and writes. VACUUM FULL is the exception - it takes an ACCESS EXCLUSIVE lock and blocks all access to the table until it completes.

VACUUM is the PostgreSQL maintenance command that removes dead tuples - row versions left behind by UPDATE and DELETE under MVCC - and makes their space reusable. It also refreshes the query planner's statistics, updates the visibility map for index-only scans, and advances the frozen transaction ID horizon to prevent transaction ID wraparound. Without it, tables bloat, query plans drift, and the database can eventually refuse writes.

How VACUUM Works

PostgreSQL stores table data in 8KB pages. Each row version, or tuple, carries a header with two transaction IDs: t_xmin (the transaction that created it) and t_xmax (the transaction that deleted or superseded it). Under Multi-Version Concurrency Control (MVCC), an UPDATE does not overwrite a row in place. It marks the old version with a t_xmax and writes a new version, so concurrent transactions keep seeing the snapshot they began with. The old version becomes a dead tuple once no running transaction can still see it.

Dead tuples are not freed at commit time. Standard VACUUM scans the table, identifies tuples that are invisible to every transaction, and marks their line pointers as reusable so future inserts and updates can refill the slots. It does not return space to the operating system in the general case; instead it stabilizes disk usage at the table's working-set size. A VACUUM runs without an exclusive lock, so reads and writes continue against the table while it executes.

The same pass does three other jobs. It updates the visibility map, a bitmap marking pages where all tuples are visible to everyone, which lets the planner use index-only scans and lets later vacuums skip clean pages. It can trigger ANALYZE to refresh column statistics in pg_statistic. And it freezes old tuples to hold off transaction ID wraparound. You can run VACUUM (VERBOSE, ANALYZE) my_table; to vacuum and re-analyze one table while watching what it touches.

Autovacuum Configuration

The autovacuum daemon runs these passes automatically. It is on by default and launches up to autovacuum_max_workers (default 3) background workers, waking every autovacuum_naptime (default 1 minute) to check which tables crossed their thresholds. A table is vacuumed when its dead-tuple count exceeds autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples. With the defaults that means roughly 20% of the table plus 50 rows. The verified current defaults:

Parameter	Default	Effect
`autovacuum`	`on`	Enables the background daemon
`autovacuum_vacuum_threshold`	`50`	Base dead tuples before a vacuum
`autovacuum_vacuum_scale_factor`	`0.2`	Fraction of table added to the threshold
`autovacuum_analyze_threshold`	`50`	Base changed rows before an analyze
`autovacuum_analyze_scale_factor`	`0.1`	Fraction of table added to the analyze threshold
`autovacuum_vacuum_insert_threshold`	`1000`	Inserted rows before an insert-only vacuum (PG13+)
`autovacuum_vacuum_insert_scale_factor`	`0.2`	Fraction of table for the insert trigger (PG13+)
`autovacuum_vacuum_cost_delay`	`2ms`	Sleep when the cost budget is spent
`autovacuum_vacuum_cost_limit`	`-1`	Inherits `vacuum_cost_limit` (200)
`autovacuum_max_workers`	`3`	Concurrent autovacuum workers
`autovacuum_naptime`	`1min`	Interval between daemon checks

Two defaults are commonly misquoted. autovacuum_vacuum_cost_delay is 2ms in modern PostgreSQL - it was lowered from 20ms in PostgreSQL 12 - which makes autovacuum far more aggressive out of the box. And autovacuum_vacuum_cost_limit is -1, meaning it falls back to vacuum_cost_limit (200) rather than holding its own value. On large, write-heavy tables the percentage-based scale factor scales poorly; set a flat per-table override with ALTER TABLE orders SET (autovacuum_vacuum_scale_factor = 0.01, autovacuum_vacuum_threshold = 10000); so vacuum fires on a fixed row count instead of waiting for 20% churn.

VACUUM vs VACUUM FULL

Plain VACUUM and VACUUM FULL solve different problems and behave very differently. VACUUM marks dead space reusable in place, runs online, and keeps the table at its current physical size. VACUUM FULL rewrites the entire table into a fresh file, packs live tuples densely, returns the freed space to the operating system, and rebuilds indexes - but it takes an ACCESS EXCLUSIVE lock for the whole operation, blocking all reads and writes, and needs free disk space roughly equal to the table's live data while it runs.

Aspect	`VACUUM`	`VACUUM FULL`
Lock	`SHARE UPDATE EXCLUSIVE` (online)	`ACCESS EXCLUSIVE` (blocks everything)
Disk returned to OS	Rarely	Yes, table is compacted
Extra disk needed	None	~size of live data
Rebuilds indexes	No	Yes
When to use	Routine maintenance	After a large one-time bloat event

Reach for VACUUM FULL only when a table has bloated badly - for example after deleting most of its rows - and you need the space back on disk. For online bloat removal without the exclusive lock, the pg_repack extension rewrites a table using a temporary copy and a brief lock at the end. Routine bloat should be handled by a correctly tuned autovacuum, not by scheduled VACUUM FULL.

Transaction ID Wraparound

PostgreSQL transaction IDs are 32-bit, giving about 4 billion values before they wrap. Visibility is decided by comparing a tuple's t_xmin against the current XID on a modular number line: roughly 2 billion XIDs are "in the past" and 2 billion "in the future". If a tuple were left unfrozen long enough for 2 billion new transactions to elapse, it would suddenly appear to be in the future and vanish from view - silent data loss. VACUUM prevents this by freezing old tuples, marking them as committed-and-visible-to-all so their t_xmin no longer needs comparing.

VACUUM freezes tuples older than vacuum_freeze_min_age (default 50 million transactions). Autovacuum forces an anti-wraparound vacuum on any table whose oldest unfrozen XID reaches autovacuum_freeze_max_age (default 200 million), and this aggressive vacuum runs even if autovacuum is otherwise disabled. If freezing still falls behind, PostgreSQL warns, then refuses new write transactions once a table is within roughly 1 million XIDs of wraparound, forcing a single-user VACUUM to recover. Track headroom with SELECT datname, age(datfrozenxid) FROM pg_database ORDER BY 2 DESC; - a value approaching 200 million means autovacuum is not keeping pace.

Monitoring VACUUM and Autovacuum

The signals that matter live in pg_stat_all_tables: n_dead_tup (dead tuples awaiting cleanup), n_live_tup, last_autovacuum, and autovacuum_count. A dead-tuple ratio climbing well past the configured scale factor means autovacuum is being outrun by writes; a last_autovacuum that never advances on a busy table points to a disabled or starved daemon. Pair these with age(relfrozenxid) per table to watch the wraparound horizon, and with pg_stat_progress_vacuum to see a running vacuum's phase and progress.

-- Tables with the most dead tuples and their last autovacuum
SELECT relname,
       n_live_tup,
       n_dead_tup,
       round(n_dead_tup::numeric / nullif(n_live_tup + n_dead_tup, 0), 3) AS dead_ratio,
       last_autovacuum
FROM pg_stat_all_tables
WHERE n_dead_tup > 0
ORDER BY n_dead_tup DESC
LIMIT 20;

Wiring those metrics into alerts by hand is tedious, and a slow wraparound problem is easy to miss until writes stop. Pulse tracks dead-tuple growth, autovacuum cadence, and XID age across your PostgreSQL fleet, and its agentic root-cause analysis flags a table outrunning autovacuum or approaching wraparound before it turns into downtime, with the specific table and tuning change to make.

Frequently Asked Questions

Q: What is the difference between VACUUM and VACUUM FULL in PostgreSQL?
A: Plain VACUUM marks dead tuples reusable in place, runs online without blocking reads or writes, and keeps the table at its current size. VACUUM FULL rewrites the whole table into a new file, returns freed space to the operating system, and rebuilds indexes, but it holds an ACCESS EXCLUSIVE lock for the entire run.

Q: Why does PostgreSQL leave dead tuples instead of deleting rows immediately?
A: PostgreSQL uses MVCC, so an UPDATE or DELETE keeps the old row version visible to transactions that started earlier. The old version becomes a dead tuple once no running transaction can see it, and VACUUM reclaims that space later rather than at commit time.

Q: What is transaction ID wraparound and how does VACUUM prevent it?
A: Transaction IDs are 32-bit and wrap after about 4 billion values, which would make old rows appear to be in the future and disappear. VACUUM freezes tuples older than vacuum_freeze_min_age (50 million by default) so their transaction ID no longer needs comparing, keeping them permanently visible.

Q: How do I tune autovacuum for a large, write-heavy table?
A: The default autovacuum_vacuum_scale_factor of 0.2 waits for 20% of the table to churn, which is too much on big tables. Set a per-table override such as ALTER TABLE t SET (autovacuum_vacuum_scale_factor = 0.01, autovacuum_vacuum_threshold = 10000); so vacuum triggers on a fixed row count instead.

Q: Is autovacuum enough, or do I need scheduled manual VACUUM?
A: A correctly tuned autovacuum handles routine maintenance for most workloads. Run a manual VACUUM ANALYZE after bulk loads or large deletes to refresh statistics and reclaim space promptly, but avoid scheduling VACUUM FULL, which locks the table.

Q: How do I check how close a database is to transaction wraparound?
A: Run SELECT datname, age(datfrozenxid) FROM pg_database ORDER BY 2 DESC;. Compare the age against autovacuum_freeze_max_age (200 million default); a value approaching that limit means anti-wraparound autovacuum is falling behind and needs attention.

Q: Does VACUUM block queries while it runs?
A: Plain VACUUM takes only a SHARE UPDATE EXCLUSIVE lock, so it runs concurrently with reads and writes. VACUUM FULL is the exception - it takes an ACCESS EXCLUSIVE lock and blocks all access to the table until it completes.

PostgreSQL Performance Tuning: server and table settings that interact with vacuum and bloat.
Debugging Low Cache Hit Ratio in PostgreSQL: how bloat from missed vacuums pushes pages out of the buffer cache.
PostgreSQL CREATE INDEX: index types and the maintenance VACUUM performs on them.
PostgreSQL: No Space Left on Device: when table bloat and dead tuples fill the disk.
PostgreSQL Disk Full: diagnosing and recovering from a full data volume.
Diagnosing Slow Queries in PostgreSQL: how stale statistics from a lagging ANALYZE produce bad plans.
PostgreSQL EXPLAIN ANALYZE: reading plans that depend on up-to-date vacuum statistics.