Database Backup Strategies: Methods, Recovery, and Best Practices

Q: What is the difference between a logical and a physical database backup?

A logical backup (such as pg_dump ) exports schema and data as portable statements or an archive, restores slowly because it reloads rows and rebuilds indexes, and works across major versions. A physical backup (such as pg_basebackup ) copies the data files and WAL at the byte level, restores much faster, but is tied to the same major version, OS, and architecture.

Q: How do I recover a PostgreSQL database to a specific point in time?

Point-in-time recovery requires a base backup taken with pg_basebackup plus a continuous archive of WAL segments. Restore the base backup, configure a restore_command to fetch archived WAL, set recovery_target_time to the moment you want, and start PostgreSQL so it replays the log up to that point. RDS and Aurora expose this as restore-to-point-in-time within the retention window.

A database backup strategy defines what you copy, how often, where you store it, and how fast you can restore. The two axes that matter are the recovery point objective (RPO), the maximum data loss you can tolerate measured in time, and the recovery time objective (RTO), how long a restore is allowed to take. Every choice below - logical versus physical, full versus incremental, daily dumps versus continuous archiving - is a trade between those two numbers, storage cost, and operational complexity. An untested backup is not a backup; it is a hope.

Logical vs Physical Backups

A logical backup exports the database as a series of statements or a portable archive - schema definitions plus the data needed to recreate objects. In PostgreSQL, pg_dump produces one database and pg_dumpall adds cluster-wide objects like roles and tablespaces. Logical dumps are portable across major versions and across architectures, let you restore a single table, and compress well. The cost is speed: restoring reloads every row and rebuilds every index, so a multi-terabyte logical restore can run for many hours.

# Logical backup of one database in the custom format, then restore in parallel
pg_dump -Fc -d sales -f sales.dump
pg_restore -d sales -j 4 sales.dump

A physical backup copies the data files and write-ahead log (WAL) at the byte level. pg_basebackup takes a consistent base copy of the entire cluster while it stays online. Physical backups restore far faster than logical ones because there is no reload or index rebuild - you start the server on the copied files. They are tied to the same major version, OS, and architecture, and they back up the whole cluster, not a single table.

# Physical base backup of the whole cluster, compressed, with WAL included
pg_basebackup -D /backups/base -Ft -z -X stream -P

Other engines follow the same split. MySQL has mysqldump (logical) and Percona XtraBackup or its native clone for physical copies; MongoDB has mongodump (logical) and filesystem snapshots for physical.

Full, Incremental, and Differential Backups

A full backup captures everything at a point in time. An incremental backup captures only the blocks changed since the previous backup of any kind. A differential backup captures everything changed since the last full backup. Incrementals minimize backup size and time but require the full plus every increment in order to restore. Differentials need only the full plus the latest differential, trading larger backups for a simpler, faster restore.

PostgreSQL gained native block-level incremental base backups in version 17 via pg_basebackup --incremental, reconstructed with pg_combinebackup. Continuous WAL archiving is the more common incremental mechanism: you keep one base backup and stream every WAL segment, which lets you replay forward to any moment - point-in-time recovery (PITR). Tools like pgBackRest and Barman manage full/incremental/differential base backups plus WAL archiving, parallelism, retention, and verification in one place, and are the standard choice for production PostgreSQL.

Backup type	What it captures	RPO	Restore speed	Best for
Full (logical, `pg_dump`)	Whole DB as portable statements	Last dump (hours)	Slow - reload + reindex	Small DBs, migrations, single-table restore
Full (physical, `pg_basebackup`)	Whole cluster, byte level	Last base backup	Fast - start on files	Fast full-cluster recovery
Differential	Changes since last full	Last differential	Moderate - full + one diff	Balancing size and restore steps
Incremental	Changes since last backup	Last increment	Slower - full + chain	Large DBs, minimizing backup window
Continuous WAL (PITR)	Every committed transaction	Seconds to minutes	Base restore + WAL replay	Lowest RPO, recover to exact time

PostgreSQL Point-in-Time Recovery

PITR is how PostgreSQL achieves an RPO measured in seconds. You take one base backup with pg_basebackup, then continuously ship completed WAL segments to durable storage by setting archive_mode = on and an archive_command (or by using pgBackRest/Barman as the archiver). To recover, you restore the base backup, supply a restore_command that fetches archived WAL, set a recovery_target_time, and let PostgreSQL replay the log up to that instant.

# postgresql.conf - archive each completed WAL segment to durable storage
archive_mode = on
archive_command = 'test ! -f /archive/%f && cp %p /archive/%f'

The same WAL stream that powers PITR also feeds streaming replication, so a standby replica and your backup pipeline reinforce each other. A replica is not a backup, though - a DROP TABLE or corrupted page replicates to the standby in milliseconds. PITR lets you recover to the moment before the mistake; replication does not. If startup later fails with an error such as could not locate a valid checkpoint, the WAL or control file is the problem, which is exactly what a tested archive protects against.

Cloud-managed services automate this layer. Amazon RDS and Aurora take daily snapshots plus continuous WAL capture, giving PITR to any second within the retention window (1-35 days on RDS). You still own retention policy, cross-region copies, and - critically - testing that a restore actually works.

Storage, Retention, and the 3-2-1 Rule

The 3-2-1 rule is the baseline: keep at least 3 copies of the data, on 2 different media or storage types, with 1 copy offsite. For databases this usually means production storage, a backup repository in the same region, and a copy replicated to another region or account. Cross-region and cross-account copies protect against a regional outage or a compromised account deleting both the data and its backups. Encrypt backups at rest and in transit, and treat the encryption keys as part of the recovery plan - a backup you cannot decrypt is lost data.

Retention balances compliance and cost. A common tier is hourly or daily backups kept for weeks for operational recovery, plus monthly or yearly archives kept for years for audit. Cold archival storage such as Amazon S3 Glacier or its Deep Archive class cuts long-term cost by an order of magnitude, at the price of retrieval latency measured in minutes to hours - acceptable for compliance copies, not for your primary restore path.

Connecting your databases to Pulse closes the gap between "backups are configured" and "backups will work." Pulse continuously watches WAL archiving health, replication lag, checkpoint activity, and disk headroom, and flags a stalled archive_command, a backup job that silently stopped, or replication that has fallen behind your RPO - before the day you actually need to restore.

Testing and Restoring Backups

A backup you have never restored has an unknown RTO and an unknown success rate. Schedule periodic full restores into an isolated environment, time them against your RTO, and run integrity checks - pg_verifybackup validates a pg_basebackup archive, and a trial pg_restore or pg_combinebackup confirms the logical or incremental chain reconstructs. Verify the data after restore, not just that the job exited zero; checksums catch bit rot but a logically correct restore can still be missing the last few minutes if WAL archiving lagged.

When recovery is needed, assess the scope first - single table, whole database, or cluster - because that decides logical versus physical and how far back you replay. Restore the most recent valid base backup, replay WAL to the target time for PITR, then validate before reopening to traffic. Record the actual RTO each time and feed it back into the strategy; restores that overrun the RTO mean the backup method, parallelism, or storage tier needs to change.

Frequently Asked Questions

Q: What is the difference between a logical and a physical database backup?
A: A logical backup (such as pg_dump) exports schema and data as portable statements or an archive, restores slowly because it reloads rows and rebuilds indexes, and works across major versions. A physical backup (such as pg_basebackup) copies the data files and WAL at the byte level, restores much faster, but is tied to the same major version, OS, and architecture.

Q: What is the difference between incremental and differential backups?
A: An incremental backup captures only the changes since the previous backup of any type, so restoring requires the full backup plus every increment in sequence. A differential backup captures all changes since the last full backup, so restoring needs only the full plus the most recent differential - larger backups but a simpler, faster restore.

Q: How often should you back up a database?
A: Set the frequency from your RPO. If you can lose at most a few minutes of data, use continuous WAL archiving for point-in-time recovery rather than periodic dumps. If losing a day is acceptable, a nightly full or differential backup may be enough. Most production PostgreSQL setups combine a periodic base backup with continuous WAL archiving.

Q: How do I recover a PostgreSQL database to a specific point in time?
A: Point-in-time recovery requires a base backup taken with pg_basebackup plus a continuous archive of WAL segments. Restore the base backup, configure a restore_command to fetch archived WAL, set recovery_target_time to the moment you want, and start PostgreSQL so it replays the log up to that point. RDS and Aurora expose this as restore-to-point-in-time within the retention window.

Q: Is database replication a substitute for backups?
A: No. A replica protects against hardware or node failure, but logical errors like an accidental DROP TABLE or data corruption replicate to the standby almost instantly. Backups, and point-in-time recovery in particular, let you recover to the moment before the mistake. Run both: replication for availability, backups for recoverability.

Q: What is the 3-2-1 backup rule?
A: The 3-2-1 rule means keeping at least 3 copies of your data, stored on 2 different media or storage types, with 1 copy held offsite. For databases this typically maps to production storage, a same-region backup repository, and a cross-region or cross-account copy that survives a regional outage or a compromised account.

Q: How do I know if my backups will actually work?
A: Restore them on a schedule into an isolated environment and time the restore against your RTO. Use pg_verifybackup to validate a physical backup and a trial pg_restore to validate a logical one, then run data integrity checks on the restored copy. A backup that has never been restored has an unknown recovery time and an unknown success rate.

PostgreSQL Configuration: Tuning Parameters for Performance and Durability: WAL, checkpoint, and archiving settings that govern how backups and recovery behave.
PostgreSQL Performance Tuning: Query- and configuration-level levers that affect backup windows and restore times.
Database Monitoring Best Practices: The signals - WAL archiving health, replication lag, disk headroom - that tell you a backup pipeline is healthy.
How to Fix PostgreSQL Error: PANIC - Could Not Locate a Valid Checkpoint: The corruption scenario that PITR and a tested WAL archive are meant to recover from.