ClickHouse data parts are immutable, which makes incremental backups efficient: any part already uploaded in a previous backup can be referenced by the new one instead of being uploaded again. The clickhouse-backup CLI implements this through the --diff-from and --diff-from-remote flags. A typical setup is a weekly full backup plus hourly or daily increments. The storage and bandwidth of each increment are roughly proportional to the volume of parts its base does not already contain. This guide covers the commands, a recommended schedule, retention, and how restore works against a chain.
How differential backups work
When you run clickhouse-backup create, the tool freezes every part of every selected table into shadow/. If you then run upload --diff-from <previous>, the uploader compares part names against the previous backup's manifest and skips parts that already exist remotely. For those parts it writes only a small reference in the new backup. When you later download or restore the new backup, the tool transparently pulls the referenced parts from the base.
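The sketch below models that comparison with two text files of part names. It assumes parts are matched by name, as described above. It is not clickhouse-backup's actual implementation, and parts_to_upload is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper modelling the --diff-from comparison.
#   $1: file with the part names already in the base backup, one per line
#   $2: file with the part names in the new backup, one per line
# Prints the parts that exist only in the new backup, i.e. the ones that
# would be uploaded; every other part becomes a reference to the base.
parts_to_upload() {
  base_sorted=$(mktemp)
  new_sorted=$(mktemp)
  LC_ALL=C sort -u "$1" > "$base_sorted"
  LC_ALL=C sort -u "$2" > "$new_sorted"
  # -1 hides lines only in the base, -3 hides lines present in both.
  LC_ALL=C comm -13 "$base_sorted" "$new_sorted"
  rm -f "$base_sorted" "$new_sorted"
}
```

Take a base that holds all_1_1_0 and all_2_2_0. A new backup containing all_1_2_1 (those two parts after a merge) and all_3_3_0 uploads both parts. A merged part has a name the base has never seen, so it counts as new data.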
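The chain is therefore one full backup plus N increments. Each increment depends on the backup it was diffed from, which in the flat model used later in this guide is always the weekly full. Deleting the full breaks every increment that references it, directly or through another increment.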
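Before deleting a backup, it helps to list everything that depends on it. The sketch below assumes you already have "backup base" pairs, one per line, with each full listed alone. How you collect them (for example, from your own scheduler's log) is up to you. dependents_of is a hypothetical helper, not a clickhouse-backup command.

```shell
#!/bin/sh
# Hypothetical helper: reads "backup base" pairs on stdin (a full backup is
# listed alone, with no base) and prints every backup that depends on $1,
# directly or through other increments.
dependents_of() {
  awk -v root="$1" '
    { names[NR] = $1; base[$1] = $2 }
    END {
      for (i = 1; i <= NR; i++) {
        b = base[names[i]]
        # Follow the chain towards its full; the step limit stops a cyclic input.
        for (steps = 0; b != "" && steps < NR; steps++) {
          if (b == root) { print names[i]; break }
          b = base[b]
        }
      }
    }'
}
```

In a chained setup, where increments reference increments, this also catches the indirect dependents that a simple grep for the full's name would miss.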
Installation
If you have not installed the tool yet:
```shell
# Debian/Ubuntu
wget https://github.com/<org>/clickhouse-backup/releases/download/v<version>/clickhouse-backup_<version>_amd64.deb
sudo dpkg -i clickhouse-backup_<version>_amd64.deb

# RHEL/CentOS/Fedora
sudo yum install https://github.com/<org>/clickhouse-backup/releases/download/v<version>/clickhouse-backup-<version>-1.x86_64.rpm
```
Pick the version from the project's GitHub releases page.
Configure /etc/clickhouse-backup/config.yml with your remote storage backend (S3, GCS, Azure). See the main clickhouse-backup guide for the config schema.
Creating an incremental backup
A typical incremental cycle is two steps: a full backup at the start of the cycle, then differentials against it.
```shell
# Sunday 00:00, weekly full
backup_name="full_$(date +%Y_%m_%d)"
clickhouse-backup create "${backup_name}"
clickhouse-backup upload "${backup_name}"
```
Then, every hour for the rest of the week:
```shell
# Name of this week's Sunday full (must be updated every week)
prev_backup_name="full_2026_05_24"
backup_name="inc_$(date +%Y_%m_%d_%H)"
clickhouse-backup create "${backup_name}"
clickhouse-backup upload --diff-from "${prev_backup_name}" "${backup_name}"
```
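The hard-coded prev_backup_name above has to be edited every week. One way to derive it is to step back to the Sunday that started the current week. This assumes the full_YYYY_MM_DD naming used here, Sunday fulls, and GNU date (standard on Linux, not on macOS). weekly_full_name is a hypothetical helper, not part of clickhouse-backup.

```shell
#!/bin/sh
# Hypothetical helper: prints the name of the weekly full that covers a day
# (YYYY-MM-DD, default today), assuming fulls run on Sunday and are named
# full_YYYY_MM_DD. Relies on GNU date for -d.
weekly_full_name() {
  day="${1:-$(date +%F)}"
  # %w is the weekday number, 0 = Sunday, so going back that many days
  # lands on the Sunday that started the week.
  dow=$(TZ=UTC date -d "$day" +%w)
  # UTC keeps the calendar arithmetic clear of DST transitions.
  TZ=UTC date -d "$day $dow days ago" +full_%Y_%m_%d
}
```

In the hourly job, prev_backup_name=$(weekly_full_name) replaces the literal name. On a Sunday it returns that day's own full. That is correct only if the 00:00 full has finished before the 01:00 increment starts.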
To compare against a backup that exists only on remote storage (for example, after local cleanup), use --diff-from-remote:
```shell
clickhouse-backup upload --diff-from-remote "${prev_backup_name}" "${backup_name}"
```
You can also do it in one shot with create_remote:
```shell
clickhouse-backup create_remote --diff-from-remote "${prev_backup_name}" "${backup_name}"
```
Recommended schedule
A common pattern for hourly RPO with low storage cost:
| Day | Time | Action |
|---|---|---|
| Sunday | 00:00 | create_remote full_<week> |
| Sunday | 01:00-23:00, hourly | create_remote --diff-from-remote full_<week> inc_<ts> |
| Monday-Saturday | 00:00-23:00, hourly | create_remote --diff-from-remote full_<week> inc_<ts> |
| Next Sunday | 00:00 | new full, start the next chain |
Cron:
```
# min hour dom mon dow  command
0 0    * * 0    /usr/local/bin/ch-full.sh
0 1-23 * * 0    /usr/local/bin/ch-inc.sh
0 *    * * 1-6  /usr/local/bin/ch-inc.sh
```
Here ch-full.sh runs create_remote for the new weekly full. ch-inc.sh resolves the latest weekly full on remote storage and runs create_remote --diff-from-remote against it.
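Below is a minimal sketch of ch-inc.sh that follows the table above by running create_remote --diff-from-remote against the newest weekly full. It assumes clickhouse-backup list remote prints the backup name in the first column, so check the output of your version before relying on it. The full_YYYY_MM_DD naming makes a plain sort chronological. CH_BACKUP_BIN and the function names are hypothetical; the variable exists only so the script can be dry-run.

```shell
#!/bin/sh
# Sketch of ch-inc.sh: diff a new hourly backup against the newest weekly
# full that already exists on remote storage.
# CH_BACKUP_BIN is a hypothetical override, useful for dry runs.
CH_BACKUP_BIN="${CH_BACKUP_BIN:-clickhouse-backup}"

# Reads `list remote` output on stdin and prints the newest full_* name,
# assuming the backup name is the first column.
latest_remote_full() {
  awk '$1 ~ /^full_/ { print $1 }' | LC_ALL=C sort | tail -n 1
}

run_incremental() {
  base=$("$CH_BACKUP_BIN" list remote | latest_remote_full)
  if [ -z "$base" ]; then
    # Stop instead of uploading an increment with no valid base.
    echo "ch-inc: no full_* backup found on remote storage" >&2
    return 1
  fi
  "$CH_BACKUP_BIN" create_remote --diff-from-remote "$base" \
    "inc_$(date +%Y_%m_%d_%H)"
}
```

The real script would end with a call to run_incremental. The call is left out here so the functions can be sourced and tested. Failing when no full is found avoids silently uploading an increment against a base that does not exist.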
Retention
Two environment variables control automatic cleanup after every upload:
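```shell
# With only one local backup kept, the weekly full disappears from local disk,
# so diff against its remote copy instead of the local one.
BACKUPS_TO_KEEP_LOCAL=1 BACKUPS_TO_KEEP_REMOTE=200 \
  clickhouse-backup upload --diff-from-remote "${prev_backup_name}" "${backup_name}"
```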
Equivalent settings live under general: in the config file:
```yaml
general:
  backups_to_keep_local: 1
  backups_to_keep_remote: 200
```
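Set backups_to_keep_remote to at least one full plus every increment that depends on it. Otherwise the tool may delete a base that increments still reference. A safer rule is 2 * (increments_per_week + 1), which always keeps one complete previous chain. With the hourly schedule above, a chain is 1 full + 167 increments = 168 backups. A value of 200 therefore always covers the current chain but only part of the previous one; the safer rule gives 2 * (167 + 1) = 336.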
To rotate manually:
```shell
clickhouse-backup delete remote inc_2026_05_20_03
clickhouse-backup delete local inc_2026_05_20_03
```
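Manual rotation is easy to get wrong. This sketch applies the "keep one complete previous chain" rule by name rather than by count. It prints every backup dated before the second-newest full, leaving the current chain and the one before it untouched. It assumes the full_YYYY_MM_DD and inc_YYYY_MM_DD_HH names from this guide and chains that start with a Sunday full. stale_backups is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: reads backup names on stdin, one per line, and prints
# those dated before the second-newest weekly full, i.e. everything outside
# the current chain and the complete chain before it.
stale_backups() {
  names=$(cat)
  fulls=$(printf '%s\n' "$names" | grep '^full_' | LC_ALL=C sort)
  # Fewer than two fulls: there is no older chain to drop yet.
  [ "$(printf '%s\n' "$fulls" | grep -c .)" -ge 2 ] || return 0
  # Date part (YYYY_MM_DD) of the second-newest full.
  cutoff=$(printf '%s\n' "$fulls" | tail -n 2 | head -n 1 | cut -c6-)
  printf '%s\n' "$names" | awk -v cutoff="$cutoff" '
    /^(full|inc)_/ {
      # The date starts right after the first underscore and is 10 chars.
      d = substr($0, index($0, "_") + 1, 10)
      if (d < cutoff) print
    }'
}
```

To use it, feed it the backup names from clickhouse-backup list remote and review the printed list. Only then pass each name to clickhouse-backup delete remote.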
Restoring from an incremental chain
You restore the increment, not the full. The tool resolves the dependency chain and pulls the referenced parts automatically.
```shell
clickhouse-backup download inc_2026_05_27_14
clickhouse-backup restore --rm inc_2026_05_27_14
```
Or in one step:
```shell
clickhouse-backup restore_remote --rm inc_2026_05_27_14
```
The --rm flag drops existing tables before restoring, which is what you want for a true disaster recovery. Omit it when restoring into an instance that should be empty, so the restore fails loudly on conflicts.
To restore only schema or only data:
```shell
clickhouse-backup restore --schema inc_2026_05_27_14
clickhouse-backup restore --data inc_2026_05_27_14
```
Common Pitfalls
- Deleting the base full. Every increment in the chain references it. If you delete the base, the increments are unrestorable. Use retention values that always cover the full plus its dependents.
- Mixing --diff-from and --diff-from-remote. --diff-from requires the base to exist locally. After local cleanup, switch to --diff-from-remote.
- Drifting clocks between hosts. Differential resolution uses backup names. If your scheduler picks a prev_backup_name that does not match any existing backup, the upload falls back to a full upload silently and your storage cost spikes.
- TTL-driven part churn. Tables with aggressive TTL or frequent OPTIMIZE FINAL rewrite parts often. Each rewrite forces a full re-upload of those parts in the next increment. Schedule fulls more frequently for those tables.
- No restore test. A chain that has never been restored is not a backup. Restore the latest increment to a staging node weekly.
Frequently Asked Questions
Q: What is the difference between incremental and differential here?
A: clickhouse-backup uses the terms interchangeably. Each --diff-from <X> backup contains only parts that did not exist in X. There is no separate "differential vs incremental" distinction like in some traditional databases.
Q: How much storage does an incremental backup actually use?
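A: Roughly the size of the parts that are not already in its base, plus a small manifest. Assume append-only ingestion at 10 GB/hour. An increment that diffs against the previous hour's backup is then around 10 GB. In the flat model, every increment diffs against the weekly full, so the increment taken N hours after the full carries roughly N * 10 GB. Background merges, OPTIMIZE FINAL, and large TTL deletes rewrite parts under new names, so increments can be much larger than these estimates.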
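The weekly total depends heavily on the model. In a chained setup, each increment diffs against the previous backup. In the flat model described in the next answer, each increment diffs against the weekly full. The back-of-the-envelope sketch below uses 10 GB/hour and the 167 hourly increments per week from the schedule above. weekly_increment_storage is a hypothetical helper. The model ignores merges, TTL, and compression, so treat its output as an order of magnitude only.

```shell
#!/bin/sh
# Hypothetical back-of-the-envelope model: ignores merges, TTL, and compression.
#   $1: new data per hour in GB   $2: hourly increments per week
weekly_increment_storage() {
  rate_gb=$1
  n=$2
  # Chained: each increment holds one hour of new parts.
  chained=$((rate_gb * n))
  # Flat: the k-th increment holds k hours of parts, so the week totals
  # rate * (1 + 2 + ... + n) = rate * n * (n + 1) / 2.
  flat=$((rate_gb * n * (n + 1) / 2))
  echo "chained=${chained}GB flat=${flat}GB"
}
```

Running weekly_increment_storage 10 167 prints chained=1670GB flat=140280GB. That is roughly 1.7 TB versus 140 TB of increments per week. At high ingestion rates, the flat model's short restore path comes at a steep storage cost.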
Q: Can I have a chain of increments referencing increments?
A: Yes. You can run upload --diff-from inc_prev where inc_prev is itself a diff. This shortens each increment further but lengthens the chain you depend on. Most teams use a flat model: every increment diffs against the latest full.
Q: Does --diff-from also reduce backup creation time on the source?
A: Not noticeably. The freeze step still runs against all parts. The savings are in upload bandwidth and remote storage.
Q: How do I restore to a specific point in time?
A: Use clickhouse-backup list remote to find the latest backup taken at or before the desired time, then run restore_remote inc_<ts>. ClickHouse has no transaction log replay, so resolution is limited to the cadence of your increments.
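Because each name carries its timestamp, picking that backup is a string comparison. The sketch below picks the newest backup at or before the target, never one after it, because a later backup would include data written after that point. It assumes the full_YYYY_MM_DD and inc_YYYY_MM_DD_HH names from this guide, with a full counting as hour 00. It reads a list of bare names on stdin. backup_at_or_before is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: reads backup names on stdin and prints the newest one
# taken at or before $1 (YYYY_MM_DD_HH). Prints nothing if every backup is
# newer than the target.
backup_at_or_before() {
  awk -v target="$1" '
    !/^(full|inc)_/ { next }
    /^full_/ { key = substr($0, 6) "_00" }  # full_YYYY_MM_DD -> YYYY_MM_DD_00
    /^inc_/ { key = substr($0, 5) }         # inc_YYYY_MM_DD_HH -> YYYY_MM_DD_HH
    key <= target && (best == "" || key > best) { best = key; name = $0 }
    END { if (name != "") print name }'
}
```

Suppose the first column of list remote holds the names. Then clickhouse-backup list remote | awk '{print $1}' | backup_at_or_before 2026_05_27_14 prints inc_2026_05_27_14 when that backup exists, and you can pass the result to restore_remote.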