ClickHouse Incremental Backups with clickhouse-backup

ClickHouse data parts are immutable, which makes incremental backups efficient: any part already uploaded in a previous backup can be referenced by the new backup instead of being re-uploaded. The clickhouse-backup CLI implements this through the --diff-from and --diff-from-remote flags. A typical setup is a weekly full plus hourly or daily increments, with storage and bandwidth roughly proportional to the volume of new parts. This guide covers the commands, a recommended schedule, retention, and how restore works against a chain.

How differential backups work

When you run clickhouse-backup create, the tool freezes every part of every selected table into shadow/. If you then upload --diff-from <previous>, the uploader compares part names against the previous backup's manifest and skips parts that already exist remotely, writing only a small reference for them in the new backup. Restoration of the new backup transparently pulls the referenced parts from the base.

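The chain is therefore one full backup plus N increments, each of which depends on the backup named in its --diff-from (the full itself, or an earlier increment that in turn leads back to the full). Deleting the full breaks every increment that references it, directly or indirectly.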
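
As a rough illustration of the comparison described above, the following sketch prints the part names that a differential upload would still have to send. It assumes two hypothetical text files with one part name per line; the real tool reads its own backup metadata and compares per table and disk, so this is a simplified model, not its implementation.

```shell
# parts_to_upload BASE_LIST NEW_LIST
# Prints the part names found in NEW_LIST but not in BASE_LIST, one per line.
# BASE_LIST and NEW_LIST are hypothetical text files (one part name per line),
# not files that clickhouse-backup itself produces.
parts_to_upload() {
  base_list=$1
  new_list=$2
  # Load the base names in BEGIN so an empty base file is handled correctly,
  # then print every new name that was not seen in the base.
  awk -v base="$base_list" '
    BEGIN { while ((getline line < base) > 0) seen[line] = 1 }
    !($0 in seen)
  ' "$new_list"
}
```

With a base listing all_1_1_0 and all_2_2_0 and a new backup listing those two plus all_3_3_0, only all_3_3_0 is printed. A merged part such as all_1_2_1 would also be printed, because a merge produces a new part name.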

Installation

If you have not installed the tool yet:

# Debian/Ubuntu
wget https://github.com/<org>/clickhouse-backup/releases/download/v<version>/clickhouse-backup_<version>_amd64.deb
sudo dpkg -i clickhouse-backup_<version>_amd64.deb

# RHEL/CentOS/Fedora
sudo yum install https://github.com/<org>/clickhouse-backup/releases/download/v<version>/clickhouse-backup-<version>-1.x86_64.rpm

Pick the version from the project's GitHub releases page.

Configure /etc/clickhouse-backup/config.yml with your remote storage backend (S3, GCS, Azure). See the main clickhouse-backup guide for the config schema.

Creating an incremental backup

A typical incremental cycle is two steps: a full backup at the start of the cycle, then differentials against it.

# Sunday 00:00, weekly full
backup_name="full_$(date +%Y_%m_%d)"
clickhouse-backup create   "${backup_name}"
clickhouse-backup upload   "${backup_name}"

Then, every hour for the rest of the week:

prev_backup_name="full_2026_05_24"
backup_name="inc_$(date +%Y_%m_%d_%H)"

clickhouse-backup create "${backup_name}"
clickhouse-backup upload --diff-from "${prev_backup_name}" "${backup_name}"

To compare against a backup that exists only on remote storage (for example, after local cleanup), use --diff-from-remote:

clickhouse-backup upload --diff-from-remote "${prev_backup_name}" "${backup_name}"

You can also do it in one shot with create_remote:

clickhouse-backup create_remote --diff-from-remote "${prev_backup_name}" "${backup_name}"

A common pattern for hourly RPO with low storage cost:

Day              Time          Action
Sunday           00:00         create_remote full_<week>
Sunday           01:00-23:00   hourly create_remote --diff-from-remote full_<week> inc_<ts>
Monday-Saturday  00:00-23:00   hourly create_remote --diff-from-remote full_<week> inc_<ts>
Next Sunday      00:00         new full, start next chain

Cron:

0 0 * * 0  /usr/local/bin/ch-full.sh
0 1-23 * * 0  /usr/local/bin/ch-inc.sh
0 * * * 1-6  /usr/local/bin/ch-inc.sh

Here ch-full.sh takes the weekly full, and ch-inc.sh resolves the latest weekly full, creates a new backup, and uploads it with --diff-from-remote.
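
A minimal sketch of the name resolution step in ch-inc.sh could look like this. The helper name latest_full is hypothetical, and the commented wiring assumes that the first column of clickhouse-backup list remote is the backup name, which is worth checking against the output of your version.

```shell
# latest_full: read backup names on stdin (one per line) and print the newest
# name that starts with "full_". Names such as full_2026_05_24 sort
# chronologically as plain strings because the date fields are zero-padded.
latest_full() {
  awk '/^full_/' | LC_ALL=C sort | tail -n 1
}

# Hypothetical wiring inside ch-inc.sh (assumption: column 1 is the name):
#   base=$(clickhouse-backup list remote | awk '{print $1}' | latest_full)
#   [ -n "$base" ] || { echo "no full backup found" >&2; exit 1; }
#   clickhouse-backup create_remote --diff-from-remote "$base" "inc_$(date +%Y_%m_%d_%H)"
```

Failing when no full is found keeps the script from uploading an increment against an empty or stale base name.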

Retention

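Two environment variables control automatic cleanup after every upload:

# --diff-from-remote is used here because keeping only one local backup removes the local copy of the base
BACKUPS_TO_KEEP_LOCAL=1 BACKUPS_TO_KEEP_REMOTE=200 \
  clickhouse-backup upload --diff-from-remote "${prev_backup_name}" "${backup_name}"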

Equivalent settings live under general: in the config file:

general:
  backups_to_keep_local: 1
  backups_to_keep_remote: 200

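Set backups_to_keep_remote to at least one full plus every increment that depends on it; otherwise the tool may delete a base that increments still reference. A safe rule is 2 * (increments_per_week + 1), so you always have one complete previous chain. With a full hourly cadence (167 increments per week) that is 2 * (167 + 1) = 336, so a value of 200 only suits a sparser schedule.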
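
The rule of thumb above is simple arithmetic; a small helper (the name keep_remote is hypothetical) makes it easy to recompute when the cadence changes.

```shell
# keep_remote INCREMENTS_PER_WEEK
# Prints 2 * (increments_per_week + 1): room for two complete chains,
# each made of one full backup plus its increments.
keep_remote() {
  echo $(( 2 * ($1 + 1) ))
}
```

For hourly increments, keep_remote 167 prints 336; for one increment on each of the six non-full days, keep_remote 6 prints 14.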

To rotate manually:

clickhouse-backup delete remote inc_2026_05_20_03
clickhouse-backup delete local  inc_2026_05_20_03

Restoring from an incremental chain

You restore the increment, not the full. The tool resolves the dependency chain and pulls the referenced parts automatically.

clickhouse-backup download inc_2026_05_27_14
clickhouse-backup restore  --rm inc_2026_05_27_14

Or in one step:

clickhouse-backup restore_remote --rm inc_2026_05_27_14

The --rm flag drops existing tables before restoring, which is what you want for a true disaster recovery. Omit it if you are restoring into an empty instance and want the restore to fail loudly on conflicts.

To restore only schema or only data:

clickhouse-backup restore --schema inc_2026_05_27_14
clickhouse-backup restore --data   inc_2026_05_27_14

Common Pitfalls

  • Deleting the base full. Every increment in the chain references it. If you delete the base, the increments are unrestorable. Use retention values that always cover the full plus its dependents.
  • Mixing --diff-from and --diff-from-remote. --diff-from requires the base to exist locally. After cleanup, switch to --diff-from-remote.
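  • Drifting clocks between hosts. Differential resolution uses backup names, and the names in this guide are built from timestamps. If your scheduler picks a prev_backup_name that does not match any existing backup, the upload falls back to a full upload silently and your storage cost spikes. Resolve the base name from clickhouse-backup list remote instead of computing it from the local clock.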
  • TTL-driven part churn. Tables with aggressive TTL or frequent OPTIMIZE FINAL rewrite parts often. Each rewrite forces a full re-upload of those parts in the next increment. Schedule fulls more frequently for those tables.
  • No restore test. A chain that has never been restored is not a backup. Restore the latest increment to a staging node weekly.
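
One way to guard against a mistyped or drifted base name is to check it against the list of existing backups before uploading. The helper below is a sketch with a hypothetical name, backup_exists; it only inspects names passed on stdin, and the commented usage assumes the first column of clickhouse-backup list remote is the backup name.

```shell
# backup_exists NAME
# Succeeds (exit status 0) only if NAME appears as a whole line on stdin.
backup_exists() {
  awk -v name="$1" '$0 == name { found = 1 } END { exit !found }'
}

# Hypothetical usage before an incremental upload:
#   clickhouse-backup list remote | awk '{print $1}' | backup_exists "$prev_backup_name" \
#     || { echo "base $prev_backup_name not found on remote" >&2; exit 1; }
```

The comparison is exact, so a truncated name such as full_2026_05_2 does not match full_2026_05_24.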

Frequently Asked Questions

Q: What is the difference between incremental and differential here? A: clickhouse-backup uses the terms interchangeably. Each --diff-from <X> backup contains only parts that did not exist in X. In traditional terms, diffing every backup against the latest full is a differential backup and diffing against the previous increment is an incremental one, but the tool uses the same flag for both and does not distinguish them.

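Q: How much storage does an incremental backup actually use? A: Roughly the size of the parts that do not exist in the base, plus a small manifest. For append-only ingestion at 10 GB/hour, an increment that diffs against the previous hour's backup is around 10 GB, while one that diffs against the weekly full grows by about 10 GB for every hour since that full. Background merges, OPTIMIZE FINAL, and large TTL deletes rewrite parts under new names, so increments can be much larger than the raw ingest volume.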

Q: Can I have a chain of increments referencing increments? A: Yes. You can run upload --diff-from inc_prev where inc_prev is itself a diff. This shortens each increment further but lengthens the chain you depend on. Most teams use a flat model: every increment diffs against the latest full.
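
To make the trade-off between the flat and chained models concrete, the sketch below estimates the total volume uploaded over one chain. It is back-of-the-envelope arithmetic under stated assumptions: a constant ingest rate, no compression differences, and no extra volume from merges. The function names are hypothetical.

```shell
# flat_total RATE N
# Every increment diffs against the full, so increment k carries about
# k * RATE of data. Total = RATE * (1 + 2 + ... + N) = RATE * N * (N + 1) / 2.
flat_total() {
  echo $(( $1 * $2 * ($2 + 1) / 2 ))
}

# chained_total RATE N
# Every increment diffs against the previous one, so each carries about RATE.
chained_total() {
  echo $(( $1 * $2 ))
}
```

At 10 GB/hour and 167 hourly increments, flat_total 10 167 prints 140280 and chained_total 10 167 prints 1670 (both in GB). The flat model pays for its short dependency chain with much more upload volume, which is one reason to take fulls more often or use a coarser increment cadence when diffing against the full.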

Q: Does --diff-from also reduce backup creation time on the source? A: Not meaningfully. The freeze step still runs against all parts. The main savings are upload bandwidth and remote storage.

Q: How do I restore to a specific point in time? A: Find the increment closest to the desired time with clickhouse-backup list remote, then restore_remote inc_<ts>. ClickHouse has no transaction log replay, so resolution is limited to the cadence of your increments.
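
Picking the right increment can be scripted. The sketch below assumes the inc_YYYY_MM_DD_HH naming used in this guide and chooses the newest increment at or before the target hour, which is usually what a point-in-time restore calls for. The helper name increment_at_or_before is hypothetical.

```shell
# increment_at_or_before TARGET
# Reads backup names on stdin and prints the newest inc_* name that sorts at
# or before TARGET (for example inc_2026_05_27_14). Plain string comparison
# is enough because the timestamp fields are zero-padded and fixed-width.
increment_at_or_before() {
  LC_ALL=C awk -v target="$1" '/^inc_/ && $0 <= target' | LC_ALL=C sort | tail -n 1
}

# Hypothetical usage (assumption: column 1 of the listing is the name):
#   name=$(clickhouse-backup list remote | awk '{print $1}' \
#            | increment_at_or_before inc_2026_05_27_14)
#   clickhouse-backup restore_remote --rm "$name"
```

If no increment qualifies, the function prints nothing, so the caller should check for an empty result before restoring.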
