clickhouse-backup is an open-source CLI for taking consistent, file-level backups of ClickHouse and pushing them to object storage. It uses ClickHouse's ALTER TABLE ... FREEZE mechanism to create hardlink snapshots of parts in the shadow/ directory, then packages and uploads them to S3, GCS, Azure Blob, or other remote backends. Compared to running BACKUP/RESTORE SQL by hand, the tool adds incremental uploads, retention, a REST API, and structured config. This guide covers installation, configuration, common operations, and remote storage.
Installing clickhouse-backup
Pick a release from the project's GitHub releases page and install either the static binary or a distribution package. Replace <version> with the version you want, for example 2.5.20.
# Static binary
wget https://github.com/<org>/clickhouse-backup/releases/download/v<version>/clickhouse-backup.tar.gz
tar zxf clickhouse-backup.tar.gz
# Debian/Ubuntu
wget https://github.com/<org>/clickhouse-backup/releases/download/v<version>/clickhouse-backup_<version>_amd64.deb
sudo dpkg -i clickhouse-backup_<version>_amd64.deb
# RHEL/CentOS/Fedora
sudo yum install https://github.com/<org>/clickhouse-backup/releases/download/v<version>/clickhouse-backup-<version>-1.x86_64.rpm
After install, generate a config template:
sudo mkdir -p /etc/clickhouse-backup
clickhouse-backup default-config | sudo tee /etc/clickhouse-backup/config.yml > /dev/null
The binary must run on the same host as ClickHouse, because it needs local filesystem access to /var/lib/clickhouse/.
Configuration
Edit /etc/clickhouse-backup/config.yml. A minimal S3 configuration looks like this:
general:
  remote_storage: s3
  backups_to_keep_local: 3
  backups_to_keep_remote: 7
  upload_concurrency: 4
  download_concurrency: 4
  log_level: info
clickhouse:
  host: localhost
  port: 9000
  username: default
  password: ""
  timeout: 5m
  skip_tables:
    - system.*
    - INFORMATION_SCHEMA.*
    - information_schema.*
s3:
  access_key: "AKIA..."
  secret_key: "..."
  bucket: "ch-backups"
  region: us-east-1
  endpoint: ""
  acl: private
  compression_format: tar
  compression_level: 1
  part_size: 0
For non-AWS S3 compatible storage (Backblaze B2, MinIO, R2), set endpoint and, in some cases, acl: "" to disable ACL headers that those providers reject.
Google Cloud Storage
general:
  remote_storage: gcs
gcs:
  # credentials_file takes a path; credentials_json expects the JSON content itself
  credentials_file: "/etc/clickhouse-backup/gcs-sa.json"
  bucket: "ch-backups"
  path: "prod/{cluster}/{shard}"
  storage_class: STANDARD
  compression_format: tar
Azure Blob Storage
general:
  remote_storage: azblob
azblob:
  account_name: "myaccount"
  account_key: "..."
  container: "ch-backups"
  endpoint_suffix: "core.windows.net"
Sensitive values can also be passed through environment variables, for example S3_ACCESS_KEY, S3_SECRET_KEY, CLICKHOUSE_PASSWORD, REMOTE_STORAGE. This is preferred for systemd units and Kubernetes deployments.
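As a minimal sketch of the environment-variable approach, the helper below exports the KEY=VALUE pairs from a file and runs a command with them. The function name run_with_env and the file path in the usage note are hypothetical; only the variable names come from the list above.

```shell
# run_with_env FILE CMD [ARGS...]
# Export every KEY=VALUE assignment in FILE, then run CMD with those
# variables in its environment. The subshell keeps the secrets out of
# the calling shell.
run_with_env() {
  local env_file=$1
  shift
  (
    set -a            # auto-export everything assigned while sourcing
    . "$env_file"
    set +a
    "$@"
  )
}
```

For example, with a root-only file (mode 600) holding lines such as S3_ACCESS_KEY=... and S3_SECRET_KEY=..., a call could look like: run_with_env /etc/clickhouse-backup/env clickhouse-backup list remote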
Core CLI commands
| Command | Purpose |
|---|---|
| create | Freeze tables and create a local backup under /var/lib/clickhouse/backup/<name> |
| upload | Push a local backup to remote storage |
| create_remote | Combined create + upload in one step |
| download | Pull a remote backup to local disk |
| restore | Recreate schema and attach parts from a local backup |
| restore_remote | Combined download + restore |
| list | Show available local and remote backups |
| delete | Remove a backup (local or remote scope) |
| tables | Print databases and tables visible to the tool |
| watch | Run a recurring full + incremental backup loop |
| server | Run the REST API on :7171 |
| clean | Remove leftover shadow/ folders |
Create and upload
sudo clickhouse-backup create bkp_$(date +%Y%m%d_%H%M)
sudo clickhouse-backup list local
sudo clickhouse-backup upload bkp_20260527_0100
To back up only specific tables, pass --tables:
sudo clickhouse-backup create --tables='analytics.*,billing.invoices' bkp_partial
create_remote is the typical production call because it creates and uploads in a single step; local copies are then pruned according to backups_to_keep_local:
sudo clickhouse-backup create_remote bkp_$(date +%Y%m%d_%H%M%S)
Restore
sudo clickhouse-backup download bkp_20260527_0100
sudo clickhouse-backup restore bkp_20260527_0100
Use --schema to restore only DDL, --data to restore only parts onto an existing schema, and --rm to drop existing tables first. For a one-shot remote restore:
sudo clickhouse-backup restore_remote --rm bkp_20260527_0100
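The --schema and --data flags can also be used back to back, which leaves room to inspect the recreated tables before any parts are attached. This is a sketch that assumes the backup is already on local disk. restore_staged and the CHB override variable are hypothetical names, and CHB exists only so the commands can be dry-run.

```shell
# CHB is the command to invoke; override it (e.g. CHB=echo) for a dry run.
CHB=${CHB:-clickhouse-backup}

# restore_staged BACKUP TABLE_PATTERN
# Step 1 recreates only the DDL; step 2 attaches parts onto that schema.
# Stops at the first failing step.
restore_staged() {
  local backup=$1 tables=$2
  $CHB restore --schema --tables="$tables" "$backup" || return 1
  $CHB restore --data --tables="$tables" "$backup"
}
```

A call might look like: restore_staged bkp_20260527_0100 'analytics.*'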
Incremental backups
clickhouse-backup supports differential uploads: parts that already exist in a previous backup are recorded as references to that backup instead of being uploaded again.
# Weekly full
sudo clickhouse-backup create_remote full_2026_w21
# Hourly differential against the most recent full
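name=bkp_$(date +%H)   # compute once so both commands use the same name
sudo clickhouse-backup create "$name"
# --diff-from-remote compares against the remote copy of the full; --diff-from needs it on local disk
sudo clickhouse-backup upload --diff-from-remote full_2026_w21 "$name"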
On append-only workloads most parts are unchanged between runs, so each differential uploads only the parts created since the base backup. See the dedicated guide on incremental backups for retention patterns and restoration.
Scheduling
The simplest scheduler is cron. A typical pattern: full backup weekly, differentials hourly.
0 2 * * 0 clickhouse-backup create_remote full_$(date +\%Y_w\%V)
0 * * * 1-6 clickhouse-backup create_remote --diff-from-remote full_$(date -d 'last sunday' +\%Y_w\%V) inc_$(date +\%Y\%m\%d_\%H)
Alternatively, use clickhouse-backup watch which runs a built-in loop, or the server subcommand to expose a REST API for an orchestrator (Argo, Airflow, Kubernetes CronJob) to trigger backups on a schedule.
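The two cron entries can also be folded into one wrapper that decides between a full and a differential run. This is a sketch under stated assumptions: GNU date is available, and the function names base_full_name and scheduled_backup and the CHB override variable are hypothetical. The backup names follow the cron pattern above.

```shell
# Assumes GNU date (for -d). Override CHB (e.g. CHB=echo) for a dry run.
CHB=${CHB:-clickhouse-backup}

# base_full_name DATE: name of the full backup taken on the most recent
# Sunday on or before DATE (a Sunday maps to its own full).
base_full_name() {
  local day=$1 dow
  dow=$(date -d "$day" +%w)                  # 0 = Sunday ... 6 = Saturday
  date -d "$day $dow days ago" +full_%Y_w%V
}

# scheduled_backup DATE HOUR
# Monday to Saturday: differential against that week's full.
# Sunday: full backup at hour 02, nothing at other hours.
scheduled_backup() {
  local day=$1 hour=$2 dow full
  dow=$(date -d "$day" +%w)
  full=$(base_full_name "$day")
  if [ "$dow" -ne 0 ]; then
    $CHB create_remote --diff-from-remote "$full" "inc_$(date -d "$day" +%Y%m%d)_$hour"
  elif [ "$hour" = 02 ]; then
    $CHB create_remote "$full"
  fi
}
```

A script that ends with scheduled_backup "$(date +%F)" "$(date +%H)" could then be run from a single hourly cron entry.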
Verifying backups
A backup that has never been restored is not a backup. At minimum:
- Run clickhouse-backup list remote weekly and check that sizes are stable.
- Restore the latest backup to a staging node on a recurring schedule.
- Compare count() and sum(...) on a few tables against production.
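The count comparison in the last item could be scripted along these lines. This is a sketch only: the host names, the function names row_count and verify_table, and the CH_CLIENT override variable are hypothetical, and it assumes clickhouse-client can reach both nodes.

```shell
# CH_CLIENT can be overridden for testing; defaults to the stock client.
CH_CLIENT=${CH_CLIENT:-clickhouse-client}

# row_count HOST TABLE: print count() for TABLE as seen from HOST.
row_count() {
  $CH_CLIENT --host "$1" --query "SELECT count() FROM $2"
}

# verify_table PROD_HOST STAGING_HOST TABLE
# Returns 0 when both hosts report the same row count, 1 on a mismatch,
# and 2 when either query fails.
verify_table() {
  local prod staging
  prod=$(row_count "$1" "$3") || return 2
  staging=$(row_count "$2" "$3") || return 2
  if [ "$prod" = "$staging" ]; then
    echo "OK $3 rows=$prod"
  else
    echo "MISMATCH $3 prod=$prod staging=$staging" >&2
    return 1
  fi
}
```

On a table that is still receiving inserts, production will drift ahead of the restored copy, so in practice both queries would need a filter limiting them to rows older than the backup time.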
Common Pitfalls
- Backing up to a disk that holds ClickHouse data. A failed disk wipes both. Always upload to remote storage.
- Forgetting skip_tables. Leaving system.* in scope inflates backups and can fail on system.text_log rotation.
- Permissions. The tool needs read access to /var/lib/clickhouse and write access to backup/ and shadow/. Run as the clickhouse user or via sudo.
- Replicated tables. clickhouse-backup only backs up the local replica. Run it on one replica per shard, or coordinate via the API so you do not duplicate work.
- use_embedded_backup_restore: true. This switches to ClickHouse's native BACKUP/RESTORE engine and changes on-disk layout. Pick one mode and stick with it.
- ATTACH PART errors after restore. Usually caused by missing force_restore_data flag or by restoring into a node that already has a different schema for the table.
Frequently Asked Questions
Q: Does clickhouse-backup stop the database?
A: No. It uses ALTER TABLE ... FREEZE, which creates hardlinks without blocking writes. Reads and inserts continue while the backup runs.
Q: Is the backup consistent across tables?
A: Within a single create call, all tables are frozen sequentially. There is no cross-table transaction in ClickHouse, so point-in-time consistency across tables is approximate, on the order of seconds.
Q: Where are local backups stored?
A: Under /var/lib/clickhouse/backup/<backup_name>/. The directory contains metadata/ with DDL and shadow/ with hardlinked parts. Disk usage is small until parts diverge from the live data.
Q: Can I restore to a different ClickHouse version?
A: Restoring to the same major version is supported. Restoring across major versions usually works for the data parts but may require schema adjustments. Test in staging first.
Q: Can I back up only one database?
A: Yes. Use --tables='mydb.*' on create, or --tables='mydb.t1,mydb.t2' for specific tables. The same flag works on restore.
Q: How do I back up a ClickHouse cluster?
A: Run clickhouse-backup on one replica per shard. Use a shared prefix in the remote path that includes the shard identifier, for example path: "prod/{shard}", so restores can target the right shard.