ClickHouse coordinates replicated tables through ZooKeeper or its drop-in replacement, ClickHouse Keeper. This coordination layer stores the state of the distributed system — which parts exist, which merges are queued, replication offsets — not the table data itself. That distinction shapes the entire backup strategy: in most cases you should not try to back up ZooKeeper/Keeper as if it were a database, because any snapshot of it is stale the moment it is taken.
This guide explains what is safe to back up, how Keeper snapshots actually work, and how to recover coordination state from your ClickHouse data backups using SYSTEM RESTORE REPLICA.
Why You Usually Don't Back Up ZooKeeper Itself
ZooKeeper and Keeper hold transient metadata: replica liveness, the replication queue, in-flight merges and mutations, block numbers, and quorum state. This data changes constantly and is tightly bound to what is physically on each ClickHouse node's disk at this instant.
A backup of the coordination state taken even a few seconds ago is already inconsistent with the current data on disk. Restoring it would reintroduce parts that have since been merged away, point to replicas that no longer exist, or replay a stale replication queue — leaving the cluster in a worse state than a clean rebuild.
The recommended approach is:
- Back up your ClickHouse data with a tool such as `clickhouse-backup` or the native `BACKUP` command, ideally with incremental backups.
- Run a 3+ node ZooKeeper/Keeper ensemble so the coordination layer survives the loss of any single node without needing a restore at all.
- Recover coordination state from the data using `SYSTEM RESTORE REPLICA` if it is ever lost (see below).
The one scenario where a true Keeper-level snapshot copy is genuinely useful is recovering a Keeper ensemble that has lost quorum, which is covered in the Keeper snapshot section.
Backup Approaches Compared
| Approach | What it captures | Consistent with data? | When to use |
|---|---|---|---|
| Back up ClickHouse data + `SYSTEM RESTORE REPLICA` | Table parts on disk; coordination is rebuilt | Yes, rebuilt from data | Default strategy for metadata loss |
| Copy Keeper snapshot + log dirs | Raft snapshot of coordination state | Only at copy time; drifts immediately | Restoring a Keeper ensemble that lost quorum |
| `csnp` on-demand snapshot | Latest committed Keeper state | Same caveat as above | Forcing a fresh snapshot before maintenance |
| Back up raw ZK `version-2` dir | ZooKeeper's own snapshots + txn logs | Stale almost immediately | Migration to Keeper, not recovery |
ClickHouse Keeper Snapshots and Files
ClickHouse Keeper persists its Raft state in two directories configured under `<keeper_server>`:
- `snapshot_storage_path`: point-in-time snapshots of the coordination state.
- `log_storage_path`: the Raft change log applied on top of the latest snapshot.
<keeper_server>
  <log_storage_path>/var/lib/clickhouse/coordination/log</log_storage_path>
  <snapshot_storage_path>/var/lib/clickhouse/coordination/snapshots</snapshot_storage_path>
  <coordination_settings>
    <snapshot_distance>100000</snapshot_distance>
    <snapshots_to_keep>3</snapshots_to_keep>
  </coordination_settings>
</keeper_server>
Keeper writes a new snapshot every snapshot_distance committed log records (default 100000) and retains snapshots_to_keep of them (default 3). Older snapshots and logs are purged automatically, which is the Keeper-native equivalent of ZooKeeper's autopurge (see the ZooKeeper configuration guide).
Forcing a snapshot on demand
To capture a fresh snapshot before risky maintenance, send the csnp four-letter command to a Keeper node. It schedules a snapshot and returns the committed log index it will contain:
echo csnp | nc localhost 9181
You can confirm the snapshot landed with lgif, which reports last_snapshot_idx (the largest committed index in the last snapshot) alongside last_log_idx:
echo lgif | nc localhost 9181
Four-letter commands must be allow-listed via `<four_letter_word_allow_list>` in the Keeper config. `csnp` and `lgif` are included in the default allow list on current versions.
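The `lgif` check can be scripted. The sketch below is a minimal example, assuming `lgif` prints one `name<whitespace>value` pair per line; the helper names `lgif_field` and `snapshot_lag` are hypothetical and not part of Keeper.

```shell
#!/bin/sh
# Hypothetical helper: extract one field from lgif output read on stdin.
# Assumes each line looks like "<name><whitespace><value>".
lgif_field() {
    # $1 = field name, e.g. last_snapshot_idx
    awk -v key="$1" '$1 == key { print $2 }'
}

# Print how many log entries are not yet covered by the last snapshot
# (last_log_idx minus last_snapshot_idx).
snapshot_lag() {
    lgif_text=$(cat)
    last_log=$(printf '%s\n' "$lgif_text" | lgif_field last_log_idx)
    last_snap=$(printf '%s\n' "$lgif_text" | lgif_field last_snapshot_idx)
    echo $((last_log - last_snap))
}

# Usage against a live node (host and port are assumptions):
#   echo lgif | nc localhost 9181 | snapshot_lag
```

A lag that drops sharply right after `csnp` suggests the snapshot has landed; a lag that keeps growing suggests it has not been written yet.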
Copying the snapshot directory
For an ensemble-level safety copy, stop a single Keeper node (or rely on create_snapshot_on_exit to flush a final snapshot at shutdown) and copy both directories:
# On one Keeper node, after a clean stop or a fresh csnp
tar czf keeper-state-$(date +%F).tar.gz \
    /var/lib/clickhouse/coordination/snapshots \
    /var/lib/clickhouse/coordination/log
This copy is only meaningful for rebuilding the Keeper ensemble itself — not for restoring ClickHouse table state, which should always come from your data backup plus SYSTEM RESTORE REPLICA.
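The snapshot-then-copy flow could be automated along these lines. This is a sketch, not a hardened tool. It assumes `csnp` replies with just the snapshot's log index on success, that `lgif` prints `name value` lines, and that the host, port, and paths match the config shown earlier. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch: force a snapshot, wait until lgif reports it, then archive both dirs.
# Host, port and paths are assumptions; adjust them to your Keeper config.
KEEPER_HOST=${KEEPER_HOST:-localhost}
KEEPER_PORT=${KEEPER_PORT:-9181}
SNAP_DIR=${SNAP_DIR:-/var/lib/clickhouse/coordination/snapshots}
LOG_DIR=${LOG_DIR:-/var/lib/clickhouse/coordination/log}

# Send one four-letter command and print the reply.
keeper_cmd() {
    printf '%s\n' "$1" | nc "$KEEPER_HOST" "$KEEPER_PORT"
}

# Succeeds once last_snapshot_idx has reached the index passed as $1.
snapshot_reached() {
    current=$(keeper_cmd lgif | awk '$1 == "last_snapshot_idx" { print $2 }')
    [ -n "$current" ] && [ "$current" -ge "$1" ]
}

backup_keeper_state() {
    target=$(keeper_cmd csnp)
    # Anything that is not a plain number is treated as a failed request.
    case $target in
        ''|*[!0-9]*) echo "csnp did not return a log index: $target" >&2; return 1 ;;
    esac
    tries=0
    until snapshot_reached "$target"; do
        tries=$((tries + 1))
        [ "$tries" -ge 30 ] && { echo "snapshot not confirmed" >&2; return 1; }
        sleep 1
    done
    tar czf "keeper-state-$(date +%F).tar.gz" "$SNAP_DIR" "$LOG_DIR"
}
```

Calling `backup_keeper_state` on a Keeper host writes the archive to the current directory. The 30-second polling limit is arbitrary and may need raising for large state.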
Recovering a Keeper Ensemble That Lost Quorum
If a Keeper cluster loses quorum (for example, a majority of nodes died) but at least one node still holds good state, you can force it back into a working single-node quorum and then re-grow the ensemble:
- Identify the node with the most up-to-date state (highest `last_log_idx` via `lgif`).
- Back up its `log_storage_path` and `snapshot_storage_path` directories first.
- Put that node into recovery mode: either send the `rcvr` four-letter command, or restart Keeper with `--force-recovery`:
  clickhouse-keeper --config /etc/clickhouse-keeper/keeper_config.xml --force-recovery
- Reconfigure the remaining nodes to point at the recovered node and start them one at a time, confirming each reaches follower status before adding the next.
- Once a majority is online, normal quorum resumes and recovery mode can be cleared.
This restores the coordination layer in place. It does not fix individual replicated tables whose ZooKeeper paths were deleted — for that, use SYSTEM RESTORE REPLICA.
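Choosing the recovery candidate is a comparison of `last_log_idx` across the surviving nodes. A minimal sketch follows, assuming you have already collected `host index` pairs; `pick_most_advanced` and the `keeper-N` host names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: choose the recovery candidate from "host last_log_idx" lines.
# Reads lines such as "keeper-1 1042" on stdin and prints the host with the
# highest index. On a tie, the first host seen wins.
pick_most_advanced() {
    awk 'NF >= 2 && ($2 + 0 > best || host == "") { best = $2 + 0; host = $1 }
         END { if (host != "") print host }'
}

# Collecting the pairs from live nodes (host names and port 9181 are assumptions):
#   for h in keeper-1 keeper-2 keeper-3; do
#       idx=$(echo lgif | nc "$h" 9181 | awk '$1 == "last_log_idx" { print $2 }')
#       [ -n "$idx" ] && echo "$h $idx"
#   done | pick_most_advanced
```

Nodes that do not answer produce no line and are skipped, so the result is only as good as the set of nodes that were reachable when you ran it.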
Recovering After Coordination State Is Lost
When the ZooKeeper/Keeper path for a replicated table is gone, the table goes read-only: data is still readable from disk, but inserts, merges, and DDL fail. Restarting a replica whose Keeper metadata is missing also attaches it as read-only. This is the situation SYSTEM RESTORE REPLICA is built for.
Modern recovery (ClickHouse 21.7+)
SYSTEM RESTORE REPLICA recreates a table's ZooKeeper metadata from the parts physically present on the local filesystem. It only operates on read-only tables, which is exactly the state a replica lands in after metadata loss.
-- Rebuild a single table's ZooKeeper metadata from its local parts:
SYSTEM RESTORE REPLICA my_db.my_table;
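-- Run the restore for the same table on every replica in the cluster
-- (the table name is required; ON CLUSTER only fans the statement out):
SYSTEM RESTORE REPLICA my_db.my_table ON CLUSTER my_cluster;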
The typical full sequence after a coordination-layer wipe:
- Restore ClickHouse data from your backup (or confirm it is intact on disk).
- Recreate the database/table DDL if needed so the local metadata exists.
- Run `SYSTEM RESTORE REPLICA` on one replica to repopulate ZooKeeper from its parts.
- Run `SYSTEM RESTART REPLICA` / `SYSTEM SYNC REPLICA` on the other replicas so they re-register against the freshly created path.
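When many tables are affected, the restore statements can be generated from the list of read-only replicas instead of typed by hand. This is a sketch, assuming the list comes from `system.replicas` (filtered on `is_readonly`) in the default tab-separated output; `gen_restore_sql` is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: turn a list of read-only replicated tables into RESTORE statements.
# Input: one "database<TAB>table" pair per line, for example from
#   clickhouse-client --query "SELECT database, table FROM system.replicas WHERE is_readonly"
# Identifiers are backquoted; names that themselves contain backquotes are not handled.
gen_restore_sql() {
    awk -F '\t' 'NF == 2 { printf "SYSTEM RESTORE REPLICA `%s`.`%s`;\n", $1, $2 }'
}
```

Review the generated statements before running them. `is_readonly` is also set while a replica's Keeper session is being re-established, so a table in that list has not necessarily lost its metadata.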
For a broader walkthrough of rebuilding a cluster from backups, see disaster recovery after data loss.
Legacy recovery (before 21.7)
On versions without SYSTEM RESTORE REPLICA, recovery is manual: detach the table, switch its engine from ReplicatedMergeTree to plain MergeTree in the .sql metadata file, reattach, rename it aside, recreate the replicated table fresh, then ATTACH PARTITION ... FROM the old table to move the parts back. Altinity maintains a script that automates this for many tables at once (Altinity/clickhouse-zookeeper-recovery). On any supported modern build you should prefer the single-command approach above.
Common Issues
- Snapshot copy treated as a table backup. Restoring an old Keeper snapshot does not restore your data and will fight with what is actually on disk. Always pair data backups with `SYSTEM RESTORE REPLICA`.
- `SYSTEM RESTORE REPLICA` rejected. The table must be read-only. If it is not, ClickHouse refuses the restore; you generally don't want to force this on a healthy replica.
- Snapshots growing unbounded. A misconfigured or absent `snapshots_to_keep` / log purge fills the coordination disk. Confirm purging is active and watch Keeper coordination bottlenecks.
- Verifying what's actually in Keeper. Use the techniques in check table metadata in ZooKeeper to inspect paths before and after recovery.
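One rough way to confirm that purging is active is to compare the number of files in the snapshot directory with `snapshots_to_keep`. The sketch below assumes the directory contains only snapshot files and makes no assumption about their names; the function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical check: warn when a snapshot directory holds more files than expected.
# Counts every regular file directly inside the directory.
snapshot_count() {
    # $1 = snapshot directory
    find "$1" -maxdepth 1 -type f | wc -l | tr -d ' '
}

# $1 = snapshot directory, $2 = configured snapshots_to_keep
check_snapshot_retention() {
    count=$(snapshot_count "$1")
    if [ "$count" -gt "$2" ]; then
        echo "WARN: $count snapshot files in $1, expected at most $2"
        return 1
    fi
    echo "OK: $count snapshot files in $1"
}

# Usage (path and limit taken from the example config):
#   check_snapshot_retention /var/lib/clickhouse/coordination/snapshots 3
```

A brief overshoot while a new snapshot is being written may be normal, so treat a single warning as a prompt to look closer, not as proof of a misconfiguration.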
Best Practices
- Treat ClickHouse data backups as the source of truth. Coordination state is derived; rebuild it from data, don't snapshot-restore it.
- Run an odd-numbered, multi-node ensemble (3 or 5). Fault tolerance from quorum beats any backup for the common single-node failure.
- Take a fresh `csnp` snapshot before maintenance that touches the Keeper hosts, and keep a copy of `snapshot_storage_path` + `log_storage_path` for ensemble-level recovery.
- Rehearse `SYSTEM RESTORE REPLICA` on a staging cluster so the read-only recovery path is familiar before you need it in production.
- Monitor coordination disk usage and snapshot cadence so purge settings and `snapshot_distance` are tuned to your write rate.
How Pulse Helps
Pulse monitors the health of your ZooKeeper/Keeper ensemble alongside your ClickHouse nodes — quorum status, snapshot and log growth, session churn, and the read-only replica conditions that signal lost coordination metadata. When a replica drops into read-only mode or the ensemble loses quorum, Pulse surfaces it early and points to the right recovery path (ensemble force-recovery vs. SYSTEM RESTORE REPLICA) instead of leaving you to diagnose a stalled cluster under pressure. Learn more at pulse.support.
Frequently Asked Questions
Q: Should I schedule regular backups of ZooKeeper or Keeper?
For recovery purposes, no — back up your ClickHouse data instead and rebuild coordination state with SYSTEM RESTORE REPLICA. A snapshot of the coordination state is stale immediately and inconsistent with the data on disk. The exception is keeping a copy of the Keeper snapshot and log directories specifically to rebuild a Keeper ensemble that has lost quorum.
Q: How do I create a Keeper snapshot on demand?
Send the csnp four-letter command (e.g. echo csnp | nc localhost 9181). It schedules a snapshot and returns the committed log index it will contain. Verify with lgif, checking that last_snapshot_idx advanced. Both commands must be in the Keeper four-letter allow list.
Q: My replicated tables are read-only after losing ZooKeeper. How do I fix it?
Confirm the data is present on disk, recreate the table DDL if necessary, then run SYSTEM RESTORE REPLICA my_db.my_table (or ON CLUSTER) on ClickHouse 21.7+. This rebuilds the table's ZooKeeper metadata from the local parts and brings the replica back to read-write. See disaster recovery after data loss.
Q: What's the difference between recovering the Keeper ensemble and recovering a table?
--force-recovery / the rcvr command restores the Keeper cluster to a working quorum when nodes have died. SYSTEM RESTORE REPLICA restores a table's coordination metadata when its ZooKeeper path was deleted but the cluster itself is healthy. You may need either or both depending on what failed.
Q: Can I migrate existing ZooKeeper data into Keeper instead of starting fresh?
Yes. The clickhouse-keeper-converter tool reads ZooKeeper's version-2 logs and snapshots and writes a Keeper snapshot. The resulting snapshot must be present on every Keeper node before any node starts, otherwise a node can elect itself leader with empty state. This is a migration path, not a backup/restore mechanism.
Q: Where are Keeper snapshots stored and how many are kept?
In snapshot_storage_path (logs in log_storage_path), both under <keeper_server>. Keeper writes a snapshot every snapshot_distance committed records (default 100000) and retains snapshots_to_keep of them (default 3), purging older ones automatically.