Migrating from ZooKeeper to ClickHouse Keeper

Migrating from ZooKeeper to ClickHouse Keeper replaces the JVM-based coordination layer with a C++ service that speaks the same client wire protocol. This removes GC pauses and the jute.maxbuffer payload limit while leaving your ClickHouse clients unchanged. The migration itself is mechanical: convert the ZooKeeper data with clickhouse-keeper-converter, point ClickHouse at the new nodes, and validate. It does require a maintenance window, because the snapshot formats and interserver protocols of ZooKeeper and Keeper are incompatible, so the two cannot run side by side in one coordination cluster.

This guide covers pre-migration validation, the conversion procedure, post-migration verification, and rollback. For the architecture and standalone configuration of Keeper, see What Is ClickHouse Keeper. For sizing and operating ZooKeeper itself, see the ClickHouse ZooKeeper Configuration Guide.

Why Migrate

ClickHouse Keeper has been GA since version 22.3 and is the recommended coordination service for all new deployments. It uses the Raft consensus protocol and speaks the ZooKeeper client protocol, so ReplicatedMergeTree, distributed DDL, and every existing client work unchanged after the switch.

| Aspect | ZooKeeper | ClickHouse Keeper |
| --- | --- | --- |
| Runtime | JVM | Native C++ |
| Failure mode | GC pauses cause session expirations | No GC; no pause-driven expirations |
| Large payloads | Constrained by `jute.maxbuffer` | No equivalent hard limit ambiguity |
| Compression | None for snapshots/logs by default | Native snapshot and log compression |
| Deployment | Separate stack, separate binary | Standalone `clickhouse-keeper` or embedded in `clickhouse-server` |
| Wire protocol | ZooKeeper | ZooKeeper-compatible |

If your motivation is recurring coordination latency or session drops rather than simply modernizing the stack, first confirm the bottleneck is ZooKeeper itself and not request pressure — see ClickHouse ZooKeeper and Keeper Coordination Bottlenecks. Keeper removes JVM-specific failure modes, but it does not absolve a cluster of excessive small-write load.

Pre-Migration Validation

Do not start until the cluster is in a known-good state. A migration that begins on top of an inconsistent or backlogged ZooKeeper will faithfully convert that inconsistency into Keeper.

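Confirm ZooKeeper is healthy. Every node should answer imok, the ensemble should have a full set of synced followers, and latency should be normal. ZooKeeper reports zk_followers and zk_synced_followers only on the leader, so run the mntr check against the leader to see them.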

echo ruok | nc zk1 2181        # expect "imok" on every node
echo mntr | nc zk1 2181 | grep -E "zk_followers|zk_synced_followers|zk_outstanding_requests"
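The health criteria above can be turned into a pass/fail check. The following is a minimal sketch, not an official tool: it assumes the `mntr` output of the ZooKeeper leader is piped in on stdin, and both the function name `zk_leader_healthy` and the outstanding-request threshold are illustrative choices.

```shell
# Hypothetical helper: reads `mntr` output from the ZooKeeper leader on stdin.
#   $1 = expected number of synced followers (ensemble size minus one)
#   $2 = highest zk_outstanding_requests value to tolerate (illustrative threshold)
zk_leader_healthy() {
  awk -v want="$1" -v max_out="$2" '
    $1 == "zk_synced_followers"     { synced = $2; seen = 1 }
    $1 == "zk_outstanding_requests" { outstanding = $2 }
    END {
      if (!seen) { print "no zk_synced_followers line (not the leader?)"; exit 1 }
      if (synced + 0 != want + 0) { print "synced followers: " synced ", expected " want; exit 1 }
      if (outstanding + 0 > max_out + 0) { print "outstanding requests: " outstanding; exit 1 }
      print "ok"
    }'
}
```

For a three-node ensemble whose leader is zk1, a possible invocation is `echo mntr | nc zk1 2181 | zk_leader_healthy 2 10`.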

Drain replication queues. Make sure no replica is far behind. A large backlog at cutover means a slow, risky resume.

SELECT database, table, type, count() AS pending
FROM system.replication_queue
GROUP BY database, table, type
ORDER BY pending DESC;

Confirm no read-only replicas. Any table flagged read-only indicates an existing coordination problem to resolve first.

SELECT database, table, is_readonly, absolute_delay, queue_size
FROM system.replicas
WHERE is_readonly OR queue_size > 0;

Record a baseline to compare against after cutover: counts of databases, tables, and replicas, plus a sample of znode metadata. The system.zookeeper table reads through whichever coordination service ClickHouse is connected to, so the same queries work before and after.

```sql
SELECT count() FROM system.zookeeper WHERE path = '/clickhouse/tables';
SELECT count() FROM system.replicas;
```

Note the ZooKeeper version. The converter requires ZooKeeper 3.4 or later. Verify the on-disk layout exposes a version-2 directory under your dataDir/dataLogDir, which is where logs and snapshots live.

The Migration Procedure

The conversion reads ZooKeeper's on-disk logs and snapshots and writes a single ClickHouse Keeper snapshot. Because the formats are incompatible, this offline step is mandatory — there is no live, rolling migration path.

1. Stop writes and background work

On every ClickHouse node, stop ingestion at the application layer, then halt background tasks that mutate coordination state:

SYSTEM STOP MERGES;
SYSTEM STOP FETCHES;
SYSTEM STOP REPLICATED SENDS;
SYSTEM STOP DISTRIBUTED SENDS;

The goal is a quiescent ZooKeeper: no new znodes, no queue churn. Any write that lands after you snapshot ZooKeeper but before cutover will be lost.

2. Stop ZooKeeper and force a consistent snapshot

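Stop all ZooKeeper nodes, noting beforehand which node is the leader (its srvr output reports Mode: leader), because a stopped node can no longer be queried. The official guidance then recommends, as an optional step, starting the former leader once more and stopping it again, which forces ZooKeeper to write a fresh, fully consistent snapshot: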

# On each ZooKeeper node
systemctl stop zookeeper

# Optional but recommended: on the former leader only, start ZooKeeper once more
# and stop it again so it writes a clean snapshot.
systemctl start zookeeper
systemctl stop zookeeper
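Finding the former leader is only possible while the ensemble is still running, so the role of each node has to be recorded before the shutdown. A small sketch of how that could be done, assuming the `srvr` four-letter command is enabled; `zk_mode` is a hypothetical helper name, and `zk2` and `zk3` in the usage comment are placeholder hostnames.

```shell
# Hypothetical helper: reads `srvr` output on stdin and prints the node's mode
# (for example leader, follower, or standalone).
# Exits non-zero when the input contains no "Mode:" line.
zk_mode() {
  awk -F': ' '$1 == "Mode" { print $2; found = 1 } END { exit !found }'
}

# Illustrative use, run BEFORE stopping the ensemble:
#   for h in zk1 zk2 zk3; do printf '%s ' "$h"; echo srvr | nc "$h" 2181 | zk_mode; done
```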

3. Run the converter

Run clickhouse-keeper-converter on the node that holds the most complete data (the former leader). Point it at the ZooKeeper version-2 directories and an output directory for the Keeper snapshot:

clickhouse-keeper-converter \
  --zookeeper-logs-dir /var/lib/zookeeper/version-2 \
  --zookeeper-snapshots-dir /var/lib/zookeeper/version-2 \
  --output-dir /var/lib/clickhouse/coordination/snapshots

If you only have the combined ClickHouse binary, the equivalent is clickhouse keeper-converter with the same flags. The output is a single Keeper snapshot file (for example snapshot_<N>.bin).
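Before distributing anything, it can help to confirm that the output directory holds exactly one snapshot file, since a stale snapshot left over from an earlier attempt would make it unclear which file to copy. The following sketch assumes the `snapshot_*.bin` naming mentioned above; `single_snapshot` is a hypothetical helper, not part of ClickHouse.

```shell
# Hypothetical helper: prints the path of the only snapshot_*.bin file in "$1".
# Fails if the directory contains no such file, or more than one.
single_snapshot() {
  dir=$1
  set -- "$dir"/snapshot_*.bin          # expand the glob into positional parameters
  if [ "$#" -ne 1 ] || [ ! -f "$1" ]; then
    echo "expected exactly one snapshot in $dir" >&2
    return 1
  fi
  printf '%s\n' "$1"
}
```

With the paths used in this guide, `single_snapshot /var/lib/clickhouse/coordination/snapshots` would print the file to distribute in the next step.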

4. Distribute the snapshot to every Keeper node

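Copy the generated snapshot into the snapshot_storage_path of every Keeper node before starting any of them. The example assumes the converter wrote its output on keeper1, so the file is pushed to the other two members:

for host in keeper2 keeper3; do
  scp /var/lib/clickhouse/coordination/snapshots/snapshot_*.bin \
    "$host":/var/lib/clickhouse/coordination/snapshots/
done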

This step is not optional and is the most common source of data loss. If one node starts empty while others have the snapshot, the empty node can win leader election faster and the cluster will adopt an empty dataset.
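Because a missing or truncated copy is so costly, one option is to compare checksums after the copy. This is a sketch under the assumption that `cksum` (POSIX) and SSH access to the Keeper nodes are available; `snapshot_sum` is a hypothetical helper, and `sha256sum` can be substituted where a cryptographic digest is preferred.

```shell
# Hypothetical helper: prints "<crc> <size>" for a file, or fails if it is missing.
snapshot_sum() {
  [ -f "$1" ] || { echo "no such file: $1" >&2; return 1; }
  cksum < "$1"
}

# Illustrative use after the copy (the file name is whatever the converter produced):
#   f=/var/lib/clickhouse/coordination/snapshots/snapshot_<N>.bin
#   [ "$(snapshot_sum "$f")" = "$(ssh keeper2 "cksum < $f")" ] || echo "MISMATCH on keeper2"
```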

5. Configure Keeper

Add a <keeper_server> block on each Keeper node, with a unique server_id per node and an identical raft_configuration listing every member:

<keeper_server>
    <tcp_port>9181</tcp_port>
    <server_id>1</server_id>
    <log_storage_path>/var/lib/clickhouse/coordination/log</log_storage_path>
    <snapshot_storage_path>/var/lib/clickhouse/coordination/snapshots</snapshot_storage_path>

    <coordination_settings>
        <operation_timeout_ms>10000</operation_timeout_ms>
        <session_timeout_ms>30000</session_timeout_ms>
    </coordination_settings>

    <raft_configuration>
        <server>
            <id>1</id>
            <hostname>keeper1</hostname>
            <port>9234</port>
        </server>
        <server>
            <id>2</id>
            <hostname>keeper2</hostname>
            <port>9234</port>
        </server>
        <server>
            <id>3</id>
            <hostname>keeper3</hostname>
            <port>9234</port>
        </server>
    </raft_configuration>
</keeper_server>
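Since the raft_configuration must be identical on every node, generating it once and pasting the same output everywhere avoids hand-editing drift. A minimal sketch: `raft_configuration` is a hypothetical shell function, it assigns ids 1..N in argument order, and it hard-codes port 9234 as in the example above.

```shell
# Hypothetical generator: prints a <raft_configuration> block for the given hosts.
# Ids are assigned 1..N in argument order; the port is fixed at 9234.
raft_configuration() {
  id=0
  echo '<raft_configuration>'
  for host in "$@"; do
    id=$((id + 1))
    printf '    <server>\n        <id>%d</id>\n        <hostname>%s</hostname>\n        <port>9234</port>\n    </server>\n' "$id" "$host"
  done
  echo '</raft_configuration>'
}
```

Running `raft_configuration keeper1 keeper2 keeper3` reproduces the block shown above. Each node's server_id still has to be set by hand to the id that the block assigns to that node's hostname.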

6. Point ClickHouse at Keeper

Update the <zookeeper> block in every clickhouse-server config to reference the Keeper nodes. The element is still called <zookeeper> — that is correct, it is the client-side coordination config. Use Keeper's client port 9181 (not ZooKeeper's 2181):

<zookeeper>
    <node>
        <host>keeper1</host>
        <port>9181</port>
    </node>
    <node>
        <host>keeper2</host>
        <port>9181</port>
    </node>
    <node>
        <host>keeper3</host>
        <port>9181</port>
    </node>
</zookeeper>
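A leftover ZooKeeper port is easy to miss when several config files are involved. The check below is a sketch that assumes Keeper listens on 9181 as in this guide, so any remaining `<port>2181</port>` is a mistake; `no_stale_zk_port` is a hypothetical helper and the path in the usage note is only an example location.

```shell
# Hypothetical check: prints offending lines and fails if a config file still
# contains ZooKeeper's client port 2181 inside a <port> element.
no_stale_zk_port() {
  [ -r "$1" ] || { echo "cannot read $1" >&2; return 2; }
  if grep -n '<port>[[:space:]]*2181[[:space:]]*</port>' "$1"; then
    echo "stale ZooKeeper port 2181 in $1" >&2
    return 1
  fi
}
```

For example, `no_stale_zk_port /etc/clickhouse-server/config.xml` could be run on each ClickHouse node before the restart in the next step.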

7. Start Keeper, then ClickHouse

Start Keeper on all nodes, confirm a leader is elected, then restart clickhouse-server on every node so it reconnects to the new coordination layer.

Post-Migration Validation

Verify the cluster matches the baseline before resuming traffic.

Confirm Keeper is up and a leader exists:

echo ruok | nc keeper1 9181        # expect "imok"
echo mntr | nc keeper1 9181 | grep -E "zk_server_state|zk_followers|zk_synced_followers"
echo srvr | nc keeper1 9181        # shows Mode: leader / follower

Keeper supports the same four-letter-word commands as ZooKeeper; enable them with <four_letter_word_white_list> in the Keeper config if they are restricted.

Confirm metadata converted intact by comparing against the baseline:

SELECT count() FROM system.zookeeper WHERE path = '/clickhouse/tables';
SELECT count() FROM system.replicas;
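The comparison is less error-prone when it is mechanical rather than done by eye. The sketch below assumes the results of the baseline queries were saved to a file before the migration and that the same queries are saved to a second file afterwards; `compare_baseline` and the file names in the comments are hypothetical.

```shell
# Hypothetical helper: compares two saved query outputs.
# Prints a unified diff and fails when they differ.
compare_baseline() {
  if diff -u "$1" "$2"; then
    echo "baseline matches"
  else
    echo "baseline differs (see diff above)" >&2
    return 1
  fi
}

# Illustrative capture with clickhouse-client, once before and once after cutover:
#   clickhouse-client --query "SELECT count() FROM system.replicas" > before.tsv
#   clickhouse-client --query "SELECT count() FROM system.replicas" > after.tsv
#   compare_baseline before.tsv after.tsv
```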

Confirm no replica came up read-only:

SELECT database, table, is_readonly, is_session_expired, absolute_delay
FROM system.replicas
WHERE is_readonly OR is_session_expired;

Resume background work and ingestion once validation passes:

SYSTEM START MERGES;
SYSTEM START FETCHES;
SYSTEM START REPLICATED SENDS;
SYSTEM START DISTRIBUTED SENDS;

Watch system.replication_queue drain after resuming, and confirm a test insert replicates across all replicas.

Rollback

Keep ZooKeeper recoverable until you are confident in Keeper. The clean rollback is: stop ClickHouse and Keeper, revert the <zookeeper> block to the original ZooKeeper hosts and ports, restart the ZooKeeper ensemble from its preserved data directories, and restart ClickHouse. Because the converter only reads ZooKeeper data, the original ensemble is untouched and remains a valid fallback — provided no writes were accepted against Keeper after cutover.

This is why steps 1 and 2 matter: any write committed to Keeper after the snapshot diverges from ZooKeeper, and rolling back then means losing that write. Do not delete ZooKeeper data directories or decommission the ensemble until the cluster has run cleanly on Keeper through a full operational cycle.

Best Practices

Migrate a healthy cluster, not a burning one. If you are mid-incident with session expirations, stabilize first. Converting a broken ZooKeeper produces a broken Keeper.
Snapshot every Keeper node before any node starts. Skipping this on even one node risks an empty-dataset leader election.
Keep ZooKeeper's data directories until Keeper has proven itself; they are your rollback.
Migrate in a true maintenance window. Ingestion and merges must be stopped — there is no rolling migration.
Test the procedure in staging first, ideally on a copy of production's ZooKeeper data, so the converter and config changes are rehearsed.
Use a dedicated, fast disk for Keeper's log_storage_path, the same way you would for ZooKeeper's transaction log.

Common Issues

Keeper starts with an empty dataset. The snapshot was missing on one or more nodes, or was never copied. Stop the cluster, place the converted snapshot on every node, and restart.
Mixed ZooKeeper/Keeper cluster. Not supported. The interserver protocols are incompatible; you cannot grow a quorum that contains both.
Replicas come up read-only after cutover. Usually a misconfigured <zookeeper> block — wrong host, or port 2181 left in place instead of Keeper's 9181. Fix the config and restart.
Lost recent writes. Writes landed on ZooKeeper between the snapshot and cutover, or on Keeper before validation. Both are prevented by fully stopping ingestion and background tasks in step 1.
Cannot create new ZooKeeper session after migration. Keeper is unreachable or no leader was elected. See Cannot Create New ZooKeeper Session.

How Pulse Helps

Coordination migrations are high-stakes precisely because the failure modes — an empty-snapshot leader, a lost write, a stuck replication queue — are silent until traffic resumes. Pulse monitors ClickHouse and its coordination layer continuously, so the pre-migration baseline (replication lag, read-only replicas, queue depth, Keeper/ZooKeeper latency) is already captured, and post-migration drift surfaces immediately rather than during the next incident. For teams running ZooKeeper-to-Keeper migrations across many clusters, Pulse flags the unhealthy ensembles that should be stabilized before any cutover and confirms each migrated cluster converges cleanly afterward.

Frequently Asked Questions

Q: Can I migrate from ZooKeeper to Keeper without downtime?

No. The snapshot and interserver protocols are incompatible, so the two cannot coexist in one coordination cluster and there is no rolling path. You need a maintenance window with ingestion and background tasks stopped.

Q: Do I need to change my ClickHouse tables or clients?

No. Keeper speaks the ZooKeeper client protocol, so ReplicatedMergeTree tables, distributed DDL, and clients are unchanged. Only the <zookeeper> connection block (hosts and ports) and the coordination service itself change.

Q: Why is the config block still called <zookeeper> after migrating to Keeper?

That block is the client-side coordination configuration. ClickHouse keeps the name for compatibility regardless of whether it points at ZooKeeper or Keeper. Just update the hosts and use Keeper's port 9181.

Q: What happens if I forget to copy the snapshot to every Keeper node?

A node that starts without the snapshot has an empty dataset and may win leader election faster than nodes loading the snapshot, causing the cluster to adopt empty coordination state. Always place the converted snapshot on every node before starting any of them.

Q: Can I run Keeper embedded in clickhouse-server instead of standalone?

Yes. Keeper runs either as a standalone clickhouse-keeper binary or embedded in clickhouse-server via the same <keeper_server> block. Embedded is simpler operationally; standalone isolates coordination from query load. See What Is ClickHouse Keeper for the trade-offs.

Q: How do I roll back if something goes wrong?

Revert the <zookeeper> block to the original ZooKeeper hosts, restart the ensemble from its preserved data directories, and restart ClickHouse. This only works cleanly if no writes were accepted against Keeper after cutover, which is why ingestion must stay stopped until validation passes.