ClickHouse DB::Exception: Keeper communication failure (Code 999)

The "DB::Exception: Keeper exception" error in ClickHouse is a general error indicating a failure in communication with, or in an operation on, ZooKeeper or ClickHouse Keeper. The error name is KEEPER_EXCEPTION (code 999). Since ZooKeeper/Keeper is the coordination backbone for replicated tables, distributed DDL, and other cluster operations, this error can surface in many different contexts.

Impact

The impact depends on the operation that triggered the error. It can affect INSERT operations on replicated tables, DDL operations (CREATE, ALTER, DROP), replication synchronization, and leader election. In severe cases, replicated tables may go into read-only mode, new inserts may be blocked, and schema changes may fail cluster-wide. The data already stored remains safe, but new operations are disrupted until the Keeper connectivity issue is resolved.

Common Causes

Keeper/ZooKeeper unavailability -- The Keeper cluster is down, unreachable, or in a degraded state (e.g., lost quorum).
Network connectivity issues -- Firewall rules, DNS resolution failures, or network partitions between ClickHouse and Keeper nodes.
Session timeout -- The ClickHouse session with Keeper expired because of long pauses (for example, JVM garbage collection on ZooKeeper nodes), high load, or network latency.
Keeper data corruption or disk full -- The Keeper data directory ran out of space or the transaction log is corrupted.
Too many nodes/watches -- The ZooKeeper/Keeper instance has too many znodes or watches, causing performance degradation and timeouts.
Version incompatibility -- Mismatch between ClickHouse's expected Keeper protocol version and the actual Keeper version.
Concurrent DDL overload -- Too many simultaneous DDL operations exhausting Keeper resources.

Troubleshooting and Resolution Steps

Check Keeper/ZooKeeper health:

# For ClickHouse Keeper
echo ruok | nc keeper-host 9181

# For ZooKeeper (3.5.3 and later answer only if ruok is listed in 4lw.commands.whitelist)
echo ruok | nc zookeeper-host 2181

# Should return "imok"
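
With several Keeper nodes, the same check can be wrapped in a small helper and run per node. The following is a minimal sketch, not an official tool: the function names `is_imok` and `probe_keeper` are hypothetical, and it assumes an `nc` build that accepts `-w` for a timeout.

```shell
#!/usr/bin/env bash
# Interpret the reply to the "ruok" four-letter command.
# Succeeds only for the exact reply "imok".
is_imok() {
    [ "$1" = "imok" ]
}

# Probe one node. Host and port are supplied by the caller.
# `nc -w 2` sets a 2-second timeout so an unreachable node does not hang the check;
# `|| true` keeps a failed connection from aborting scripts that run under `set -e`.
probe_keeper() {
    local host="$1" port="$2" reply
    reply="$(echo ruok | nc -w 2 "$host" "$port" 2>/dev/null)" || true
    if is_imok "$reply"; then
        echo "$host:$port ok"
    else
        echo "$host:$port FAILED (reply: '${reply}')"
        return 1
    fi
}
```

For example, `for h in keeper-1 keeper-2 keeper-3; do probe_keeper "$h" 9181; done` prints one status line per node; the host names here are placeholders.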

Verify connectivity from the ClickHouse server:

# Test network connectivity
nc -zv keeper-host 9181

# Check DNS resolution
dig keeper-host

Check Keeper status from ClickHouse:

SELECT * FROM system.zookeeper WHERE path = '/';

-- Check replication health
SELECT database, table, zookeeper_path, is_readonly, is_session_expired
FROM system.replicas
WHERE is_readonly OR is_session_expired;
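
For monitoring, the replication-health query can be piped into a small filter that fails when any replica is affected. This is a sketch under assumptions: `report_unhealthy_replicas` is a hypothetical helper name, and the input is expected to be TSV rows with the columns database, table, is_readonly, is_session_expired, in that order.

```shell
#!/usr/bin/env bash
# Reads TSV rows (database, table, is_readonly, is_session_expired) on stdin,
# prints one line per affected table, and returns 1 if any row was seen.
report_unhealthy_replicas() {
    awk -F '\t' '
        { printf "%s.%s readonly=%s session_expired=%s\n", $1, $2, $3, $4; n++ }
        END { exit (n > 0) ? 1 : 0 }
    '
}

# Usage (assumes clickhouse-client is on PATH and can reach the server):
#   clickhouse-client --format TSV --query "
#       SELECT database, table, is_readonly, is_session_expired
#       FROM system.replicas
#       WHERE is_readonly OR is_session_expired" | report_unhealthy_replicas
```

An empty result means no replica is read-only or has an expired session, and the function returns 0, which makes it usable as a cron or health-check command.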

Review Keeper/ZooKeeper logs for errors:

# ClickHouse Keeper logs
grep -i "error\|exception\|timeout" /var/log/clickhouse-keeper/clickhouse-keeper.log

# ZooKeeper logs
grep -i "error\|exception" /var/log/zookeeper/zookeeper.log

Check Keeper disk space and data size:

# Check disk space on Keeper nodes
df -h /var/lib/clickhouse-keeper/

# Check znode count (for ZooKeeper)
echo mntr | nc zookeeper-host 2181 | grep zk_znode_count
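
The `mntr` output is a list of key and value pairs, one per line, so the znode count can be extracted and compared against a limit. The sketch below uses hypothetical helper names (`mntr_value`, `znode_count_ok`); the limit is whatever you choose, for example 300000 as an illustrative value in line with the "few hundred thousand" guidance in the FAQ.

```shell
#!/usr/bin/env bash
# Extract one metric from `mntr` output on stdin (lines of "key<whitespace>value").
mntr_value() {
    awk -v key="$1" '$1 == key { print $2 }'
}

# Succeeds when the znode count (first argument) is at or below the limit (second argument).
znode_count_ok() {
    [ "$1" -le "$2" ]
}

# Usage (host, port, and the 300000 limit are placeholders):
#   count="$(echo mntr | nc zookeeper-host 2181 | mntr_value zk_znode_count)"
#   znode_count_ok "$count" 300000 || echo "znode count high: $count"
```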

If tables are in read-only mode, reinitialize the replicas' Keeper sessions once Keeper is healthy and reachable again:
```
-- Force re-establish Keeper sessions
SYSTEM RESTART REPLICAS;
```

If the Keeper cluster lost quorum, restore it by ensuring a majority of nodes are running:

# Check Keeper cluster status
echo mntr | nc keeper-host 9181 | grep zk_server_state
# Should show "leader" on one node and "follower" on others
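
Whether the cluster still has quorum follows from simple arithmetic: a majority of the configured nodes must be running. The helpers below are a minimal sketch with hypothetical names (`quorum_size`, `has_quorum`), not part of ClickHouse or ZooKeeper.

```shell
#!/usr/bin/env bash
# Majority needed for an ensemble of N configured nodes: floor(N / 2) + 1.
quorum_size() {
    echo $(( $1 / 2 + 1 ))
}

# Succeeds when the number of running nodes (second argument) is a majority
# of the configured nodes (first argument).
has_quorum() {
    [ "$2" -ge "$(quorum_size "$1")" ]
}

# Usage:
#   has_quorum 3 2 && echo "quorum held"   # 2 of 3 nodes is a majority
#   has_quorum 3 1 || echo "quorum lost"   # 1 of 3 nodes is not
```

A 4-node ensemble needs 3 running nodes, so it tolerates only one failure, the same as a 3-node ensemble. This is why odd cluster sizes are the usual choice.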

Tune session and operation timeouts if transient timeouts are common:

<!-- In ClickHouse config -->
<zookeeper>
    <session_timeout_ms>30000</session_timeout_ms>
    <operation_timeout_ms>10000</operation_timeout_ms>
    <node>
        <host>keeper-host</host>
        <port>9181</port>
    </node>
</zookeeper>

Best Practices

Deploy Keeper/ZooKeeper with an odd number of nodes (3 or 5) to maintain quorum tolerance.
Monitor Keeper latency, session count, and znode count. Set up alerts for high latency or lost quorum.
Keep Keeper nodes on dedicated hardware or instances, not co-located with heavy ClickHouse workloads.
Ensure sufficient disk space on Keeper nodes and configure snapshot/log cleanup.
Use ClickHouse Keeper instead of ZooKeeper for new deployments, as it is purpose-built for ClickHouse and easier to operate.
Avoid accumulating stale znodes -- when a replica is permanently removed, clean up its metadata with SYSTEM DROP REPLICA (or manual cleanup) so that old table paths do not pile up in Keeper.
Set session_timeout_ms and operation_timeout_ms to match your network latency and load rather than leaving them untuned.

Frequently Asked Questions

Q: What is the difference between ZooKeeper and ClickHouse Keeper?
A: ClickHouse Keeper is a drop-in replacement for Apache ZooKeeper, written in C++ and included with ClickHouse. It implements the same protocol but is optimized for ClickHouse's coordination patterns. It is generally recommended for new deployments due to simpler operations and better performance for ClickHouse workloads.

Q: My replicated tables are stuck in read-only mode. How do I fix this?
A: First, ensure Keeper is healthy and reachable. Then run SYSTEM RESTART REPLICAS on the affected ClickHouse node to re-establish sessions. If the problem persists, check the ClickHouse server logs for specific Keeper errors.

Q: How many znodes is too many?
A: There is no strict limit, but performance degrades as znode count grows into the millions. Keep the count under a few hundred thousand if possible. Each ClickHouse table creates multiple znodes, and old replica paths can accumulate over time.

Q: Can ClickHouse work without ZooKeeper/Keeper?
A: Non-replicated tables (regular MergeTree) do not require Keeper. Only ReplicatedMergeTree tables, distributed DDL (ON CLUSTER queries), and certain cluster features require Keeper coordination. If you use none of these, you don't need Keeper.