ClickHouse DB::Exception: Keeper communication failure (Code 999)

The "DB::Exception: Keeper exception" error in ClickHouse is a general error indicating that ClickHouse could not communicate with ZooKeeper or ClickHouse Keeper, or that an operation on it failed. The error code is 999 (KEEPER_EXCEPTION). Since ZooKeeper/Keeper is the coordination backbone for replicated tables, distributed DDL, and other cluster operations, this error can surface in many different contexts.

Impact

The impact depends on the operation that triggered the error. It can affect INSERT operations on replicated tables, DDL operations (CREATE, ALTER, DROP), replication synchronization, and leader election. In severe cases, replicated tables may go into read-only mode, new inserts may be blocked, and schema changes may fail cluster-wide. The data already stored remains safe, but new operations are disrupted until the Keeper connectivity issue is resolved.

Common Causes

  1. Keeper/ZooKeeper unavailability -- The Keeper cluster is down, unreachable, or in a degraded state (e.g., lost quorum).
  2. Network connectivity issues -- Firewall rules, DNS resolution failures, or network partitions between ClickHouse and Keeper nodes.
  3. Session timeout -- The ClickHouse session with Keeper expired due to high load, network latency, or long pauses on either side (e.g., JVM GC pauses on ZooKeeper nodes).
  4. Keeper data corruption or disk full -- The Keeper data directory ran out of space or the transaction log is corrupted.
  5. Too many nodes/watches -- The ZooKeeper/Keeper instance has too many znodes or watches, causing performance degradation and timeouts.
  6. Version incompatibility -- Mismatch between ClickHouse's expected Keeper protocol version and the actual Keeper version.
  7. Concurrent DDL overload -- Too many simultaneous DDL operations exhausting Keeper resources.
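
To narrow down which of these causes applies, it can help to count how often each type of Keeper error appears in the ClickHouse server log. The sketch below is a hypothetical helper (tally_keeper_errors is not part of ClickHouse). It assumes the log contains the coordination messages "Session expired", "Connection loss", and "Operation timeout"; the exact wording may differ between versions, so treat the patterns as a starting point.

```shell
# tally_keeper_errors: read log lines on stdin and count common Keeper error types.
# The matched strings are assumptions about the log wording; adjust for your version.
tally_keeper_errors() {
    awk '
        /Session expired/   { expired++ }
        /Connection loss/   { loss++ }
        /Operation timeout/ { timeout++ }
        END {
            printf "session_expired=%d connection_loss=%d operation_timeout=%d\n",
                   expired, loss, timeout
        }
    '
}
```

A possible invocation, assuming the default server log location:

    tally_keeper_errors < /var/log/clickhouse-server/clickhouse-server.log

Mostly session expirations suggest cause 3, while frequent connection losses point toward causes 1 or 2.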

Troubleshooting and Resolution Steps

  1. Check Keeper/ZooKeeper health:

    # For ClickHouse Keeper
    echo ruok | nc keeper-host 9181
    
    # For ZooKeeper
    echo ruok | nc zookeeper-host 2181
    
    # Should return "imok"
    
  2. Verify connectivity from the ClickHouse server:

    # Test network connectivity
    nc -zv keeper-host 9181
    
    # Check DNS resolution
    dig keeper-host
    
  3. Check Keeper status from ClickHouse:

    SELECT * FROM system.zookeeper WHERE path = '/';
    
    -- Check replication health
    SELECT database, table, zookeeper_path, is_readonly, is_session_expired
    FROM system.replicas
    WHERE is_readonly OR is_session_expired;
    
  4. Review Keeper/ZooKeeper logs for errors:

    # ClickHouse Keeper logs
    grep -i "error\|exception\|timeout" /var/log/clickhouse-keeper/clickhouse-keeper.log
    
    # ZooKeeper logs
    grep -i "error\|exception" /var/log/zookeeper/zookeeper.log
    
  5. Check Keeper disk space and data size:

    # Check disk space on Keeper nodes
    df -h /var/lib/clickhouse-keeper/
    
    # Check znode count (ZooKeeper on port 2181; ClickHouse Keeper answers the same command on 9181)
    echo mntr | nc zookeeper-host 2181 | grep zk_znode_count
    
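  6. If tables are in read-only mode, confirm that Keeper is reachable again, then re-initialize the Keeper sessions of the replicated tables:

    -- Re-initialize Keeper session state for all ReplicatedMergeTree tables on this server
    SYSTEM RESTART REPLICAS;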
    
  7. If the Keeper cluster lost quorum, restore it by ensuring a majority of nodes are running:

    # Check Keeper cluster status
    echo mntr | nc keeper-host 9181 | grep zk_server_state
    # Should show "leader" on one node and "follower" on others
    
  8. Tune session and operation timeouts if transient timeouts are common:

    <!-- In ClickHouse config; the values shown are typical defaults, so increase them if transient timeouts persist -->
    <zookeeper>
        <session_timeout_ms>30000</session_timeout_ms>
        <operation_timeout_ms>10000</operation_timeout_ms>
        <node>
            <host>keeper-host</host>
            <port>9181</port>
        </node>
    </zookeeper>
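
Steps 1 and 7 can be combined into a single quorum check across all Keeper nodes. The following is a minimal sketch, not an official tool: keeper_role and check_quorum are hypothetical function names, the host names in the usage example are placeholders, and it assumes a multi-node ensemble with mntr enabled in the Keeper four-letter-word allow list.

```shell
# keeper_role: read `mntr` output on stdin and print the node's role
# (leader, follower, ...). Prints "down" if zk_server_state is missing,
# e.g. because the node did not answer or is not serving requests.
keeper_role() {
    awk '$1 == "zk_server_state" { role = $2 }
         END { print (role == "" ? "down" : role) }'
}

# check_quorum: read one role per line on stdin. Succeeds only if exactly one
# node is leader and a strict majority of the listed nodes are leader/follower.
check_quorum() {
    awk '
        { total++ }
        $1 == "leader"   { leaders++; up++ }
        $1 == "follower" { up++ }
        END {
            printf "nodes=%d up=%d leaders=%d\n", total, up, leaders
            exit !(leaders == 1 && up * 2 > total)
        }
    '
}
```

Usage, assuming nc supports the -w timeout flag:

    for h in keeper-1 keeper-2 keeper-3; do
        echo mntr | nc -w 2 "$h" 9181 | keeper_role
    done | check_quorum || echo "Keeper quorum problem"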
    

Best Practices

  • Deploy Keeper/ZooKeeper with an odd number of nodes (3 or 5) so the ensemble can lose nodes without losing quorum: 3 nodes tolerate 1 failure, 5 nodes tolerate 2.
  • Monitor Keeper latency, session count, and znode count. Set up alerts for high latency or lost quorum.
  • Keep Keeper nodes on dedicated hardware or instances, not co-located with heavy ClickHouse workloads.
  • Ensure sufficient disk space on Keeper nodes and configure snapshot/log cleanup.
  • Use ClickHouse Keeper instead of ZooKeeper for new deployments, as it is purpose-built for ClickHouse and easier to operate.
  • Avoid accumulating stale znodes -- when a replica is permanently removed, clean up its metadata in Keeper with SYSTEM DROP REPLICA, and remove leftover table paths manually if needed.
  • Set session and operation timeouts (session_timeout_ms, operation_timeout_ms) based on the latency you actually observe between ClickHouse and Keeper.
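
The monitoring advice above can be approximated with a simple threshold check on mntr output. This is a sketch under stated assumptions: check_keeper_metrics is a hypothetical helper, and the thresholds in the usage example are illustrative placeholders rather than recommended values. Latency is compared in whatever unit mntr reports.

```shell
# check_keeper_metrics MAX_AVG_LATENCY MAX_ZNODES MAX_OUTSTANDING
# Reads `mntr` output on stdin, prints one WARN line per exceeded threshold,
# and returns non-zero if any threshold was exceeded.
check_keeper_metrics() {
    awk -v max_lat="$1" -v max_znodes="$2" -v max_out="$3" '
        $1 == "zk_avg_latency" && $2 + 0 > max_lat + 0 {
            print "WARN avg latency " $2; bad = 1
        }
        $1 == "zk_znode_count" && $2 + 0 > max_znodes + 0 {
            print "WARN znode count " $2; bad = 1
        }
        $1 == "zk_outstanding_requests" && $2 + 0 > max_out + 0 {
            print "WARN outstanding requests " $2; bad = 1
        }
        END { exit bad + 0 }
    '
}
```

Usage, with placeholder thresholds:

    echo mntr | nc -w 2 keeper-host 9181 | check_keeper_metrics 50 500000 100

The non-zero exit status makes it easy to call from cron or an existing alerting wrapper.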

Frequently Asked Questions

Q: What is the difference between ZooKeeper and ClickHouse Keeper?
A: ClickHouse Keeper is a drop-in replacement for Apache ZooKeeper, written in C++ and included with ClickHouse. It implements the same protocol but is optimized for ClickHouse's coordination patterns. It is generally recommended for new deployments due to simpler operations and better performance for ClickHouse workloads.

Q: My replicated tables are stuck in read-only mode. How do I fix this?
A: First, ensure Keeper is healthy and reachable. Then run SYSTEM RESTART REPLICAS on the affected ClickHouse node to re-establish sessions. If the problem persists, check the ClickHouse server logs for specific Keeper errors.
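
The order matters here: restarting replicas while Keeper is still unreachable does not help. A minimal sketch of "wait until Keeper answers, then restart replicas" might look like the following. The function names probe_keeper and wait_until_imok are hypothetical, and the sketch assumes nc supports the -w timeout flag and that ruok is enabled on the Keeper node.

```shell
# probe_keeper HOST PORT: send the four-letter command "ruok" and print the reply.
probe_keeper() {
    echo ruok | nc -w 2 "$1" "$2"
}

# wait_until_imok HOST PORT ATTEMPTS DELAY_SECONDS
# Polls the node until it replies "imok". Returns 0 on success,
# 1 if all attempts were used up without a healthy reply.
wait_until_imok() {
    host=$1; port=$2; attempts=$3; delay=$4
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if [ "$(probe_keeper "$host" "$port" 2>/dev/null)" = "imok" ]; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}
```

Usage, with a placeholder host name:

    wait_until_imok keeper-host 9181 10 3 && clickhouse-client --query "SYSTEM RESTART REPLICAS"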

Q: How many znodes is too many?
A: There is no strict limit, but performance degrades as znode count grows into the millions. Keep the count under a few hundred thousand if possible. Each ClickHouse table creates multiple znodes, and old replica paths can accumulate over time.

Q: Can ClickHouse work without ZooKeeper/Keeper?
A: Non-replicated tables (regular MergeTree) do not require Keeper. Only ReplicatedMergeTree tables, distributed DDL (ON CLUSTER queries), and certain cluster features require Keeper coordination. If you use none of these, you don't need Keeper.
