ClickHouse DB::Exception: Inconsistent cluster definition

The "DB::Exception: Inconsistent cluster definition" error in ClickHouse occurs when the cluster configuration on one node does not match the configuration on another node in the same cluster. The error code is INCONSISTENT_CLUSTER_DEFINITION. ClickHouse expects all nodes participating in a cluster to share an identical view of the cluster topology -- same shards, same replicas, same ordering. When this contract is violated, the error is raised.
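For reference, the cluster definition lives in the remote_servers section of the server configuration. A minimal two-shard, two-replica example (the cluster name your_cluster and the host names are illustrative):

```xml
<clickhouse>
    <remote_servers>
        <your_cluster>
            <shard>
                <replica>
                    <host>ch-node1.example.com</host>
                    <port>9000</port>
                </replica>
                <replica>
                    <host>ch-node2.example.com</host>
                    <port>9000</port>
                </replica>
            </shard>
            <shard>
                <replica>
                    <host>ch-node3.example.com</host>
                    <port>9000</port>
                </replica>
                <replica>
                    <host>ch-node4.example.com</host>
                    <port>9000</port>
                </replica>
            </shard>
        </your_cluster>
    </remote_servers>
</clickhouse>
```

Every node in the cluster must carry an identical copy of this block: the same shards, the same replicas, in the same order.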

Impact

An inconsistent cluster definition can lead to:

  • Failed distributed queries that detect the mismatch
  • Incorrect query routing where data is sent to the wrong shards
  • Silent data corruption if inserts land on unexpected shards due to mismatched shard numbering
  • Operational confusion during troubleshooting, as different nodes report different cluster topologies

Common Causes

  1. Manual configuration edits applied unevenly -- An operator updated the cluster config on some nodes but forgot others.
  2. Configuration management drift -- Ansible, Puppet, or other tools failed to apply changes uniformly across all nodes.
  3. Rolling config updates -- During a phased rollout of a new cluster definition, some nodes have the old config while others have the new one.
  4. Copy-paste errors -- Slightly different host names, ports, or shard/replica ordering between nodes.
  5. Dynamic cluster configuration out of sync -- If using ClickHouse Keeper-based cluster discovery, transient states during reconfiguration can cause temporary inconsistencies.
  6. Include file differences -- The main config references include files that differ between nodes.
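Most of the drift modes above can be caught mechanically before ClickHouse complains. A minimal sketch (assuming each node's remote_servers XML has already been fetched, e.g. over ssh) that normalizes the topology into the same tuples system.clusters exposes and diffs them between nodes:

```python
import xml.etree.ElementTree as ET

def cluster_topology(xml_text: str, cluster: str):
    """Parse a remote_servers XML snippet into an ordered list of
    (shard_num, replica_num, host, port) tuples -- the same view
    that system.clusters exposes."""
    root = ET.fromstring(xml_text)
    cluster_el = root.find(f".//{cluster}")
    if cluster_el is None:
        raise ValueError(f"cluster {cluster!r} not defined")
    topology = []
    for shard_num, shard in enumerate(cluster_el.findall("shard"), start=1):
        for replica_num, rep in enumerate(shard.findall("replica"), start=1):
            topology.append((shard_num, replica_num,
                             rep.findtext("host"), rep.findtext("port")))
    return topology

# Two nodes whose configs differ only in shard order -- exactly the
# kind of drift that triggers INCONSISTENT_CLUSTER_DEFINITION.
node_a = """<clickhouse><remote_servers><c1>
  <shard><replica><host>h1</host><port>9000</port></replica></shard>
  <shard><replica><host>h2</host><port>9000</port></replica></shard>
</c1></remote_servers></clickhouse>"""
node_b = node_a.replace("h1", "X").replace("h2", "h1").replace("X", "h2")

drift = cluster_topology(node_a, "c1") != cluster_topology(node_b, "c1")
print("drift detected:", drift)  # drift detected: True
```

Because the comparison is on ordered tuples, it flags reordered shards as well as changed hosts or ports -- both of which ClickHouse treats as a different cluster.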

Troubleshooting and Resolution Steps

  1. Compare cluster definitions across all nodes. Run this query on every node in the cluster and compare the output:

    SELECT
        cluster,
        shard_num,
        shard_weight,
        replica_num,
        host_name,
        host_address,
        port,
        is_local
    FROM system.clusters
    WHERE cluster = 'your_cluster'
    ORDER BY shard_num, replica_num;
    

    Any differences between nodes indicate the problem.

  2. Inspect the configuration files directly on each node:

    # Check the remote_servers section
    grep -A 100 '<remote_servers>' /etc/clickhouse-server/config.xml
    # Or in config.d/ includes
    cat /etc/clickhouse-server/config.d/cluster.xml
    

    Diff the files across nodes:

    diff <(ssh node1 cat /etc/clickhouse-server/config.d/cluster.xml) \
         <(ssh node2 cat /etc/clickhouse-server/config.d/cluster.xml)
    
  3. Fix the inconsistency by deploying a uniform configuration to all nodes. Use your configuration management tool or manually copy the correct file:

    # Using ansible as an example
    ansible clickhouse_nodes -m copy -a "src=cluster.xml dest=/etc/clickhouse-server/config.d/cluster.xml"
    
  4. Reload the configuration without restarting ClickHouse:

    SYSTEM RELOAD CONFIG;
    

    Run this on every node after deploying the corrected configuration.

  5. Verify the fix by re-running the cluster query on all nodes and confirming identical output.

  6. If using ClickHouse Keeper for cluster discovery, check the Keeper state:

    SELECT * FROM system.zookeeper WHERE path = '/clickhouse/cluster_config/';
    

    Ensure all nodes are reading from the same Keeper path and the data is consistent.

Best Practices

  • Use a single source of truth for cluster configuration and deploy it to all nodes through automated configuration management.
  • Include cluster configuration validation in your CI/CD pipeline -- diff the config across nodes as part of deployment checks.
  • Avoid manual edits to cluster config files on individual nodes. All changes should go through the standard deployment process.
  • When making cluster topology changes, apply them atomically to all nodes or use a maintenance window where distributed queries are paused.
  • Consider using ClickHouse Keeper-based automatic cluster discovery to reduce configuration drift.
  • Version-control your ClickHouse configuration files to track when and why changes were made.

Frequently Asked Questions

Q: Does the order of shards and replicas in the config matter?
A: Yes. ClickHouse assigns shard numbers based on the order they appear in the configuration. If the order differs between nodes, shard routing will be inconsistent, potentially sending data to the wrong shards.
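To make the routing consequence concrete, here is a toy illustration of order-dependent routing. Real routing uses whatever sharding_key expression the Distributed table declares; the modulo scheme below is a stand-in to show that the same key maps to different hosts when shard order differs between nodes:

```python
# Shard numbers come from configuration order, so the same hosts
# listed in a different order produce a different routing table.
order_on_node_a = ["h1", "h2"]   # shard 1 = h1, shard 2 = h2
order_on_node_b = ["h2", "h1"]   # same hosts, swapped order

def route(key: int, shard_hosts: list) -> str:
    """Toy stand-in for sharding_key % shard count routing."""
    return shard_hosts[key % len(shard_hosts)]

key = 42
print(route(key, order_on_node_a))  # h1
print(route(key, order_on_node_b))  # h2 -- same row, different shard
```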

Q: Can I temporarily have different configs during a rolling update?
A: Brief periods of inconsistency during rolling updates are common but risky. Distributed queries executed during this window may fail or produce incorrect results. Minimize the window by applying changes as quickly as possible across all nodes.

Q: Will SYSTEM RELOAD CONFIG fix the issue without a restart?
A: Yes, for cluster configuration changes in the remote_servers section. ClickHouse supports hot-reloading this part of the config. However, some configuration changes may require a full restart.

Q: How does this error interact with distributed DDL?
A: Distributed DDL (ON CLUSTER) relies on a consistent cluster definition to know which nodes to target. An inconsistent definition can cause DDL to be applied to the wrong set of nodes or fail entirely.

Q: Can macros cause this error?
A: Indirectly, yes. If the cluster config uses macros (e.g., {replica}) and the macro values differ unexpectedly between nodes, the resolved configuration will be inconsistent even though the template is identical.
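Per-node macro values typically live in their own include file. A hypothetical example (values illustrative) -- if two nodes define conflicting values here, anything resolved from the shared template diverges even though the template itself is identical:

```xml
<!-- e.g. /etc/clickhouse-server/config.d/macros.xml, per node -->
<clickhouse>
    <macros>
        <cluster>your_cluster</cluster>
        <shard>01</shard>
        <replica>ch-node1</replica>
    </macros>
</clickhouse>
```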
