ClickHouse DB::Exception: Inconsistent cluster definition

The "DB::Exception: Inconsistent cluster definition" error in ClickHouse occurs when the cluster configuration on one node does not match the configuration on another node in the same cluster. The error code is INCONSISTENT_CLUSTER_DEFINITION. ClickHouse expects all nodes participating in a cluster to share an identical view of the cluster topology -- same shards, same replicas, same ordering. When this contract is violated, the error is raised.
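For reference, the cluster definition lives in the remote_servers section of the server configuration. A minimal two-shard, two-replica example (the cluster name your_cluster and the host names are illustrative):

```xml
<clickhouse>
    <remote_servers>
        <your_cluster>
            <shard>
                <replica>
                    <host>ch-node1.example.com</host>
                    <port>9000</port>
                </replica>
                <replica>
                    <host>ch-node2.example.com</host>
                    <port>9000</port>
                </replica>
            </shard>
            <shard>
                <replica>
                    <host>ch-node3.example.com</host>
                    <port>9000</port>
                </replica>
                <replica>
                    <host>ch-node4.example.com</host>
                    <port>9000</port>
                </replica>
            </shard>
        </your_cluster>
    </remote_servers>
</clickhouse>
```

Every node in the cluster must carry an identical copy of this block: the same shards, the same replicas, in the same order.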

Impact

An inconsistent cluster definition can lead to:

  • Failed distributed queries that detect the mismatch
  • Incorrect query routing where data is sent to the wrong shards
  • Silent data corruption if inserts land on unexpected shards due to mismatched shard numbering
  • Operational confusion during troubleshooting, as different nodes report different cluster topologies

Common Causes

  1. Manual configuration edits applied unevenly -- An operator updated the cluster config on some nodes but forgot others.
  2. Configuration management drift -- Ansible, Puppet, or other tools failed to apply changes uniformly across all nodes.
  3. Rolling config updates -- During a phased rollout of a new cluster definition, some nodes have the old config while others have the new one.
  4. Copy-paste errors -- Slightly different host names, ports, or shard/replica ordering between nodes.
  5. Dynamic cluster configuration out of sync -- If using ClickHouse Keeper-based cluster discovery, transient states during reconfiguration can cause temporary inconsistencies.
  6. Include file differences -- The main config references include files that differ between nodes.
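Most of the drift modes above can be caught mechanically before ClickHouse complains. A minimal sketch (assuming each node's remote_servers XML has already been fetched, e.g. over ssh) that normalizes the topology into the same tuples system.clusters exposes and diffs them between nodes:

```python
import xml.etree.ElementTree as ET

def cluster_topology(xml_text: str, cluster: str):
    """Parse a remote_servers XML snippet into an ordered list of
    (shard_num, replica_num, host, port) tuples -- the same view
    that system.clusters exposes."""
    root = ET.fromstring(xml_text)
    cluster_el = root.find(f".//{cluster}")
    if cluster_el is None:
        raise ValueError(f"cluster {cluster!r} not defined")
    topology = []
    for shard_num, shard in enumerate(cluster_el.findall("shard"), start=1):
        for replica_num, rep in enumerate(shard.findall("replica"), start=1):
            topology.append((shard_num, replica_num,
                             rep.findtext("host"), rep.findtext("port")))
    return topology

# Two nodes whose configs differ only in shard order -- exactly the
# kind of drift that triggers INCONSISTENT_CLUSTER_DEFINITION.
node_a = """<clickhouse><remote_servers><c1>
  <shard><replica><host>h1</host><port>9000</port></replica></shard>
  <shard><replica><host>h2</host><port>9000</port></replica></shard>
</c1></remote_servers></clickhouse>"""
node_b = node_a.replace("h1", "X").replace("h2", "h1").replace("X", "h2")

drift = cluster_topology(node_a, "c1") != cluster_topology(node_b, "c1")
print("drift detected:", drift)  # drift detected: True
```

Because the comparison is on ordered tuples, it flags reordered shards as well as changed hosts or ports -- both of which ClickHouse treats as a different cluster.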

Troubleshooting and Resolution Steps

  1. Compare cluster definitions across all nodes. Run this query on every node in the cluster and compare the output:

    SELECT
        cluster,
        shard_num,
        shard_weight,
        replica_num,
        host_name,
        host_address,
        port,
        is_local
    FROM system.clusters
    WHERE cluster = 'your_cluster'
    ORDER BY shard_num, replica_num;
    

    Any differences between nodes indicate the problem.

  2. Inspect the configuration files directly on each node:

    # Check the remote_servers section
    grep -A 100 '<remote_servers>' /etc/clickhouse-server/config.xml
    # Or in config.d/ includes
    cat /etc/clickhouse-server/config.d/cluster.xml
    

    Diff the files across nodes:

    diff <(ssh node1 cat /etc/clickhouse-server/config.d/cluster.xml) \
         <(ssh node2 cat /etc/clickhouse-server/config.d/cluster.xml)
    
  3. Fix the inconsistency by deploying a uniform configuration to all nodes. Use your configuration management tool or manually copy the correct file:

    # Using ansible as an example
    ansible clickhouse_nodes -m copy -a "src=cluster.xml dest=/etc/clickhouse-server/config.d/cluster.xml"
    
  4. Reload the configuration without restarting ClickHouse:

    SYSTEM RELOAD CONFIG;
    

    Run this on every node after deploying the corrected configuration.

  5. Verify the fix by re-running the cluster query on all nodes and confirming identical output.

  6. If using ClickHouse Keeper for cluster discovery, check the Keeper state:

    SELECT * FROM system.zookeeper WHERE path = '/clickhouse/cluster_config/';
    

    Ensure all nodes are reading from the same Keeper path and the data is consistent.

Best Practices

  • Use a single source of truth for cluster configuration and deploy it to all nodes through automated configuration management.
  • Include cluster configuration validation in your CI/CD pipeline -- diff the config across nodes as part of deployment checks.
  • Avoid manual edits to cluster config files on individual nodes. All changes should go through the standard deployment process.
  • When making cluster topology changes, apply them atomically to all nodes or use a maintenance window where distributed queries are paused.
  • Consider using ClickHouse Keeper-based automatic cluster discovery to reduce configuration drift.
  • Version-control your ClickHouse configuration files to track when and why changes were made.

Frequently Asked Questions

Q: Does the order of shards and replicas in the config matter?
A: Yes. ClickHouse assigns shard numbers based on the order they appear in the configuration. If the order differs between nodes, shard routing will be inconsistent, potentially sending data to the wrong shards.
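To make the routing consequence concrete, here is a toy illustration of order-dependent routing. Real routing uses whatever sharding_key expression the Distributed table declares; the modulo scheme below is a stand-in to show that the same key maps to different hosts when shard order differs between nodes:

```python
# Shard numbers come from configuration order, so the same hosts
# listed in a different order produce a different routing table.
order_on_node_a = ["h1", "h2"]   # shard 1 = h1, shard 2 = h2
order_on_node_b = ["h2", "h1"]   # same hosts, swapped order

def route(key: int, shard_hosts: list) -> str:
    """Toy stand-in for sharding_key % shard count routing."""
    return shard_hosts[key % len(shard_hosts)]

key = 42
print(route(key, order_on_node_a))  # h1
print(route(key, order_on_node_b))  # h2 -- same row, different shard
```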

Q: Can I temporarily have different configs during a rolling update?
A: Brief periods of inconsistency during rolling updates are common but risky. Distributed queries executed during this window may fail or produce incorrect results. Minimize the window by applying changes as quickly as possible across all nodes.

Q: Will SYSTEM RELOAD CONFIG fix the issue without a restart?
A: Yes, for cluster configuration changes in the remote_servers section. ClickHouse supports hot-reloading this part of the config. However, some configuration changes may require a full restart.

Q: How does this error interact with distributed DDL?
A: Distributed DDL (ON CLUSTER) relies on a consistent cluster definition to know which nodes to target. An inconsistent definition can cause DDL to be applied to the wrong set of nodes or fail entirely.

Q: Can macros cause this error?
A: Indirectly, yes. If the cluster config uses macros (e.g., {replica}) and the macro values differ unexpectedly between nodes, the resolved configuration will be inconsistent even though the template is identical.
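Per-node macro values typically live in their own include file. A hypothetical example (values illustrative) -- if two nodes define conflicting values here, anything resolved from the shared template diverges even though the template itself is identical:

```xml
<!-- e.g. /etc/clickhouse-server/config.d/macros.xml, per node -->
<clickhouse>
    <macros>
        <cluster>your_cluster</cluster>
        <shard>01</shard>
        <replica>ch-node1</replica>
    </macros>
</clickhouse>
```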
