ClickHouse DB::Exception: Not a leader (Code: 529)

Q: Does `ON CLUSTER` always route to the leader?

ON CLUSTER distributes the DDL to all nodes in the cluster via the distributed DDL queue. The leader on each shard will process the operation for replicated tables.

The "DB::Exception: Not a leader" error in ClickHouse historically occurred when a DDL (Data Definition Language) operation such as OPTIMIZE or ALTER ... PARTITION was sent to a ReplicatedMergeTree replica that was not the elected leader for that table. This corresponds to ClickHouse error code NOT_A_LEADER (error code 529).

This error is largely obsolete on modern ClickHouse. Single-leader election was removed in stages: support for multiple simultaneous leaders landed in version 20.6, and the leader-election mechanism was removed entirely in version 22.3. On these versions every replica can assign merges, mutations, and partition operations, so operations that previously had to be routed to "the leader" now succeed on any replica and NOT_A_LEADER is effectively never thrown. If you encounter it, you are almost certainly running a ClickHouse release older than 20.6.

Impact

The DDL operation fails on the non-leader replica. The table continues to function normally for reads and writes, but the specific administrative operation cannot proceed until it is directed to the leader replica or the leadership is re-established.

Common Causes

DDL sent to wrong replica — the client or application connected to a replica that is not the current leader for the target table.
Leader election in progress — the previous leader went down and a new leader has not yet been elected via ZooKeeper/ClickHouse Keeper.
ZooKeeper/Keeper connectivity issues — the replica cannot communicate with ZooKeeper to confirm or acquire leadership.
Split-brain scenario — network partitioning has caused the cluster to lose quorum, preventing leader election.
Read-only replica — the replica has entered read-only mode due to ZooKeeper session loss or other issues.

Troubleshooting and Resolution Steps

Identify the current leader for the table:

SELECT database, table, replica_name, is_leader, is_readonly
FROM system.replicas
WHERE database = 'your_database' AND table = 'your_table';

Direct the DDL operation to the leader replica. Connect to the node that shows is_leader = 1 and run the operation there.
If no leader is elected, check ZooKeeper/Keeper connectivity:
```
SELECT * FROM system.zookeeper WHERE path = '/';
```
If this query fails, the node has lost its ZooKeeper session.

Check ZooKeeper/Keeper health:

echo ruok | nc zookeeper-host 2181
echo mntr | nc zookeeper-host 2181

Wait for leader election if ZooKeeper is healthy. Leader election typically completes within seconds after the previous leader goes down. Monitor:
```
SELECT database, table, is_leader FROM system.replicas
WHERE database = 'your_database' AND table = 'your_table';
```
Restart the ClickHouse node if it is stuck in a state where it cannot acquire leadership despite ZooKeeper being healthy:
```
sudo systemctl restart clickhouse-server
```

Use ON CLUSTER syntax to let ClickHouse route the DDL to the appropriate node:

ALTER TABLE your_database.your_table ON CLUSTER your_cluster
    ADD COLUMN new_col String;

Best Practices

Use ON CLUSTER for DDL operations to avoid manually identifying the leader.
Monitor replica leadership status using system.replicas as part of your cluster health checks.
Ensure reliable ZooKeeper/Keeper connectivity with redundant ZooKeeper nodes (at least 3 for quorum).
Set up alerting for replicas that lose leadership or enter read-only mode.
Avoid connecting to a specific replica for DDL operations; use a load balancer or distributed DDL instead.

Frequently Asked Questions

Q: What operations required the leader replica?
A: On pre-20.6 versions, operations that assign background work such as OPTIMIZE TABLE and ALTER ... PARTITION (e.g. DROP/ATTACH PARTITION) had to be initiated on the leader replica. Since 20.6 any replica can perform them, so this restriction no longer applies. Regular INSERT and SELECT operations were always allowed on any replica.

Q: Can there be multiple leaders for the same table?
A: Yes, on modern ClickHouse. Since version 20.6 multiple replicas can be leaders at the same time, and the dedicated leader-election mechanism was removed entirely in 22.3. The is_leader column in system.replicas may therefore be 1 on several replicas. Only very old (pre-20.6) versions had a single elected leader, which is why NOT_A_LEADER no longer occurs in practice.

Q: What happens if the leader goes down permanently?
A: On modern ClickHouse there is no single leader to replace, so the remaining replicas simply continue assigning merges and mutations. On pre-20.6 versions a new leader was automatically elected among the remaining replicas as long as ZooKeeper quorum was maintained, typically within seconds.

Q: Does ON CLUSTER always route to the leader?
A: ON CLUSTER distributes the DDL to all nodes in the cluster via the distributed DDL queue. The leader on each shard will process the operation for replicated tables.