When ClickHouse raises the DB::Exception: Replica is already active error (code REPLICA_IS_ALREADY_ACTIVE), it means the server attempted to register a replica in ZooKeeper, but a node at that replica path already holds an active ephemeral lock. This typically happens after an unclean restart or when two ClickHouse instances are configured with the same replica identity.
Impact
This error blocks the affected replica from starting replication for the table. While it persists, the node will not pull new data parts, will not participate in quorum writes, and may fall behind the rest of the cluster. Queries routed to this replica could return stale results if the table was previously populated.
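To see whether a replica on a given node is currently affected, a quick check against `system.replicas` works; `is_readonly` and `is_session_expired` are standard columns of that table, and this is a sketch rather than an exhaustive health check:

```sql
-- Replicas that are read-only or have lost their ZooKeeper session
SELECT database, table, is_readonly, is_session_expired, absolute_delay
FROM system.replicas
WHERE is_readonly OR is_session_expired;
```

Any row returned here deserves investigation before routing queries to the node.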
Common Causes
- Unclean server shutdown -- the previous ClickHouse process terminated without closing its ZooKeeper session, so the ephemeral node has not yet expired.
- Duplicate replica paths -- two distinct ClickHouse instances share the same `ReplicatedMergeTree` replica name and ZooKeeper path in their table definitions.
- ZooKeeper session timeout not elapsed -- after a crash, the old session is still considered alive because the session timeout (commonly 10-30 seconds) has not passed.
- Container or VM snapshot restored -- a cloned environment retains the original replica identity, causing a conflict with the still-running source node.
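To hunt for the duplicate-identity causes above across an entire cluster, one option is the `clusterAllReplicas` table function; the cluster name `default` below is a placeholder, so substitute your own:

```sql
-- Count how many hosts claim each replica_path; more than one is a conflict.
-- 'default' is a placeholder cluster name.
SELECT replica_path, count() AS claimants
FROM clusterAllReplicas('default', system.replicas)
GROUP BY replica_path
HAVING claimants > 1;
```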
Troubleshooting and Resolution Steps
Wait for the old session to expire
If the previous ClickHouse process truly is gone, ZooKeeper will clean up the ephemeral node once the session timeout elapses. Simply retry after 30-60 seconds:

```sql
SYSTEM RESTART REPLICA db.my_replicated_table;
```

Verify no other process is using the same replica path
Check for duplicate replica identifiers across your fleet:
```sql
SELECT database, table, replica_name, replica_path
FROM system.replicas
WHERE active_replicas > 0;
```

Ensure each ClickHouse instance has a unique `{replica}` macro in its configuration.

Inspect the ZooKeeper node directly
Use the ZooKeeper CLI or ClickHouse's `system.zookeeper` table to look at the replica's `is_active` node:

```sql
SELECT name, value, ephemeralOwner
FROM system.zookeeper
WHERE path = '/clickhouse/tables/01/my_table/replicas/replica1';
```

If `ephemeralOwner` is non-zero, an active session still holds the lock.

Force-drop the stale lock (use with caution)
If you are certain the old process is dead and cannot wait for the timeout:
```shell
# Using zkCli
deleteall /clickhouse/tables/01/my_table/replicas/replica1/is_active
```

Then restart replication on the ClickHouse node.
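When the conflicting replica is one you want to decommission entirely (for example, a cloned VM that should never have registered), recent ClickHouse versions offer `SYSTEM DROP REPLICA`, which removes a dead replica's ZooKeeper metadata without zkCli and refuses to drop the replica it is run on. Note this deletes the replica's entire metadata path, not just the stale lock, so do not use it on a replica you intend to revive; names below are illustrative:

```sql
-- Removes ZooKeeper metadata of a *dead* replica; fails if that
-- replica is this server itself. Names are illustrative.
SYSTEM DROP REPLICA 'replica1' FROM TABLE db.my_replicated_table;
```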
Review and fix macros configuration
In `/etc/clickhouse-server/config.d/macros.xml`, confirm that `{replica}` resolves to a value unique to each server:

```xml
<macros>
    <shard>01</shard>
    <replica>clickhouse-node-1</replica>
</macros>
```
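After editing the file and restarting, the values the server actually loaded can be verified from SQL; both the `system.macros` table and the `getMacro` function reflect the live configuration:

```sql
-- Check what the server resolved from the macros configuration
SELECT * FROM system.macros;
SELECT getMacro('replica');
```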
Best Practices
- Always assign unique replica names via the `{replica}` macro, ideally derived from the hostname or a stable identifier.
- Use `SYSTEM RESTART REPLICA` rather than manual ZooKeeper manipulation when possible.
- Set a reasonable ZooKeeper session timeout (e.g., 10-30 seconds) to balance fast failover against false positives.
- When cloning VMs or containers, update macros before starting ClickHouse.
- Monitor the `system.replicas` table for replicas that are not active or have a large replication lag.
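The monitoring suggestion above can be implemented as a periodic query; the 300-second delay threshold here is an arbitrary example to tune to your workload:

```sql
-- Flag replicas that are read-only or lagging; 300 s is an example threshold
SELECT database, table, is_readonly, absolute_delay, queue_size
FROM system.replicas
WHERE is_readonly OR absolute_delay > 300;
```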
Frequently Asked Questions
Q: Is it safe to delete the is_active node in ZooKeeper manually?
A: Only if you are absolutely sure the previous process is no longer running. Deleting the lock while the old instance is still alive can lead to split-brain replication issues.
Q: How long does it take for ZooKeeper to expire a dead session?
A: It depends on the negotiated session timeout, which is typically between 10 and 30 seconds. You can check the configured value in ClickHouse's ZooKeeper settings.
Q: Can this error occur with ClickHouse Keeper instead of ZooKeeper?
A: Yes. ClickHouse Keeper implements the same protocol, so ephemeral nodes and session semantics behave identically.
Q: Will data be lost because the replica could not start?
A: No data is lost. Once the conflict is resolved, the replica will catch up by fetching missing parts from other replicas in the cluster.
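To confirm the catch-up after resolving the conflict, `SYSTEM SYNC REPLICA` blocks until the local replica has processed its replication queue (it may take a while, or time out, on a large backlog); the table name below is illustrative:

```sql
-- Block until the replica has processed its queued entries
SYSTEM SYNC REPLICA db.my_replicated_table;

-- Then confirm the queue is drained
SELECT queue_size, absolute_delay
FROM system.replicas
WHERE database = 'db' AND table = 'my_replicated_table';
```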