
ClickHouse DB::Exception: Replica is already active

When ClickHouse raises the DB::Exception: Replica is already active error (code REPLICA_IS_ALREADY_ACTIVE), the server tried to register a replica in ZooKeeper, but the replica's is_active ephemeral node already exists and is held by another live session. This typically happens after an unclean restart or when two ClickHouse instances are configured with the same replica identity.
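The conflict comes down to ZooKeeper's ephemeral-node semantics: an ephemeral node lives exactly as long as the session that created it. A minimal sketch of that behavior (the `ZkMock` class and its method names are illustrative, not a ClickHouse or ZooKeeper API):

```python
# Minimal mock of the ephemeral-node semantics behind this error.
# ZkMock / create_ephemeral / expire_session are illustrative names only.

class NodeExistsError(Exception):
    """Raised when an ephemeral node already has a live owner."""

class ZkMock:
    def __init__(self):
        self.ephemeral = {}  # path -> owning session id

    def create_ephemeral(self, path, session_id):
        owner = self.ephemeral.get(path)
        if owner is not None and owner != session_id:
            # Another live session holds the lock: the situation ClickHouse
            # reports as REPLICA_IS_ALREADY_ACTIVE.
            raise NodeExistsError(f"{path} owned by session {owner}")
        self.ephemeral[path] = session_id

    def expire_session(self, session_id):
        # On session expiry ZooKeeper deletes all of that session's ephemeral nodes.
        self.ephemeral = {p: s for p, s in self.ephemeral.items() if s != session_id}

zk = ZkMock()
path = "/clickhouse/tables/01/my_table/replicas/replica1/is_active"
zk.create_ephemeral(path, session_id=101)      # first server registers fine
try:
    zk.create_ephemeral(path, session_id=202)  # second claimant conflicts
except NodeExistsError as e:
    print("conflict:", e)
zk.expire_session(101)                         # old session times out
zk.create_ephemeral(path, session_id=202)      # registration now succeeds
```

This is why simply waiting for the old session to expire (step 1 below) often resolves the error on its own.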

Impact

This error blocks the affected replica from starting replication for the table. While it persists, the node will not pull new data parts, will not participate in quorum writes, and may fall behind the rest of the cluster. Queries routed to this replica could return stale results if the table was previously populated.

Common Causes

  1. Unclean server shutdown -- the previous ClickHouse process terminated without closing its ZooKeeper session, so the ephemeral node has not yet expired.
  2. Duplicate replica paths -- two distinct ClickHouse instances share the same ReplicatedMergeTree replica name and ZooKeeper path in their table definitions.
  3. ZooKeeper session timeout not elapsed -- after a crash, the old session is still considered alive because the session timeout (commonly 10-30 seconds) has not passed.
  4. Container or VM snapshot restored -- a cloned environment retains the original replica identity, causing a conflict with the still-running source node.
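Causes 2 and 4 both reduce to non-unique replica identity. One way to avoid them is to derive the {replica} macro from the hostname, which changes when a VM or container is cloned. A hedged sketch (the XML layout matches ClickHouse's macros config; the `replica_macros_xml` helper is ours):

```python
# Sketch: generate a per-host macros fragment so cloned machines do not
# inherit the same replica identity. Helper name is illustrative.
import socket
from xml.sax.saxutils import escape

def replica_macros_xml(shard="01"):
    replica = socket.gethostname()  # stable per host; differs on a clone
    return (
        "<clickhouse>\n"
        "    <macros>\n"
        f"        <shard>{escape(shard)}</shard>\n"
        f"        <replica>{escape(replica)}</replica>\n"
        "    </macros>\n"
        "</clickhouse>\n"
    )

print(replica_macros_xml())
```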

Troubleshooting and Resolution Steps

  1. Wait for the old session to expire. If the previous ClickHouse process is truly gone, ZooKeeper will delete the ephemeral node once the session timeout elapses. Simply retry after 30-60 seconds:

    SYSTEM RESTART REPLICA db.my_replicated_table;
    
  2. Verify that no other process is using the same replica path. Check for duplicate replica identifiers across your fleet:

    SELECT database, table, replica_name, replica_path
    FROM system.replicas
    WHERE active_replicas > 0;
    

    Ensure each ClickHouse instance has a unique {replica} macro in its configuration.

  3. Inspect the ZooKeeper node directly. Use the ZooKeeper CLI or ClickHouse's system.zookeeper table to examine the replica's is_active node:

    SELECT name, value, ephemeralOwner
    FROM system.zookeeper
    WHERE path = '/clickhouse/tables/01/my_table/replicas/replica1';
    

    If ephemeralOwner is non-zero, an active session still holds the lock.

  4. Force-drop the stale lock (use with caution). If you are certain the old process is dead and you cannot wait for the timeout:

    # Using zkCli
    deleteall /clickhouse/tables/01/my_table/replicas/replica1/is_active
    

    Then restart replication on the ClickHouse node.

  5. Review and fix the macros configuration. In /etc/clickhouse-server/config.d/macros.xml, confirm that {replica} resolves to a value unique to each server:

    <macros>
        <shard>01</shard>
        <replica>clickhouse-node-1</replica>
    </macros>
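
    Steps 2-5 all hinge on the ZooKeeper path that results from expanding the {shard} and {replica} macros in the ReplicatedMergeTree arguments. A minimal sketch of that substitution, showing how two misconfigured servers end up claiming the same replica path (the `expand_macros` helper is illustrative; real macro expansion is done by ClickHouse itself):

```python
# Sketch: expand {shard}/{replica} the way a definition like
# ReplicatedMergeTree('/clickhouse/tables/{shard}/my_table/replicas/{replica}', ...)
# resolves them. expand_macros is an illustrative helper, not a ClickHouse API.
def expand_macros(template, macros):
    out = template
    for name, value in macros.items():
        out = out.replace("{" + name + "}", value)
    return out

path_tpl = "/clickhouse/tables/{shard}/my_table/replicas/{replica}"
macros_a = {"shard": "01", "replica": "clickhouse-node-1"}
macros_b = {"shard": "01", "replica": "clickhouse-node-1"}  # misconfigured clone

path_a = expand_macros(path_tpl, macros_a)
path_b = expand_macros(path_tpl, macros_b)

# Two servers resolving to the same replica path will fight over the same
# is_active ephemeral node -- the conflict behind this error.
print(path_a)
print("collision!" if path_a == path_b else "ok")
```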
    

Best Practices

  • Always assign unique replica names via the {replica} macro, ideally derived from the hostname or a stable identifier.
  • Use SYSTEM RESTART REPLICA rather than manual ZooKeeper manipulation when possible.
  • Set a reasonable ZooKeeper session timeout (e.g., 10-30 seconds) to balance fast failover against false positives.
  • When cloning VMs or containers, update macros before starting ClickHouse.
  • Monitor the system.replicas table for replicas that are not active or have a large replication lag.
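
The last practice can be automated. A hedged sketch that flags unhealthy replicas from rows shaped like system.replicas output (is_readonly, is_session_expired, and absolute_delay are real columns of that table; the sample rows and the delay threshold are illustrative):

```python
# Sketch: flag replicas that are read-only, have lost their ZooKeeper
# session, or lag beyond a threshold. Sample data and threshold are ours.
MAX_DELAY_SECONDS = 300  # illustrative alerting threshold

def unhealthy(rows):
    flagged = []
    for r in rows:
        if (r["is_readonly"]
                or r["is_session_expired"]
                or r["absolute_delay"] > MAX_DELAY_SECONDS):
            flagged.append(r["replica_name"])
    return flagged

rows = [
    {"replica_name": "node-1", "is_readonly": 0, "is_session_expired": 0, "absolute_delay": 2},
    {"replica_name": "node-2", "is_readonly": 1, "is_session_expired": 0, "absolute_delay": 900},
]
print(unhealthy(rows))  # flags node-2
```

In practice the rows would come from `SELECT replica_name, is_readonly, is_session_expired, absolute_delay FROM system.replicas` via your driver of choice.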

Frequently Asked Questions

Q: Is it safe to delete the is_active node in ZooKeeper manually?
A: Only if you are absolutely sure the previous process is no longer running. Deleting the lock while the old instance is still alive can lead to split-brain replication issues.

Q: How long does it take for ZooKeeper to expire a dead session?
A: It depends on the negotiated session timeout, which is typically between 10 and 30 seconds. You can check the configured value in ClickHouse's ZooKeeper settings.

Q: Can this error occur with ClickHouse Keeper instead of ZooKeeper?
A: Yes. ClickHouse Keeper implements the same protocol, so ephemeral nodes and session semantics behave identically.

Q: Will data be lost because the replica could not start?
A: No data is lost. Once the conflict is resolved, the replica will catch up by fetching missing parts from other replicas in the cluster.
