NEW

Pulse 2025 Product Roundup: From Monitoring to AI-Native Control Plane

ClickHouse DB::Exception: Database replication failed

The DB::Exception: Database replication failed error (code DATABASE_REPLICATION_FAILED) indicates that the Replicated database engine was unable to synchronize a DDL operation (such as CREATE TABLE, ALTER TABLE, or DROP TABLE) across the cluster. Unlike table-level replication, this error relates to schema-level coordination managed by the Replicated database engine.

Impact

DDL operations on the affected database will fail or hang. Tables may end up with inconsistent schemas across replicas if a partial DDL was applied to some nodes but not others. This can lead to queries failing on nodes where the schema is out of sync, and it may block further DDL changes until the issue is resolved.

Common Causes

  1. ZooKeeper or ClickHouse Keeper unavailability -- the DDL log stored in ZooKeeper cannot be read or written.
  2. One or more nodes are unreachable -- the DDL cannot be applied on all replicas, causing the operation to fail.
  3. Conflicting DDL operations -- two nodes attempted incompatible schema changes simultaneously.
  4. Timeout during DDL execution -- a complex DDL operation took longer than the configured timeout on one of the replicas.
  5. Incompatible ClickHouse versions across nodes -- different versions may interpret DDL differently.
  6. Disk space or resource exhaustion on a replica preventing the DDL from executing.

Troubleshooting and Resolution Steps

  1. Check the DDL task queue The Replicated database uses a distributed DDL queue. Inspect it:

    SELECT * FROM system.distributed_ddl_queue
    WHERE database = 'my_replicated_db'
    ORDER BY entry_id DESC
    LIMIT 10;
    
  2. Review ZooKeeper for the database DDL log

    SELECT name, value
    FROM system.zookeeper
    WHERE path = '/clickhouse/databases/my_replicated_db/log';
    
  3. Check all nodes for schema consistency On each node, compare the table definitions:

    SHOW CREATE TABLE my_replicated_db.my_table;
    

    Look for differences in column definitions, engine settings, or indexes.

  4. Verify connectivity between all nodes and ZooKeeper

    SELECT * FROM system.zookeeper WHERE path = '/';
    

    Ensure this works on every node in the cluster.

  5. Retry the DDL operation If the failure was transient (e.g., a brief network blip), simply re-run the DDL statement:

    ALTER TABLE my_replicated_db.my_table ADD COLUMN new_col String;
    
  6. Manually apply the DDL on out-of-sync nodes If one node missed a DDL change, connect to it directly and apply the change manually. Then verify consistency.

  7. Recover from a stuck DDL queue If the queue is blocked by a failed entry:

    SYSTEM DROP DATABASE REPLICA 'stuck_replica' FROM DATABASE my_replicated_db;
    

    Use this only if the replica is permanently gone.

Best Practices

  • Run all DDL operations through a single entry point when possible to avoid conflicts.
  • Ensure all nodes in a Replicated database run the same ClickHouse version.
  • Monitor the distributed DDL queue for stuck entries.
  • Set appropriate timeouts for DDL operations using distributed_ddl_task_timeout.
  • Test schema changes in a staging environment before applying them to production.
  • Keep ZooKeeper or ClickHouse Keeper highly available with at least three nodes.

Frequently Asked Questions

Q: What is the Replicated database engine?
A: The Replicated database engine automatically replicates DDL statements (schema changes) across all nodes that share the same database. It complements table-level replication by keeping schemas consistent without manual intervention.

Q: Can I use the Replicated database engine without ReplicatedMergeTree tables?
A: Yes. The Replicated database engine handles DDL replication. The tables inside can use any engine, though ReplicatedMergeTree is the most common choice for data replication.

Q: What happens to queries during a failed DDL replication?
A: Existing queries against the current schema continue to work. Only queries that depend on the new schema (e.g., referencing a column that was supposed to be added) will fail on nodes where the DDL was not applied.

Q: How do I check if my database uses the Replicated engine?
A: Run SHOW CREATE DATABASE my_db. If the output includes ENGINE = Replicated(...), the database uses the Replicated engine.

Subscribe to the Pulse Newsletter

Get early access to new Pulse features, insightful blogs & exclusive events , webinars, and workshops.

We use cookies to provide an optimized user experience and understand our traffic. To learn more, read our use of cookies; otherwise, please choose 'Accept Cookies' to continue using our website.