ClickHouse DB::Exception: Table is being restarted

The "DB::Exception: Table is being restarted" error occurs when you attempt to access a table that is in the middle of a SYSTEM RESTART REPLICA operation. The error code TABLE_IS_BEING_RESTARTED indicates a transient condition -- the table is temporarily unavailable while ClickHouse reinitializes its replication state. Once the restart completes, the table becomes accessible again.

Impact

Queries against the affected table, including SELECT, INSERT, and ALTER statements, fail with this error for the duration of the restart. The outage is usually brief (seconds to minutes), but it is visible to users if the table serves real-time traffic.

Common Causes

  1. Explicit SYSTEM RESTART REPLICA command -- an administrator or script intentionally restarted the replica to fix replication issues.
  2. Automated recovery procedures -- monitoring systems that issue SYSTEM RESTART REPLICA when they detect replication lag or ZooKeeper session loss.
  3. ClickHouse internal recovery -- after a ZooKeeper session timeout, ClickHouse may automatically restart replicas.
  4. Concurrent access during maintenance -- queries arriving while a maintenance window operation is restarting replicas.

Troubleshooting and Resolution Steps

  1. Wait and retry. This is a transient error: the table becomes available again as soon as the restart finishes. Re-running the same query after a short pause (1-2 seconds) is usually sufficient.

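  2. Check whether a SYSTEM RESTART REPLICA is in progress. The second condition stops the query from matching its own text:

    SELECT query, elapsed
    FROM system.processes
    WHERE query ILIKE '%RESTART REPLICA%'
      AND query NOT ILIKE '%system.processes%';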
    
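  3. Once queries succeed again, check the replication queue to confirm the replica is processing entries normally. Entries with a growing num_tries or a non-empty last_exception point to a problem the restart did not fix:

    SELECT
        database,
        table,
        type,
        create_time,
        is_currently_executing,
        num_tries,
        last_exception
    FROM system.replication_queue
    WHERE table = 'my_table'
    ORDER BY create_time DESC
    LIMIT 10;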
    
  4. Check ZooKeeper connectivity. If the restart is taking unusually long, ZooKeeper may be slow or unreachable:

    SELECT * FROM system.zookeeper WHERE path = '/';
    
  5. Review the ClickHouse server logs for the restart reason:

    grep -i "restart replica" /var/log/clickhouse-server/clickhouse-server.log | tail -20
    
  6. If the restart seems stuck, check for ZooKeeper session issues and consider restarting the ClickHouse server as a last resort.
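
Step 1 can also be handled in application code. The sketch below is a minimal Python helper, assuming your client library raises an exception whose text contains TABLE_IS_BEING_RESTARTED or "Table is being restarted" (check what your driver actually reports). It retries only that error, with exponential backoff starting at 1 second, and re-raises any other error immediately. The names retry_while_restarting and is_table_restarting, and the run_query callable, are hypothetical and not part of any ClickHouse client API.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Assumed markers for the restart error in the exception text; adjust them to
# match how your client library reports ClickHouse server errors.
RESTART_MARKERS = ("TABLE_IS_BEING_RESTARTED", "Table is being restarted")


def is_table_restarting(exc: Exception) -> bool:
    """Return True if the exception text looks like the transient restart error."""
    text = str(exc)
    return any(marker in text for marker in RESTART_MARKERS)


def retry_while_restarting(
    run_query: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run run_query (a hypothetical callable), retrying only the restart error.

    Delays double after each failed attempt (1 s, 2 s, 4 s, ... with the
    defaults) and never exceed max_delay. Other errors are re-raised at once;
    the last restart error is re-raised when max_attempts is used up.
    """
    for attempt in range(max_attempts):
        try:
            return run_query()
        except Exception as exc:
            if not is_table_restarting(exc) or attempt == max_attempts - 1:
                raise
            sleep(min(base_delay * 2 ** attempt, max_delay))
    raise ValueError("max_attempts must be at least 1")


# Hypothetical usage, where `client` is whatever driver object you already use:
# rows = retry_while_restarting(lambda: client.execute("SELECT count() FROM my_table"))
```

Passing sleep in as a parameter keeps the helper easy to test, and the max_delay cap stops a long restart from producing very long gaps between attempts.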

Best Practices

  • Schedule SYSTEM RESTART REPLICA operations during maintenance windows or low-traffic periods.
  • Implement retry logic with exponential backoff in applications that query replicated tables.
  • Use a load balancer that can route queries to healthy replicas while one is restarting.
  • Monitor replica health proactively so that manual restarts are rarely needed.
  • Tune ZooKeeper session timeouts to balance avoiding unnecessary restarts against detecting real failures quickly.
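
When no load balancer sits in front of the replicas, the routing idea can be approximated in the client. This is a hedged Python sketch, assuming a hypothetical run_on(host) callable that runs the query against one replica and raises an exception containing the restart error text; the host names in any real call would be your own. It tries the replicas in order and moves to the next one only when a replica reports that the table is being restarted.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")

# Assumed markers for the restart error in the exception text.
RESTART_MARKERS = ("TABLE_IS_BEING_RESTARTED", "Table is being restarted")


def query_first_available(replicas: Sequence[str], run_on: Callable[[str], T]) -> T:
    """Run the query on the first replica whose table is not restarting.

    run_on(host) is a hypothetical callable that executes the query on one
    replica. Only the restart error triggers failover to the next host; any
    other error is re-raised immediately. (A real balancer would usually also
    fail over on connection errors, which this sketch leaves out.)
    """
    last_error: Optional[Exception] = None
    for host in replicas:
        try:
            return run_on(host)
        except Exception as exc:
            if not any(marker in str(exc) for marker in RESTART_MARKERS):
                raise
            last_error = exc  # this replica is restarting; try the next one
    if last_error is None:
        raise ValueError("replicas must not be empty")
    raise last_error  # every replica reported the restart error
```

Trying hosts in a fixed order keeps the sketch simple; a production setup would more likely shuffle or weight the replicas and combine this with the retry logic from the previous bullet.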

Frequently Asked Questions

Q: How long does a replica restart typically take?
A: Usually a few seconds for healthy tables. Tables with large replication queues or slow ZooKeeper connections may take longer. If it exceeds a few minutes, investigate ZooKeeper health.
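
To put a number on "a few minutes", the rows returned by the system.processes query in step 2 can be checked programmatically. The Python sketch below assumes the rows arrive as (query, elapsed) tuples, for example as a list of tuples from your driver; the function name long_running_restarts and the 180-second threshold are illustrative assumptions, not ClickHouse settings.

```python
from typing import Iterable, List, Tuple

# Illustrative threshold for "a few minutes"; not a ClickHouse setting.
STUCK_AFTER_SECONDS = 180.0


def long_running_restarts(
    rows: Iterable[Tuple[str, float]],
    threshold_seconds: float = STUCK_AFTER_SECONDS,
) -> List[Tuple[str, float]]:
    """Return the (query, elapsed) rows for RESTART REPLICA statements over the threshold."""
    flagged = []
    for query, elapsed in rows:
        text = query.lower()
        # Skip the monitoring query itself, whose text also mentions RESTART REPLICA.
        if "system.processes" in text:
            continue
        if "restart replica" in text and elapsed > threshold_seconds:
            flagged.append((query, elapsed))
    return flagged
```

Anything this returns is a candidate for the ZooKeeper health checks in step 4.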

Q: Can I query other tables in the same database while one table is restarting?
A: Yes. The restart only affects the specific table being restarted. Other tables in the same database remain fully accessible.

Q: Should I automatically restart replicas when replication lag is detected?
A: Only as a last resort. Replication lag is usually caused by heavy load or resource contention, not a broken replica state. Restarting the replica will not help with resource issues and adds a brief outage. Investigate the root cause first.

Q: Is there an alternative to SYSTEM RESTART REPLICA for fixing replication issues?
A: For many replication problems, SYSTEM SYNC REPLICA my_table is a less disruptive option. It waits for the replication queue to be processed without taking the table offline. Only use RESTART REPLICA when the replica's replication state is genuinely corrupted.
