ClickHouse Graceful Shutdown: Node Decommission Checklist

Q: How long should I wait after `SYSTEM FLUSH DISTRIBUTED`?

Until system.distribution_queue shows no rows for the table. The flush is synchronous, but very large queues can take minutes.

Q: What if a query refuses to finish during shutdown?

The shutdown_wait_unfinished timeout (seconds) eventually wins. After that, the query is killed and the process exits. Tune the timeout so it covers your normal worst-case query.

A blunt systemctl stop clickhouse-server aborts in-flight queries, drops unflushed distributed inserts, and can leave replication queues in an awkward state. For planned maintenance or decommissioning, a graceful procedure prevents data loss and avoids query errors visible to clients. This is the checklist to follow.

1. Remove the Node from Cluster Configuration

Edit config.xml (or the relevant drop-in under config.d/) on the other cluster nodes to remove this node from the <remote_servers> block. Without this step, distributed queries will still try to reach the shutdown target.

Do not remove the last replica of any shard while the cluster has data on that shard. Non-random sharding keys will route writes to a missing destination and create gaps.

2. Take It Out of the Load Balancer

Remove the node from the upstream load balancer or service discovery so new client connections stop arriving. Until this happens, application traffic will keep hitting a node you are about to take down.

3. Detach Streaming and Materialized Sources

Streaming engines and materialized views can stall the shutdown by holding inserts open. Detach them so they stop consuming:

DETACH TABLE my_kafka_table;
DETACH TABLE my_rabbitmq_table;
DETACH TABLE my_buffer_table;
DETACH DATABASE my_materialized_database;

These can be reattached later or after migration.

4. Wait for In-Flight Queries

Check what is currently running:

SHOW PROCESSLIST;

Or the more detailed view:

SELECT query_id, user, elapsed, query
FROM system.processes
ORDER BY elapsed DESC;

Wait for queries to finish, or kill long-running ones with KILL QUERY WHERE query_id = '...' after checking with the owning team.

5. Flush Distributed Tables

A Distributed table buffers inserts locally before forwarding them to shards. Anything still buffered will be lost on shutdown unless flushed:

SELECT database, table, data_files, data_compressed_bytes
FROM system.distribution_queue
WHERE database NOT IN ('system');

For every queued table:

SYSTEM FLUSH DISTRIBUTED db.distributed_table;

Wait until the row disappears from system.distribution_queue.

6. Sync Replicas

On the surviving replicas of every shard this node participates in, run:

SYSTEM SYNC REPLICA db.table;

This forces the peers to catch up to the highest known replication queue position. After it returns, the surviving replicas hold everything the leaving node had.

7. Configure shutdown_wait_unfinished_queries

By default, SYSTEM SHUTDOWN and systemctl stop attempt to kill running queries rather than wait. To allow graceful query completion, enable the relevant setting in config.xml:

<clickhouse>
    <shutdown_wait_unfinished_queries>1</shutdown_wait_unfinished_queries>
    <shutdown_wait_unfinished>60</shutdown_wait_unfinished>
</clickhouse>

shutdown_wait_unfinished_queries switches behavior from "kill" to "wait". shutdown_wait_unfinished caps the wait in seconds so a stuck query cannot block shutdown forever.

8. Stop the Server

sudo systemctl stop clickhouse-server

Watch the log for clean shutdown messages. If the process does not exit within shutdown_wait_unfinished seconds, investigate before killing.

Verifying After Restart (for maintenance, not decommission)

If you are bringing the node back up after maintenance, confirm it caught up cleanly:

SELECT database, table, absolute_delay, queue_size, is_readonly
FROM system.replicas
ORDER BY absolute_delay DESC
LIMIT 10;

absolute_delay should drop toward zero. is_readonly = 1 means the node lost ZooKeeper connectivity; investigate before declaring the maintenance complete.

Common Pitfalls

Killing the process while the distribution queue is non-empty. Buffered inserts to other shards are lost.
Removing the node from remote_servers after stopping it. Other nodes spent the meantime retrying connections.
Forgetting to detach Kafka or Buffer tables. The shutdown stalls waiting for ingestion to drain.
Bringing the node back up before its remote_servers entry is restored. New writes do not include it.

Frequently Asked Questions

Q: Why does ClickHouse kill queries on shutdown by default? A: The default favors fast shutdown for crash recovery scenarios. For planned maintenance, enable shutdown_wait_unfinished_queries so clients see clean completion instead of connection errors.

Q: How long should I wait after SYSTEM FLUSH DISTRIBUTED? A: Until system.distribution_queue shows no rows for the table. The flush is synchronous, but very large queues can take minutes.

Q: Do I need to do this for replica failover scenarios? A: Failover is unplanned, so the full procedure does not apply. Make sure clients retry against healthy replicas and the data is on at least one other replica.

Q: What if a query refuses to finish during shutdown? A: The shutdown_wait_unfinished timeout (seconds) eventually wins. After that, the query is killed and the process exits. Tune the timeout so it covers your normal worst-case query.

Q: Should I disable inserts before shutdown? A: For decommissioning yes. Make the node read-only or block its port from clients once it is out of the load balancer.