When ClickHouse fails to fetch a part on a replica, the corresponding task stays in system.replication_queue and retries forever. If the partition behind that task is empty everywhere in the cluster, there is nothing useful to fetch, but the queue keeps growing and pollutes monitoring. The recipe below cross-references system.replication_queue against active parts on every replica, then prints the exact ALTER TABLE ... DROP PARTITION commands needed to clear the stuck entries.
Run this as a periodic operational check on ReplicatedMergeTree clusters that have experienced ZooKeeper outages, replica resyncs, or aborted mutations.
Generate the DROP PARTITION Statements
Replace {cluster} with your cluster name from system.clusters before executing. The output is a list of DDL statements you can review and run.
SELECT 'ALTER TABLE ' || database || '.' || table || ' DROP PARTITION ID \''|| partition_id || '\';'
FROM
(
SELECT DISTINCT database, table, extract(new_part_name, '^[^_]+') AS partition_id
FROM clusterAllReplicas('{cluster}', system.replication_queue)
) AS rq
LEFT JOIN
(
SELECT database, table, partition_id, sum(rows) AS rows_count, count() AS part_count
FROM clusterAllReplicas('{cluster}', system.parts)
WHERE active
GROUP BY database, table, partition_id
) AS p
USING (database, table, partition_id)
WHERE p.rows_count = 0 AND p.part_count = 0
FORMAT TSVRaw;
How the Query Works
The query runs in two stages and joins the results:
- The first subquery scans
system.replication_queueon every replica viaclusterAllReplicas. It extracts the partition ID fromnew_part_nameusing the regex^[^_]+, which captures everything before the first underscore (the partition prefix of the part name). - The second subquery aggregates
system.partscluster-wide, summingrowsand counting parts per partition. Onlyactiveparts are considered. - The
LEFT JOINkeeps every queue entry, then theWHEREclause filters down to partitions where bothrows_countandpart_countare zero, meaning no replica holds data for that partition. FORMAT TSVRawprints the generated SQL without quoting, so you can pipe it directly intoclickhouse-client.
Reviewing and Running the Output
| Step | Action |
|---|---|
| 1 | Save the output to a file: clickhouse-client -q "..." > drops.sql |
| 2 | Open the file and confirm each table and partition ID is expected |
| 3 | Run statements one at a time, or in batches, against a single replica |
| 4 | Re-check system.replication_queue to confirm entries cleared |
The DROP PARTITION command is replicated automatically through ZooKeeper, so you only execute it on one node. After replication catches up, the matching queue tasks disappear on every replica.
Verifying the Queue is Clean
After dropping the empty partitions, re-run a short diagnostic to confirm the queue is shrinking and no new errors appeared:
SELECT database, table, count() AS pending_tasks, max(num_tries) AS max_retries
FROM clusterAllReplicas('{cluster}', system.replication_queue)
GROUP BY database, table
ORDER BY pending_tasks DESC;
A healthy ReplicatedMergeTree cluster should report low pending_tasks and max_retries close to zero. Sustained growth indicates a different problem, such as a missing source part or a ZooKeeper session issue.
Common Pitfalls
- Running
DROP PARTITIONon a partition that is empty on most replicas but still holds rows on one is destructive. The cluster-wide aggregation in the query protects against this, but only if every replica is reachable when the SELECT runs. - Using
extract(new_part_name, '^[^_]+')assumes the standard part naming convention. Tables created with custom partition keys that include underscores may need a different regex. - The cluster name in
{cluster}is case-sensitive and must matchsystem.clusters.clusterexactly. - If you have multiple ClickHouse clusters defined, point the query at the cluster that owns the affected tables.
Frequently Asked Questions
Q: Why are these partitions empty but still queued?
A: Common causes are a DROP PARTITION that ran on the source replica before the target finished fetching, an aborted mutation that left a queue entry pointing to a part that was never produced, or a ZooKeeper rollback that referenced a part no replica retained.
Q: Will dropping a partition that does not exist cause an error?
A: ClickHouse treats DROP PARTITION for a non-existent partition as a no-op on most modern versions. The replicated DDL still flushes the related queue entries.
Q: Can I automate this in a cron job? A: Yes, but keep a manual review step. Pipe the SELECT output to a file, sanity-check it, then apply. Automating both the generation and execution risks dropping partitions that briefly appear empty during a fetch or merge window.
Q: Should I use SYSTEM DROP REPLICA instead?
A: No. SYSTEM DROP REPLICA removes an entire replica from ZooKeeper. Use it when a replica server is permanently gone, not when individual partitions are empty.
Q: What if new_part_name is empty in the queue row?
A: Some task types (such as MUTATE_PART) populate different columns. Filter by type if you only want GET_PART entries: add WHERE type = 'GET_PART' to the inner SELECT.