Recovering from Lost Parts After Upgrade or Replication Issues

A "lost part" in ClickHouse is a data part that exists in one place but not the other: registered in ZooKeeper/Keeper as something a replica should have, but physically absent on disk — or present on disk but unknown to the cluster. This mismatch typically surfaces after a version upgrade, a hard crash, or a replication hiccup, and it shows up as GET_PART entries stuck in the replication queue, No active replica has part errors, or a replica that refuses to converge.

This guide explains how to tell the two cases apart, when it is safe to remove a lost part, and the exact repair procedure. It is distinct from total data loss recovery, which rebuilds an entire replica, and from cleaning up intentionally detached parts, which deals with parts you moved aside on purpose.

What "Lost Part" Actually Means

There are two opposite failure modes, and they call for different fixes. Diagnose which one you have before touching anything.

Symptom	Where the part exists	Where it is missing	Typical cause	Fix direction
`GET_PART` stuck in queue, `No active replica has part`	ZooKeeper/Keeper metadata	Every replica's disk	Replica was offline while the part was merged away and cleaned from all peers; historic bug (CH 20.1, fixed in 21.3.16 / 21.6.9 / 21.7.6 / 21.8)	Remove the orphaned ZK entry / drop the empty partition
`Suspiciously many broken parts`, `unexpected_`/`ignored_` in `detached/`	A replica's disk	ZooKeeper/Keeper metadata	Hard crash before flush, post-upgrade format checks, manual file moves	Let ClickHouse re-fetch, or detach/drop the on-disk part

The first case — part in ZooKeeper but gone from disk everywhere — is the classic "lost part" and the focus of most of this guide. The data is genuinely gone; the goal is to clear the stale metadata so replication stops looping. The second case is usually self-healing: ClickHouse moves the suspect part to detached/ and re-fetches a good copy from a healthy replica.
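Before drilling into individual parts, a per-table overview from system.replicas shows which tables are read-only, lagging, or missing replicas. This is a minimal sketch. It assumes a recent release in which system.replicas has a lost_part_count column; older versions may not have it, so remove that column if the query fails. Run it on each replica, because every node reports only its own state:

```sql
-- Per-table replication health as seen by THIS node; run on every replica.
-- lost_part_count is assumed to exist (recent releases only).
SELECT
    database,
    table,
    is_readonly,
    queue_size,
    inserts_in_queue,
    absolute_delay,
    active_replicas,
    total_replicas,
    lost_part_count
FROM system.replicas
WHERE is_readonly
   OR queue_size > 0
   OR active_replicas < total_replicas
ORDER BY is_readonly DESC, queue_size DESC;
```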

Step 1: Identify the Lost Parts

When a part is recorded in ZooKeeper but no replica can supply it, you will see endless GET_PART tasks in the replication queue:

SELECT database, table, type, new_part_name, last_exception, num_tries
FROM system.replication_queue
WHERE type = 'GET_PART'
ORDER BY num_tries DESC;

A high num_tries with a last_exception like No active replica has part X or covering part is the tell-tale sign of a lost part.
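To see which tables are affected before reading individual entries, the same system table can be summarised per table. This is a sketch. The num_tries > 10 threshold is an arbitrary assumption, not a ClickHouse default, so tune it to how long your queue normally takes to settle:

```sql
-- Summarise long-retrying GET_PART entries per table on this replica.
-- The threshold of 10 tries is an illustrative assumption.
SELECT
    database,
    table,
    count() AS stuck_get_parts,
    max(num_tries) AS max_tries,
    min(create_time) AS oldest_entry,
    any(last_exception) AS sample_exception
FROM system.replication_queue
WHERE type = 'GET_PART'
  AND num_tries > 10
GROUP BY database, table
ORDER BY stuck_get_parts DESC;
```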

To confirm the part is registered in ZooKeeper but absent from disk on a replica, compare system.zookeeper against system.parts (adapted from the Altinity Knowledge Base parts-consistency check):

SELECT zoo.p_path AS part_zoo, zoo.ctime, zoo.mtime, disk.p_path AS part_disk
FROM
(
    SELECT concat(path, '/', name) AS p_path, ctime, mtime
    FROM system.zookeeper
    WHERE path IN (SELECT concat(replica_path, '/parts') FROM system.replicas)
) AS zoo
LEFT JOIN
(
    SELECT concat(replica_path, '/parts/', name) AS p_path
    FROM system.parts
    INNER JOIN system.replicas USING (database, table)
) AS disk ON zoo.p_path = disk.p_path
WHERE part_disk = ''
  AND zoo.mtime <= now() - INTERVAL 1 HOUR
ORDER BY part_zoo;

Rows returned are parts ClickHouse believes a replica should have, but which are not on its disk. The mtime <= now() - INTERVAL 1 HOUR filter avoids flagging parts that are simply mid-fetch — only entries that have been stale for a while are real candidates.

Reading system.zookeeper requires you to either filter on a specific path or, as above, supply the paths via a subquery. Querying the whole tree without a path predicate is rejected.
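The consistency query above looks in one direction only. For the opposite failure mode (on disk but unknown to ZooKeeper), the join can be flipped. The following is a sketch adapted from that query, not taken from the Altinity Knowledge Base. It restricts itself to active parts because outdated parts may have their ZooKeeper entries removed before their files are cleaned up. It keeps the one-hour age filter as a safety margin. Like the original, it needs a working Keeper connection:

```sql
-- Reverse check: active parts on this replica's disk with no matching
-- ZooKeeper/Keeper entry under the replica's /parts node.
SELECT disk.p_path AS part_disk, disk.modification_time, zoo.p_path AS part_zoo
FROM
(
    SELECT concat(replica_path, '/parts/', name) AS p_path, modification_time
    FROM system.parts
    INNER JOIN system.replicas USING (database, table)
    WHERE active
) AS disk
LEFT JOIN
(
    SELECT concat(path, '/', name) AS p_path
    FROM system.zookeeper
    WHERE path IN (SELECT concat(replica_path, '/parts') FROM system.replicas)
) AS zoo ON disk.p_path = zoo.p_path
WHERE part_zoo = ''
  AND disk.modification_time <= now() - INTERVAL 1 HOUR
ORDER BY part_disk;
```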

Step 2: Decide Whether It Is Safe to Remove

Removing a lost part is destroying metadata about data that, in the orphaned-ZK case, no longer exists anyway. Before you act, rule out the recoverable scenarios:

A healthy replica still has the part. If any replica holds it, this is not a lost part — it is a sync lag. Run SYSTEM SYNC REPLICA db.table on the lagging node and let it fetch. See the replication queue guide for diagnosing stalls.
The part is in detached/ as unexpected_ or broken_. The data may still be intact on disk. Investigate before dropping — see Suspiciously many broken parts.
A backup covers the affected partition. If the data matters and you have a backup, restoring the partition is preferable to dropping it.

Only proceed to removal once you have confirmed the data is genuinely unrecoverable (no replica, no backup, nothing in detached/) and the queue entry is permanently stuck.
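Checking every replica by hand is slow during an incident. The sketch below uses the clusterAllReplicas table function. The cluster name my_cluster, the names db and table, and the part name all_1_1_0 are hypothetical placeholders that you replace with your own values:

```sql
-- Does any replica still hold the part, either active or in detached/?
-- my_cluster, db, table and all_1_1_0 are placeholders.
SELECT hostName() AS host, 'parts' AS source, name, toString(active) AS state
FROM clusterAllReplicas('my_cluster', system.parts)
WHERE database = 'db' AND table = 'table' AND name = 'all_1_1_0'
UNION ALL
SELECT hostName() AS host, 'detached' AS source, name, ifNull(reason, '') AS state
FROM clusterAllReplicas('my_cluster', system.detached_parts)
WHERE database = 'db' AND table = 'table'
  AND (name = 'all_1_1_0' OR endsWith(name, '_all_1_1_0'));
```

An empty result from both branches on every host is consistent with a genuinely lost part. Any row means there is still a copy to work with.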

Step 3: Remove Orphaned Parts Stuck in the Queue

If a whole partition ended up with no active parts because its GET_PART tasks can never complete, the cleanest fix is to drop the empty partition, which also clears the related queue entries. This query generates the ALTER TABLE ... DROP PARTITION statements for partitions that have stuck GET_PART tasks and zero active parts (adapted from the Altinity Knowledge Base):

SELECT 'ALTER TABLE ' || database || '.' || table ||
       ' DROP PARTITION ID ''' || partition_id || ''';' AS stmt
FROM
(
    SELECT database, table, splitByChar('_', new_part_name)[1] AS partition_id
    FROM system.replication_queue
    WHERE type = 'GET_PART'
      AND NOT is_currently_executing
      AND create_time < toStartOfDay(yesterday())
    GROUP BY database, table, partition_id
) AS q
LEFT JOIN
(
    SELECT database, table, partition_id, countIf(active) AS cnt_active
    FROM system.parts
    GROUP BY database, table, partition_id
) AS p USING (database, table, partition_id)
WHERE cnt_active = 0;

Review the generated statements, then run the ones you have confirmed are safe. DROP PARTITION is replicated and removes both the data (none, in this case) and the orphaned ZooKeeper entries for that partition.
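The generator checks system.parts only on the node where it runs. A cross-replica count for the same partition is a cheap extra guard before you execute a generated statement. In this sketch, my_cluster, db, table and the partition ID 202401 are hypothetical placeholders:

```sql
-- Every replica should report zero active parts for the partition.
-- A host that returns no row has no parts for it at all.
SELECT
    hostName() AS host,
    countIf(active) AS active_parts,
    sumIf(rows, active) AS active_rows
FROM clusterAllReplicas('my_cluster', system.parts)
WHERE database = 'db'
  AND table = 'table'
  AND partition_id = '202401'
GROUP BY host;
```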

A dedicated page walks through this exact pattern in more detail: removing empty partitions from the replication queue.

If the lost part does not leave the partition empty (the partition still has active parts), do not drop the partition, because that would delete live data. Instead, treat it as a per-replica inconsistency and repair the affected replica as described in the next section.

Step 4: Repair a Replica With SYSTEM RESTORE REPLICA

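When a replica has lost its ZooKeeper/Keeper metadata (for example, its replica path was removed or the Keeper data was lost) but its parts are still on disk, the table goes read-only. SYSTEM RESTORE REPLICA rebuilds the metadata from the local part set. It moves the local parts to detached/, re-creates the replica's entries in ZooKeeper, and attaches the parts that were active, so data already on disk is not re-downloaded from other replicas: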

SYSTEM RESTART REPLICA db.table;     -- re-read state from ZooKeeper first
SYSTEM RESTORE REPLICA db.table;     -- rebuild the replica's part set

SYSTEM RESTORE REPLICA requires the table to be in read-only mode, which a replica with missing metadata already is. Available since ClickHouse 21.7, it is the targeted alternative to the full metadata-tar rebuild covered in the disaster recovery guide. If instead a node lost its local data while ZooKeeper is intact, use the force_restore_data flag approach from that guide; it applies to every replicated table on the server at once.
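Because the restore depends on read-only mode, confirm that state and the reason for it before running the restore. In this minimal sketch, db and table are placeholders:

```sql
-- Read-only state and Keeper-related details for one replicated table.
SELECT
    database,
    table,
    is_readonly,
    is_session_expired,
    zookeeper_path,
    replica_path,
    zookeeper_exception
FROM system.replicas
WHERE database = 'db' AND table = 'table';
```

A non-empty zookeeper_exception often explains why the table went read-only.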

After restoring, force the replica to pull and apply the queue:

SYSTEM SYNC REPLICA db.table STRICT;

STRICT waits until the replication queue is fully drained, so a successful return means the replica has caught up. Note that STRICT may never succeed if new entries constantly appear in the replication queue — use it only when you can ensure no concurrent writes are adding new tasks.
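You can watch the queue from another connection while STRICT waits, or instead of blocking a session on it. This sketch groups the remaining entries by type. The names db and table are placeholders:

```sql
-- Remaining replication work for one table, by entry type.
SELECT
    type,
    count() AS entries,
    countIf(is_currently_executing) AS executing,
    max(num_tries) AS max_tries,
    min(create_time) AS oldest
FROM system.replication_queue
WHERE database = 'db' AND table = 'table'
GROUP BY type
ORDER BY entries DESC;
```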

Post-Upgrade Lost Parts

Upgrades introduce two specific flavors of this problem:

Stricter startup checks. Newer versions sometimes validate part files more aggressively. A part that an older version tolerated may be flagged on the first start after upgrade and moved to detached/ with a broken_ prefix. The data is usually still fetchable from a healthy replica — let replication re-pull it, or restore from backup.
Suspiciously many broken parts on startup. If many parts fail validation at once (common after an unclean shutdown during an upgrade), ClickHouse refuses to start the table as a circuit breaker. This is governed by max_suspicious_broken_parts (default 100) and max_suspicious_broken_parts_bytes (default 1073741824, i.e. 1 GiB). Do not blindly raise these — first confirm the parts are recoverable elsewhere. Full procedure: Suspiciously many broken parts.

Always read the release notes for the versions you are jumping across, and upgrade one minor version at a time on a canary replica before rolling out fleet-wide.
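Before changing either threshold, check what the server is actually using and how many parts were set aside. This is a sketch. Note that system.merge_tree_settings shows server-level values. A table created with its own SETTINGS clause overrides them, and SHOW CREATE TABLE reveals such overrides:

```sql
-- Server-level thresholds; changed = 1 means they were overridden in config.
SELECT name, value, changed
FROM system.merge_tree_settings
WHERE name IN ('max_suspicious_broken_parts', 'max_suspicious_broken_parts_bytes');

-- Parts currently in detached/ with a broken* reason, per table.
SELECT database, table, reason, count() AS parts
FROM system.detached_parts
WHERE startsWith(reason, 'broken')
GROUP BY database, table, reason
ORDER BY parts DESC;
```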

Inspecting and Removing Detached Parts

When ClickHouse sets a part aside rather than deleting it, it lands in detached/ with a reason prefix. List them by reason:

SELECT database, table, reason, count() AS parts
FROM system.detached_parts
GROUP BY database, table, reason
ORDER BY database, table, reason;

Common reasons and how to treat them:

broken_ / broken-on-start — corrupted on disk; investigate, then drop once you confirm a good copy exists elsewhere.
unexpected_ — on disk but not expected per ZooKeeper; usually a duplicate that a healthy replica already has.
ignored_ — on disk but not in ZooKeeper at startup; typically safe to drop after validation.
covered-by-broken — a healthy part shadowed by a broken one; often re-attachable.

Once you are certain a detached part is not needed, remove it:

ALTER TABLE db.table DROP DETACHED PART 'unexpected_all_1_1_0' SETTINGS allow_drop_detached = 1;

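allow_drop_detached = 1 is required to run DROP DETACHED. To reattach a part you have confirmed is good instead, use ALTER TABLE db.table ATTACH PART 'all_1_1_0'. ATTACH PART expects a plain part name. A directory with a reason prefix (for example unexpected_all_1_1_0) must first be renamed to all_1_1_0 inside detached/. For the full detached-part lifecycle, see Cleaning up detached parts.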
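When more than a handful of parts are involved, generating the statements is safer than typing them, as with the Step 3 generator. This sketch only prints statements for review. The names db and table and the 'ignored' reason filter are placeholders. It assumes, as in recent releases, that the reason column in system.detached_parts holds the prefix without its trailing underscore, and that name holds the full directory name that DROP DETACHED PART expects:

```sql
-- Print (do not execute) DROP DETACHED statements for review.
SELECT
    'ALTER TABLE ' || database || '.' || table ||
    ' DROP DETACHED PART ''' || name ||
    ''' SETTINGS allow_drop_detached = 1;' AS stmt
FROM system.detached_parts
WHERE database = 'db'
  AND table = 'table'
  AND reason = 'ignored'
ORDER BY name;
```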

Mutation Queue Interaction

Lost parts and stuck mutations feed each other. A mutation (ALTER TABLE ... UPDATE/DELETE) creates a new part version; if the source part it must mutate is lost, the mutation task cannot complete and the mutation queue backs up. Check both queues together:

SELECT database, table, mutation_id, parts_to_do, is_done, latest_fail_reason
FROM system.mutations
WHERE NOT is_done
ORDER BY parts_to_do DESC;

If latest_fail_reason references a part you have identified as lost, resolving the lost part (dropping the empty partition or restoring the replica) usually unblocks the mutation. If a mutation is itself wedged, see too many mutations and the mutations performance guide. Order matters: clear the lost part first, then either let the mutation finish or remove it cleanly with KILL MUTATION.
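You can join the two queues per table to find tables where both problems occur together. This is a sketch. A row in the result is a prompt to compare each mutation's latest_fail_reason with the stuck part names. It does not prove that the two are related:

```sql
-- Tables with both queued GET_PART entries and unfinished mutations.
SELECT database, table, stuck_get_parts, pending_mutations
FROM
(
    SELECT database, table, count() AS stuck_get_parts
    FROM system.replication_queue
    WHERE type = 'GET_PART'
    GROUP BY database, table
) AS q
INNER JOIN
(
    SELECT database, table, count() AS pending_mutations
    FROM system.mutations
    WHERE NOT is_done
    GROUP BY database, table
) AS m USING (database, table)
ORDER BY stuck_get_parts DESC;
```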

Best Practices

Diagnose direction first. ZK-but-not-disk and disk-but-not-ZK have opposite fixes. The Step 1 consistency query finds the ZK-but-not-disk case; unexpected_ or ignored_ entries in detached/ point to the disk-but-not-ZK case.
Exhaust recovery before removal. Check every replica, detached/, and backups before dropping anything. Removal is irreversible.
Prefer DROP PARTITION over manual ZooKeeper edits. Hand-editing ZooKeeper nodes to remove part entries is the most common way to corrupt a cluster permanently. Let ClickHouse manage ZooKeeper.
Use the mtime/age filters. Only treat parts that have been inconsistent for an hour or more as lost; younger mismatches are usually in-flight fetches.
Keep replicas and backups. A lost part is only unrecoverable when there is no healthy replica and no backup. Run ReplicatedMergeTree plus a backup schedule for every important table.
Canary your upgrades. Most post-upgrade part issues are caught on a single replica before they reach the whole fleet.

Common Issues

GET_PART retries forever. The part is lost from all replicas. Drop the empty partition (Step 3) or, if the partition is not empty, restore the replica (Step 4).
Dropping the partition deletes live data. Only run the generated DROP PARTITION statements where cnt_active = 0. If active parts exist, do not drop.
SYSTEM RESTORE REPLICA fails with "Replica is not read-only." Restore only works on a read-only replica. If the replica is read-write, its ZooKeeper metadata is intact and there is nothing to restore; use SYSTEM SYNC REPLICA instead.
Removing the part did not clear the queue. Run SYSTEM SYNC REPLICA db.table PULL to refresh the queue from ZooKeeper, then re-check system.replication_queue.
Repeated broken parts after every restart. This points at failing hardware or a filesystem issue, not a one-off upgrade artifact. Investigate the disk before raising max_suspicious_broken_parts.

How Pulse Helps

Lost parts are easy to misdiagnose under pressure — it is genuinely hard, mid-incident, to tell an orphaned ZooKeeper entry from a replica that is simply a few minutes behind. Pulse (pulse.support) continuously watches system.replication_queue, system.replicas, system.detached_parts, and the mutation queue across every node, and surfaces stuck GET_PART loops, growing detached-part counts, and post-upgrade part inconsistencies before they stall ingestion. When a part really is lost, Pulse points you at the specific table, partition, and the safe next step, so you spend your time recovering rather than reconstructing what state the cluster is in. It is built and run by ClickHouse and search infrastructure engineers who handle these incidents day to day.

Frequently Asked Questions

Q: What is the difference between a "lost part" and a "detached part"? A lost part is an inconsistency: it exists in ZooKeeper but not on disk (or vice versa) and was not put there intentionally. A detached part is one ClickHouse deliberately set aside in the detached/ directory — either automatically (broken_, unexpected_) or because you ran DETACH. Lost parts that ClickHouse can salvage often become detached parts; see detached parts cleanup.

Q: Will I lose data if I drop a partition with lost parts? If the partition has zero active parts (cnt_active = 0), the data is already gone everywhere — dropping the partition only clears the dead metadata. If active parts remain, dropping the partition deletes real data, so do not run the drop in that case. Always verify with the Step 3 query before acting.

Q: Can I just delete the part entry from ZooKeeper directly? Avoid it. Manually editing ZooKeeper/Keeper nodes is the single most common cause of permanent cluster corruption. Use DROP PARTITION or SYSTEM RESTORE REPLICA and let ClickHouse maintain ZooKeeper consistently.

Q: Lost parts appeared right after an upgrade — is that a bug? Often it is stricter validation in the new version flagging parts an older version tolerated, not new corruption. Let replication re-fetch from a healthy replica, or restore from backup. The classic ZK-but-not-disk lost-part bug was specific to ClickHouse 20.1 and fixed in 21.3.16 / 21.6.9 / 21.7.6 / 21.8; on modern 24.x/25.x releases this manifests after crashes or operator error rather than that bug.

Q: How do I prevent lost parts in the first place? Run ReplicatedMergeTree with at least two healthy replicas, keep ZooKeeper/Keeper healthy (see the ZooKeeper configuration guide), maintain a backup schedule, and canary upgrades on one replica before rolling them out.

Q: The replication queue is stuck but no replica is marked lost — what now? Inspect system.replication_queue for the failing entry type and last_exception. A GET_PART with No active replica has part is a lost part; other stalls may be merges or mutations. Start with the replication queue guide and replication problems diagnosis.