The `DB::Exception: No replica has part` error (code `NO_REPLICA_HAS_PART`) appears when ClickHouse determines that a data part recorded in ZooKeeper as required is not available on any replica in the cluster. Without at least one replica holding the part, the replication queue entry cannot be fulfilled.
Impact
When this error fires, the affected replica is unable to fetch the missing part and will stall on that replication queue entry. Subsequent merges or mutations that depend on the part will also be blocked. If left unresolved, the replication lag on the node grows, and queries directed to it may return incomplete results.
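Replication lag of this kind is visible in the standard `system.replicas` table. A minimal sketch of a health check (the threshold of zero simply surfaces any replica with a backlog):

```sql
-- Replicas with a non-empty queue or measurable delay.
-- absolute_delay and queue_size are standard system.replicas columns.
SELECT database, table, is_readonly, absolute_delay, queue_size
FROM system.replicas
WHERE queue_size > 0 OR absolute_delay > 0
ORDER BY absolute_delay DESC;
```

A replica stuck on a missing part typically shows a steadily growing `absolute_delay` while `queue_size` stays flat.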
Common Causes
- All replicas lost the part simultaneously -- concurrent disk failures, aggressive `OPTIMIZE` operations, or manual deletion of part directories on every node.
- Part was dropped before replication completed -- a `DROP PARTITION` or `TRUNCATE` ran on the source replica before other replicas could fetch the data.
- TTL or retention policies removed the part prematurely on all replicas.
- ZooKeeper metadata is stale -- the part entry exists in the replication log, but the actual data was removed during a previous recovery or cleanup.
- Network partitions prevented the part from being distributed before it was merged away on the source.
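The stale-metadata case can be confirmed by comparing the parts a replica has registered in ZooKeeper against what is actually on disk. A sketch using the built-in `system.zookeeper` table -- the path below is an assumption; use the real value from `system.replicas.zookeeper_path` and your replica name:

```sql
-- Parts this replica has registered in ZooKeeper.
-- '/clickhouse/tables/01/my_table' and 'replica1' are placeholders:
-- substitute your table's zookeeper_path and replica_name.
SELECT name
FROM system.zookeeper
WHERE path = '/clickhouse/tables/01/my_table/replicas/replica1/parts';
```

Any part listed here but absent from `system.parts` on every replica is exactly the situation that raises `NO_REPLICA_HAS_PART`.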
Troubleshooting and Resolution Steps
1. Identify the missing part. Query the replication queue for stuck entries:

```sql
SELECT database, table, type, new_part_name, last_exception, num_tries
FROM system.replication_queue
WHERE last_exception LIKE '%No replica has part%'
ORDER BY create_time;
```

2. Check all replicas for the part. On each replica, look in `system.parts`:

```sql
SELECT partition, name, active, rows
FROM system.parts
WHERE table = 'my_table' AND name = 'the_missing_part';
```

Also check `system.detached_parts` -- the part may have been detached rather than deleted.

3. Attempt to restore from a detached part. If the part exists in a detached state on any replica:

```sql
ALTER TABLE my_table ATTACH PART 'the_missing_part';
```

4. Skip the unreachable entry. When the data is truly irrecoverable and you accept the loss, clear the stuck entry from the replication queue:

```sql
SYSTEM RESTART REPLICA db.my_table;
```

If that does not resolve it, you may need to use:

```sql
ALTER TABLE my_table DROP DETACHED PART 'the_missing_part' SETTINGS allow_drop_detached = 1;
```

and then manually remove the log entry from ZooKeeper.

5. Restore from backup. If you have a backup that contains the missing partition, restore it and attach the data:

```bash
clickhouse-backup restore --table my_table --partitions 'YYYYMM'
```

6. Re-insert the data from source. When the original data source is available, re-ingest the affected range to regenerate the part.
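Instead of logging into each replica to inspect `system.parts` and `system.detached_parts` one node at a time, the same checks can be run once from any node with the `clusterAllReplicas` table function. A sketch -- `my_cluster` is an assumed cluster name from your `remote_servers` configuration:

```sql
-- Active parts on every replica, queried from one node.
-- 'my_cluster' is a placeholder for your configured cluster name.
SELECT hostName() AS replica, name, active
FROM clusterAllReplicas('my_cluster', system.parts)
WHERE table = 'my_table' AND name = 'the_missing_part';

-- Detached copies; detached part names may carry prefixes
-- such as 'ignored_' or 'broken_', hence the LIKE match.
SELECT hostName() AS replica, name, reason
FROM clusterAllReplicas('my_cluster', system.detached_parts)
WHERE table = 'my_table' AND name LIKE '%the_missing_part%';
```

If the first query returns no rows anywhere but the second finds a detached copy, the `ATTACH PART` path above is still open.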
Best Practices
- Maintain at least three replicas for critical tables so that the loss of one node does not eliminate the last copy of recent parts.
- Avoid running `DROP PARTITION` or `TRUNCATE` concurrently on multiple replicas.
- Configure and test backups regularly, verifying that restoration actually works.
- Monitor `system.replication_queue` for entries with a high `num_tries` value -- these often indicate parts that no replica can provide.
- Be cautious with aggressive TTL rules; stagger expiration windows across partitions where possible.
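The `num_tries` check above can be wired into an alerting query. A sketch -- the threshold of 50 is an arbitrary assumption, tune it to your environment:

```sql
-- Entries retried many times usually mean no replica can serve the part.
-- The threshold 50 is an example value, not a ClickHouse default.
SELECT database, table, new_part_name, num_tries, last_exception
FROM system.replication_queue
WHERE num_tries > 50
ORDER BY num_tries DESC;
```

Alerting on this early catches a vanishing part while another replica may still hold a copy, before the error surfaces in queries.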
Frequently Asked Questions
Q: Does this error mean I have permanently lost data?
A: Not necessarily. The part might exist in a detached state, in a backup, or at the original data source. Only if none of these options are available is the data truly lost.
Q: Can I just delete the queue entry in ZooKeeper to unblock replication?
A: Technically yes, but this means accepting the data loss for that part. Use `SYSTEM RESTART REPLICA` first, and only resort to manual ZooKeeper edits as a last resort.
Q: Will other tables on the same server be affected?
A: No. The replication queue is per-table, so other replicated tables will continue to function normally.
Q: How can I prevent this error in the future?
A: Ensure sufficient replica count, monitor replication health, and avoid operations that remove data from all replicas before it has been fully distributed.