ClickHouse Suspiciously Many Broken Parts: Causes and Fixes

ClickHouse refuses to start when a table has more broken parts than a safety threshold allows. The error reads DB::Exception: Suspiciously many broken parts to remove. This is intentional. ClickHouse would rather refuse to start than silently move many parts to detached and leave you wondering where your data went. This guide walks through why it happens and how to recover.

What "Broken" Means

A part is considered broken when ClickHouse cannot read its checksums, when the column files are missing or truncated, or when its metadata does not match what is on disk. During startup, the server scans every part directory and tags any inconsistent ones as broken.

If the number of broken parts exceeds max_suspicious_broken_parts (default 100), or their combined size exceeds max_suspicious_broken_parts_bytes, startup aborts.
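
The rule above can be modelled in a few lines of shell. This is only a sketch of the check as this guide describes it, not ClickHouse's own code. The function name would_abort_startup is made up for illustration, and the 1 GiB fallback for the byte limit is an assumption you should verify against your server version.

```shell
# Simplified model of the startup check described above.
# Usage: would_abort_startup BROKEN_COUNT BROKEN_BYTES [MAX_PARTS] [MAX_BYTES]
# Returns 0 (true) if either limit is exceeded, 1 otherwise.
would_abort_startup() {
    broken_count=$1
    broken_bytes=$2
    max_parts=${3:-100}          # default quoted in this guide
    max_bytes=${4:-1073741824}   # ASSUMPTION: 1 GiB; check your version
    if [ "$broken_count" -gt "$max_parts" ] || [ "$broken_bytes" -gt "$max_bytes" ]; then
        return 0   # over a limit: startup would abort
    fi
    return 1       # within both limits
}
```

For example, would_abort_startup 150 0 returns true because 150 broken parts exceed the limit of 100, while would_abort_startup 100 0 returns false because the comparison is strictly "greater than".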

Common Causes

Hard reboot before page cache flush. ClickHouse does not fsync by default. An insert is considered durable once it hits the Linux page cache. A power loss or hard reset before the OS flushes the cache loses data. This is the most common cause and produces many small broken parts at once.
Disk hardware issues. Bad sectors, RAID degradation, or filesystem corruption can scramble part files. Run dmesg, smartctl, fsck, and mdadm --detail on the underlying devices.
Manual filesystem operations. Deleting files from /var/lib/clickhouse/data/..., moving the data directory, or restoring a partial backup can all leave parts inconsistent. Misconfigured shard or replica macros that point the node at someone else's data directory have the same effect.

Option 1: Move Broken Parts to detached and Continue

If you accept that the broken parts are lost or worth examining offline, set the force_restore_data flag:

sudo -u clickhouse touch /var/lib/clickhouse/flags/force_restore_data
sudo systemctl start clickhouse-server

ClickHouse will start, move every broken part to /var/lib/clickhouse/data/<db>/<table>/detached/, and consume the flag (delete it) on success. After startup, inspect the moved parts:

SELECT database, table, reason, name
FROM system.detached_parts
WHERE reason LIKE '%broken%';
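
As a filesystem-level cross-check of the query above, the broken directories can also be counted directly. The sketch below assumes the server prefixes each detached directory name with the detach reason (for example broken_ or broken-on-start_). The helper name count_broken_detached is hypothetical.

```shell
# Usage: count_broken_detached DETACHED_DIR
# Prints how many directories under DETACHED_DIR have a name starting
# with "broken" (ASSUMPTION: the reason is used as a name prefix).
count_broken_detached() {
    detached_dir=$1
    count=0
    for path in "$detached_dir"/broken*; do
        [ -d "$path" ] || continue   # skips an unmatched glob and plain files
        count=$((count + 1))
    done
    echo "$count"
}
```

Point it at the detached directory of one table, that is, the detached/ path under that table's data directory. If the number differs from what system.detached_parts reports, trust the system table and investigate the difference.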

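For replicated tables, the data is usually still available on healthy replicas, and the replica refetches missing parts from its peers. Drop the detached copies, using each name exactly as system.detached_parts reports it, then wait for replication to catch up: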

ALTER TABLE db.table DROP DETACHED PART '202401_1_1_0' SETTINGS allow_drop_detached = 1;
SYSTEM SYNC REPLICA db.table;
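
When many parts were detached, typing the statement above once per part is tedious. One possible approach is to generate the statements and review them before running anything. The helper gen_drop_detached below is a hypothetical name. It only prints SQL and never talks to the server.

```shell
# Usage: gen_drop_detached TABLE < file_with_part_names
# Reads detached part names from stdin, one per line, and prints one
# DROP DETACHED PART statement per name. Blank lines are skipped.
# Names are inserted as-is, so review the output before executing it.
#
# Names could come from, for example:
#   clickhouse-client --query "SELECT name FROM system.detached_parts
#     WHERE database = 'db' AND table = 'table' AND reason LIKE '%broken%'"
gen_drop_detached() {
    table=$1
    # The "|| [ -n ... ]" part handles a final line without a newline.
    while IFS= read -r part || [ -n "$part" ]; do
        [ -n "$part" ] || continue
        printf "ALTER TABLE %s DROP DETACHED PART '%s' SETTINGS allow_drop_detached = 1;\n" \
            "$table" "$part"
    done
}
```

Keeping generation and execution separate leaves room to inspect the list and to confirm that a healthy replica really holds the same data before anything is deleted.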

For non-replicated tables, you can try ALTER TABLE ... ATTACH PART to put a detached part back in service. It only works if the part is actually readable and was misclassified.

Option 2: Raise the Threshold

If you trust the broken-part count and just want startup to proceed, raise the limits in the merge_tree section of the server configuration:

<clickhouse>
  <merge_tree>
    <max_suspicious_broken_parts>250</max_suspicious_broken_parts>
    <max_suspicious_broken_parts_bytes>10737418240</max_suspicious_broken_parts_bytes>
  </merge_tree>
</clickhouse>

Raising the threshold is appropriate when you understand the cause (for example, you deliberately truncated a partition by hand) and want to avoid the flag dance.

Option 3: Zero Tolerance

Setting both values to 0 disables automatic detaching entirely. ClickHouse will refuse to start if any part is broken, forcing manual review or restore from backup. Use this on critical clusters where silent data loss is unacceptable.
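
A small helper can write the same merge_tree override used in Option 2 with whatever limits you choose, including zero. This is a sketch: the function name write_broken_parts_override is invented here, and the target path is up to you (a file under /etc/clickhouse-server/config.d/ is a common choice, but treat that location as an assumption about your installation).

```shell
# Usage: write_broken_parts_override FILE MAX_PARTS MAX_BYTES
# Writes a config override with the two limits to FILE, replacing it.
write_broken_parts_override() {
    file=$1
    max_parts=$2
    max_bytes=$3
    cat > "$file" <<EOF
<clickhouse>
  <merge_tree>
    <max_suspicious_broken_parts>${max_parts}</max_suspicious_broken_parts>
    <max_suspicious_broken_parts_bytes>${max_bytes}</max_suspicious_broken_parts_bytes>
  </merge_tree>
</clickhouse>
EOF
}
```

Calling it with 0 0 produces the zero-tolerance configuration, and calling it with 250 10737418240 reproduces the Option 2 example.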

Prevent It From Happening Again

Set fsync_after_insert = 1 and fsync_part_directory = 1 on critical tables if you cannot guarantee clean shutdowns.
Use replicated tables so a single bad node does not lose data.
Monitor merges for failures (for example, entries with a non-zero error code in system.part_log, if part logging is enabled), which can be an early sign of a failing disk.
Take regular BACKUP TABLE snapshots so even worst-case recovery is bounded.
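
For the fsync recommendation above, one way to apply the two settings to several existing tables is to generate the ALTER statements first. The helper name gen_fsync_settings is hypothetical, and the sketch only prints SQL for review.

```shell
# Usage: gen_fsync_settings TABLE [TABLE ...]
# Prints one ALTER TABLE ... MODIFY SETTING statement per table,
# enabling both fsync settings named in this guide.
gen_fsync_settings() {
    for table in "$@"; do
        printf 'ALTER TABLE %s MODIFY SETTING fsync_after_insert = 1, fsync_part_directory = 1;\n' \
            "$table"
    done
}
```

Expect slower inserts with these settings enabled, since every insert then waits for the disk, so it is worth limiting them to the tables that need the guarantee.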

Common Pitfalls

Touching force_restore_data with the wrong owner. It must be owned by the clickhouse user or the server will not see it.
Increasing max_suspicious_broken_parts without investigating the root cause. The disk may keep producing more broken parts until it fails completely.
Re-attaching parts that are actually broken. They will be dropped again on the next merge attempt.
Confusing this error with the replicated_max_ratio_of_wrong_parts error, which is a separate ZooKeeper-vs-disk consistency check.
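
The ownership pitfall in the first item can be checked before starting the server. The sketch below compares the numeric owner of the flag file with an expected user id. It relies on GNU stat (the -c option), and the function name flag_owned_by_uid is made up for illustration.

```shell
# Usage: flag_owned_by_uid FILE EXPECTED_UID
# Returns 0 if FILE exists and is owned by EXPECTED_UID,
# 1 if the owner differs, 2 if FILE does not exist.
flag_owned_by_uid() {
    flag_file=$1
    expected_uid=$2
    [ -e "$flag_file" ] || return 2           # flag is missing
    actual_uid=$(stat -c '%u' "$flag_file")   # GNU stat: numeric owner id
    [ "$actual_uid" = "$expected_uid" ]
}
```

A typical call would pass /var/lib/clickhouse/flags/force_restore_data as the file and the output of id -u clickhouse as the expected id.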

Frequently Asked Questions

Q: Is my data lost if I see this error? A: Usually no. The parts are still on disk and can be inspected. For replicated tables, other replicas often hold a clean copy.

Q: Where do "detached" parts go? A: /var/lib/clickhouse/data/<database>/<table>/detached/. They are listed in system.detached_parts for inspection.

Q: How do I delete detached parts safely? A: Use ALTER TABLE ... DROP DETACHED PART 'part_name' SETTINGS allow_drop_detached = 1. On replicated tables, follow with SYSTEM SYNC REPLICA to wait until the replica has caught up with its peers.

Q: Will force_restore_data delete my data? A: No. It moves broken parts to detached rather than deleting them. You delete them explicitly after deciding they cannot be recovered.

Q: What is the relationship between this and ZooKeeper? A: For Replicated tables, there is also replicated_max_ratio_of_wrong_parts, which compares filesystem state against ZooKeeper. Different error, different setting.