ClickHouse DB::Exception: Distributed broken batch files

The "DB::Exception: Distributed broken batch files" error in ClickHouse indicates that batch data files stored locally for asynchronous distributed INSERT operations are corrupted or unreadable. The error, DISTRIBUTED_BROKEN_BATCH_FILES (code 589), is raised when the background INSERT sender detects that the serialized .bin data files in the Distributed table's spool directory cannot be deserialized or transmitted to the target shard. When a batch cannot be processed, ClickHouse moves the offending files into a broken/ subdirectory under the per-shard directory so the rest of the queue can continue.

Impact

Corrupted batch files are never delivered, so the rows they contain do not appear on the destination shards. Delivery stalls only for the affected batches; other, non-corrupted batches may continue to be processed normally. If left unresolved, corrupted files accumulate on disk and can consume significant storage space.

Common Causes

Disk corruption or I/O errors during the initial write of batch files to the local distributed directory
The ClickHouse server crashed or was killed while writing batch files, leaving them in a partial state
Out-of-disk-space conditions during batch file creation that resulted in truncated files
Manual modification or accidental deletion of files in the distributed table's data directory
Filesystem issues such as corrupted inodes or failed journaling
A ClickHouse version upgrade changed the batch file format, making old files unreadable by the new version

Troubleshooting and Resolution Steps

Identify the distributed table and the corrupted batch files from the error log:

```
grep -i 'DISTRIBUTED_BROKEN_BATCH_FILES\|Broken batch' /var/log/clickhouse-server/clickhouse-server.log | tail -20
```

Check the distributed directory for batch files:
```
ls -la /var/lib/clickhouse/data/my_db/my_distributed_table/
```
Each shard directory contains pending batch files.
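
To see at a glance which shard directories hold pending or quarantined batches, the listing above can be wrapped in a small helper. This is a minimal sketch, assuming the layout described in this article (pending .bin files directly inside each shard directory, quarantined ones under broken/). The function name summarize_batches is hypothetical and not part of ClickHouse.

```shell
#!/usr/bin/env bash
# Sketch: count pending and quarantined batch files per shard directory.
# Assumed layout: <table_dir>/<shard_dir>/*.bin         (pending)
#                 <table_dir>/<shard_dir>/broken/*.bin  (quarantined)

summarize_batches() {
    local table_dir="$1"
    local shard_dir pending broken
    for shard_dir in "$table_dir"/*/; do
        # Skip the literal pattern when the glob matches nothing
        [ -d "$shard_dir" ] || continue
        # $(( ... )) strips the padding some wc implementations add
        pending=$(( $(find "$shard_dir" -maxdepth 1 -type f -name '*.bin' | wc -l) ))
        broken=0
        if [ -d "${shard_dir}broken" ]; then
            broken=$(( $(find "${shard_dir}broken" -maxdepth 1 -type f -name '*.bin' | wc -l) ))
        fi
        printf '%s\tpending=%d\tbroken=%d\n' "$(basename "$shard_dir")" "$pending" "$broken"
    done
}
```

For example, `summarize_batches /var/lib/clickhouse/data/my_db/my_distributed_table` prints one tab-separated line per shard directory.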

Review the distributed send queue status:

```
SELECT database, table, is_blocked, error_count, data_files, broken_data_files
FROM system.distribution_queue;
```
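
For scripted checks, the queue status can be piped through a filter that reports only the entries with broken files. The following is a rough sketch: report_broken is a hypothetical helper, and it assumes tab-separated input with the columns database, table, data_path, broken_data_files, in that order.

```shell
#!/usr/bin/env bash
# Sketch: read tab-separated rows (database, table, data_path, broken_data_files)
# on stdin and print only the rows that report at least one broken file.

report_broken() {
    awk -F'\t' '$4 > 0 {
        printf "%s.%s (%s): %d broken batch file(s)\n", $1, $2, $3, $4
    }'
}
```

One way to feed it, assuming clickhouse-client is available on the node: `clickhouse-client --format TabSeparated --query "SELECT database, table, data_path, broken_data_files FROM system.distribution_queue" | report_broken`.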

ClickHouse automatically moves files it cannot process into a broken/ subdirectory inside the per-shard directory, so the rest of the queue can continue. Inspect these files and, if the data can be re-inserted from the source or cannot be recovered at all, remove them to reclaim disk space. Removing them permanently discards the data they contain:
```
# Broken batches are quarantined under the broken/ subdirectory
ls -la /var/lib/clickhouse/data/my_db/my_distributed_table/shard1_replica1/broken/

# Remove broken files (data in these files will be lost)
rm /var/lib/clickhouse/data/my_db/my_distributed_table/shard1_replica1/broken/*.bin
```
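
If you would rather keep the broken files for later inspection than delete them outright, one alternative is to move them to an archive location first. This is only a sketch: archive_broken and the archive directory are hypothetical, and the archive should live on a volume with enough free space.

```shell
#!/usr/bin/env bash
# Sketch: move quarantined .bin files into an archive directory instead of
# deleting them. Prints the number of files moved.

archive_broken() {
    local broken_dir="$1" archive_dir="$2"
    local f moved=0
    mkdir -p "$archive_dir" || return 1
    for f in "$broken_dir"/*.bin; do
        # Skip the literal pattern when the glob matches nothing
        [ -f "$f" ] || continue
        mv -- "$f" "$archive_dir"/ && moved=$((moved + 1))
    done
    echo "$moved"
}
```

For example, `archive_broken /var/lib/clickhouse/data/my_db/my_distributed_table/shard1_replica1/broken /backup/broken_batches` moves the files and prints how many were moved; only .bin files are touched.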

Check disk health for I/O errors:

```
dmesg | grep -i 'error\|fault\|i/o' | tail -20
```

Verify disk space availability:
```
df -h /var/lib/clickhouse
```
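
The same check can be scripted so that it fails loudly before the disk fills up. A minimal sketch, assuming POSIX `df -P` output; the function names and the default 90% limit are illustrative assumptions, not ClickHouse recommendations.

```shell
#!/usr/bin/env bash
# Sketch: warn when the filesystem holding a path exceeds a usage limit.

# Print the used-space percentage (integer, no % sign) for a path.
disk_used_pct() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Return 0 if usage is below the limit, 1 if at or above it.
check_disk() {
    local path="$1" limit="${2:-90}" used   # 90 is an assumed default
    used=$(disk_used_pct "$path") || return 2
    if [ "$used" -ge "$limit" ]; then
        echo "WARNING: $path is ${used}% full (limit ${limit}%)"
        return 1
    fi
    echo "OK: $path is ${used}% full"
}
```

For example, `check_disk /var/lib/clickhouse 85` returns a non-zero status once usage reaches 85%, which makes it easy to hook into cron or a monitoring agent.
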
After cleaning up, flush the distributed queue to resume normal operation:
```
SYSTEM FLUSH DISTRIBUTED my_db.my_distributed_table;
```

Re-insert the lost data if it is available from the source:

```
-- Re-run the original INSERT that produced the corrupted batch
INSERT INTO my_db.my_distributed_table SELECT * FROM source_data;
```

Best Practices

Monitor the system.distribution_queue table for broken_data_files to detect corruption early.
Ensure sufficient disk space on nodes that host distributed tables, as low-disk conditions are a common cause of batch corruption.
Use replicated underlying tables (ReplicatedMergeTree) so that data delivered to at least one replica is safe even if the distributed send queue has issues.
Implement source-side logging or checkpointing so that data can be re-inserted if batch files are lost.
Avoid manually modifying files in the ClickHouse data directory.
Consider using distributed_foreground_insert = 1 (formerly named insert_distributed_sync, which is still accepted as an alias) for critical data where you need immediate confirmation of delivery rather than relying on the async queue.
Tune the background sender via the distributed_background_insert_* settings (e.g. distributed_background_insert_batch, distributed_background_insert_sleep_time_ms). These were renamed from the older distributed_directory_monitor_* settings in version 23.10; the old names remain as aliases.

Frequently Asked Questions

Q: Will removing broken batch files cause data loss?
A: Yes, the data contained in the broken batch files will be lost. If the original data is available in the source system, you should re-insert it after cleaning up the corrupted files.

Q: How can I prevent batch file corruption?
A: Ensure stable disk I/O, maintain adequate free disk space, and use a UPS or redundant storage to protect against power failures. Shutting the server down gracefully (rather than with kill -9) also reduces the risk of partial writes.

Q: Can I use synchronous distributed inserts to avoid this issue entirely?
A: Yes, setting distributed_foreground_insert = 1 (previously insert_distributed_sync, kept as an alias) sends data directly to shards without writing intermediate batch files. This eliminates the risk of batch corruption but increases insert latency and makes the insert dependent on shard availability at the time of the write.