The "DB::Exception: Too large compressed block size" error in ClickHouse occurs when a compressed data block exceeds the maximum allowed size. The check behind the `TOO_LARGE_SIZE_COMPRESSED` error code is a safety measure that protects against reading corrupted data, processing malformed files, or attempting to decompress blocks too large to handle safely in memory.
## Impact
The operation reading the oversized compressed block fails immediately. This can affect SELECT queries reading from affected table parts, data imports from external compressed files, and replication if the corrupted part is being transferred between replicas. If the error occurs on table data, specific partitions or parts may become unreadable until the issue is resolved.
## Common Causes
- Data corruption in stored table parts, causing the compressed block header to report an invalid size
- Importing compressed data files (e.g., from the `file()` or `url()` table functions) that contain blocks exceeding ClickHouse's limits
- Disk errors or filesystem corruption altering stored compressed data
- Incompatible compression formats or version mismatches when transferring data between systems
- Network corruption during replication causing damaged compressed blocks on replicas
- Extremely large values in a single column (e.g., very long strings) that result in oversized blocks before compression
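To see why a corrupted header triggers this error, it helps to picture the block layout. The sketch below models the header of a ClickHouse native compressed block (a 16-byte checksum, one method byte, then two little-endian 32-bit sizes); the `MAX_COMPRESSED_SIZE` cap and the function name are illustrative, not ClickHouse's actual code:

```python
import struct

# Layout of a ClickHouse native compressed block header:
#   bytes  0-15  CityHash128 checksum of everything that follows
#   byte   16    compression method (0x82 = LZ4, 0x90 = ZSTD, 0x02 = None)
#   bytes 17-20  compressed size, uint32 little-endian (includes the 9 header bytes)
#   bytes 21-24  uncompressed size, uint32 little-endian

MAX_COMPRESSED_SIZE = 1 << 30  # illustrative cap, not ClickHouse's exact constant

def parse_block_header(raw: bytes) -> tuple[int, int, int]:
    if len(raw) < 25:
        raise ValueError("truncated block header")
    method, compressed, uncompressed = struct.unpack_from("<BII", raw, 16)
    # A corrupted size field is rejected before any buffer is allocated
    if compressed > MAX_COMPRESSED_SIZE:
        raise ValueError(f"too large compressed block size: {compressed}")
    return method, compressed, uncompressed

good = bytes(16) + struct.pack("<BII", 0x82, 1024, 4096)
print(parse_block_header(good))          # (130, 1024, 4096)

corrupt = bytes(16) + struct.pack("<BII", 0x82, 0xFFFFFFF0, 4096)
# parse_block_header(corrupt) raises ValueError
```

A single flipped bit in the size field can turn a 1 KB block into an apparent multi-gigabyte one, which is exactly the situation the size check guards against.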
## Troubleshooting and Resolution Steps
Identify the affected table and part. The error message typically includes the table name and may reference the specific part:
```sql
SELECT name, database, table, active, rows, bytes_on_disk, modification_time
FROM system.parts
WHERE database = 'my_db' AND table = 'my_table'
ORDER BY modification_time DESC;
```

Verify data integrity of the affected table:
```sql
CHECK TABLE my_db.my_table;
```

For replicated tables, try fetching a healthy copy of the part from another replica:
```sql
-- Detach the corrupted part
ALTER TABLE my_db.my_table DETACH PART 'part_name';
-- ClickHouse will automatically fetch it from another replica
-- Or force a fetch:
SYSTEM RESTORE REPLICA my_db.my_table;
```

For non-replicated tables, restore the affected partition from a backup:
```sql
-- Drop the corrupted partition
ALTER TABLE my_db.my_table DROP PARTITION 'partition_id';
-- Restore from backup
ALTER TABLE my_db.my_table ATTACH PARTITION 'partition_id' FROM my_db.my_table_backup;
```

If the error occurs during data import, check the source file:
```shell
# Verify file integrity
gzip -t compressed_file.gz
# Or for lz4:
lz4 -t compressed_file.lz4
```

Check disk health for hardware-level issues:
```shell
# Check for filesystem errors
dmesg | grep -i error
smartctl -a /dev/sda
```

If the issue is from very large values, consider setting a maximum string length or splitting large values:
```sql
-- Check for oversized values
SELECT max(length(large_column)) FROM my_table;
```
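If that query does reveal oversized values, one option is to split them into bounded chunks before inserting. A minimal, hypothetical sketch (the `split_value` helper and the 1 MB limit are illustrative, not a ClickHouse API):

```python
def split_value(value: str, max_len: int = 1_000_000) -> list[str]:
    """Split an oversized string into chunks no longer than max_len,
    so that no single value produces an outsized block before compression."""
    return [value[i:i + max_len] for i in range(0, len(value), max_len)]

# A 2.5 MB string becomes three rows instead of one oversized value
chunks = split_value("x" * 2_500_000)
print([len(c) for c in chunks])   # [1000000, 1000000, 500000]
```

The chunks can then be stored as separate rows with a sequence column and reassembled at read time.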
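The command-line integrity checks earlier in this section (`gzip -t`, `lz4 -t`) can also be scripted when those tools are unavailable on the importing host. A sketch using only Python's standard library (file names are placeholders):

```python
import gzip
import os
import tempfile
import zlib

def gzip_is_valid(path: str) -> bool:
    """Stream-decompress the whole file; corruption surfaces as an exception."""
    try:
        with gzip.open(path, "rb") as f:
            while f.read(1 << 20):
                pass
        return True
    except (OSError, EOFError, zlib.error):
        return False

# Demo: a valid archive passes, a bit-flipped one fails
path = os.path.join(tempfile.mkdtemp(), "data.gz")
with gzip.open(path, "wb") as f:
    f.write(b"example row\n" * 10000)
print(gzip_is_valid(path))   # True

with open(path, "r+b") as f:   # flip one byte inside the deflate stream
    f.seek(50)
    original = f.read(1)
    f.seek(50)
    f.write(bytes([original[0] ^ 0xFF]))
print(gzip_is_valid(path))   # False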
## Best Practices
- Use replicated tables to maintain redundant copies of data, enabling recovery from single-replica corruption.
- Implement regular backup procedures and test restore processes periodically.
- Monitor disk health with SMART tools and filesystem checks.
- Verify data file integrity before importing from external sources.
- Use checksums when transferring data between systems to detect corruption early.
- Set an appropriate `max_compress_block_size` during table creation to control compressed block sizing.
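The transfer-checksum practice above can be as simple as comparing digests on both ends. A stdlib sketch (paths and the corruption step are illustrative):

```python
import hashlib
import os
import shutil
import tempfile

def file_sha256(path: str) -> str:
    """Compute a streaming SHA-256 digest, 1 MiB at a time."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Demo: an intact copy matches the source; a corrupted copy does not
tmp = tempfile.mkdtemp()
src = os.path.join(tmp, "part.bin")
dst = os.path.join(tmp, "part_copy.bin")
with open(src, "wb") as f:
    f.write(os.urandom(1 << 16))
shutil.copy(src, dst)
print(file_sha256(src) == file_sha256(dst))   # True

with open(dst, "r+b") as f:   # simulate corruption in transit
    f.seek(100)
    b0 = f.read(1)
    f.seek(100)
    f.write(bytes([b0[0] ^ 0xFF]))
print(file_sha256(src) == file_sha256(dst))   # False
```

Computing the digest before sending a file and re-checking it on arrival catches corruption before the data ever reaches ClickHouse.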
## Frequently Asked Questions
Q: Does this error always mean data corruption?
A: Not always, but it is a strong indicator. The error can also occur when importing data from external sources with incompatible formats. For data stored in ClickHouse tables, it most commonly points to corruption from disk errors, filesystem issues, or network problems during replication.
Q: Can I recover data from a corrupted part?
A: If the table is replicated, ClickHouse can fetch the part from a healthy replica. For non-replicated tables, you need a backup. If no backup exists, you may need to drop the affected partition and accept the data loss, or attempt manual recovery of the data files.
Q: How can I prevent compressed block corruption?
A: Use ECC memory, reliable storage with checksums (such as ZFS), and replicated tables. Keep ClickHouse's built-in checksums enabled (they are on by default) and monitor disk health proactively.
Q: What is the maximum compressed block size in ClickHouse?
A: The default `max_compress_block_size` is 1,048,576 bytes (1 MB). The safety check applied while reading allows somewhat larger blocks, but blocks significantly exceeding expected sizes trigger the `TOO_LARGE_SIZE_COMPRESSED` error.