ClickHouse DB::Exception: Corrupted data detected

The DB::Exception: Corrupted data error (code CORRUPTED_DATA) is raised when ClickHouse detects that data on disk does not match its expected checksum or fails another integrity validation. ClickHouse performs checksum verification when reading parts, and any mismatch halts the operation to prevent serving incorrect data.
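To see what ClickHouse is checking against, the integrity hashes it stores for each part can be inspected directly. This is a hedged illustration: the `hash_of_all_files` and `hash_of_uncompressed_files` columns exist in recent ClickHouse versions, but verify availability on your release.

```sql
-- Inspect the integrity hashes ClickHouse stores per active part.
-- Column availability may vary by ClickHouse version.
SELECT
    name,
    hash_of_all_files,
    hash_of_uncompressed_files
FROM system.parts
WHERE table = 'my_table' AND active;
```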

Impact

Queries that need to read from the corrupted part will fail. If the corruption is in an active, frequently accessed part, many queries may be affected. Merges involving the corrupted part will also fail, potentially causing a buildup of unmerged parts. The table itself remains operational for queries that do not touch the affected part.

Common Causes

  1. Disk hardware failure -- bad sectors, failing SSD, or unreliable storage controller silently flipping bits.
  2. Filesystem corruption -- an unclean shutdown or kernel bug damaged the data files.
  3. Memory errors (bit flips) -- faulty RAM can corrupt data during write operations, which ClickHouse then stores with incorrect contents.
  4. Network corruption during replication -- rare, but possible if the network stack does not provide end-to-end integrity.
  5. Incomplete write due to power loss or abrupt process termination.
  6. Third-party software modifying files in the ClickHouse data directory (antivirus, backup agents).

Troubleshooting and Resolution Steps

  1. Identify the corrupted part. The server error log names the specific part and file that failed the checksum. You can also check:

    SELECT name, bytes_on_disk, modification_time
    FROM system.parts
    WHERE table = 'my_table' AND active;
    
  2. Run a checksum verification

    CHECK TABLE db.my_table;
    

    This scans all parts and reports any checksum failures.

  3. Detach the corrupted part. Remove the bad part from the active set so queries can proceed:

    ALTER TABLE db.my_table DETACH PART 'corrupted_part_name';
    
  4. Fetch a healthy copy from another replica. For replicated tables, ClickHouse will automatically fetch the missing part after the corrupted one is detached:

    SYSTEM RESTART REPLICA db.my_table;
    SYSTEM SYNC REPLICA db.my_table;
    
  5. Restore from backup if no replica has a good copy

    clickhouse-backup restore <backup_name> --tables 'db.my_table' --partitions 'affected_partition'
    
  6. Investigate the root cause. Check hardware diagnostics:

    smartctl -a /dev/sda          # Disk health
    memtester 1G 1                 # Memory test
    dmesg | grep -i "error\|fault" # Kernel messages
    

    Also verify that no external processes are touching the data directory.

  7. Drop the corrupted detached part. Once you have recovered the data through replication or backup:

    ALTER TABLE db.my_table DROP DETACHED PART 'corrupted_part_name' SETTINGS allow_drop_detached = 1;
    
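Two optional follow-ups to the steps above, both version-dependent (verify support on your ClickHouse release): a per-part CHECK TABLE report, and a query to review detached parts before dropping them.

```sql
-- Per-part check report: one row per part with is_passed and a message,
-- instead of a single 0/1 result. Requires a version that supports
-- the check_query_single_value_result setting.
CHECK TABLE db.my_table
SETTINGS check_query_single_value_result = 0;

-- Review detached parts (and the reason they were detached)
-- before running DROP DETACHED PART.
SELECT name, reason, disk
FROM system.detached_parts
WHERE table = 'my_table';
```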

Best Practices

  • Use ECC RAM on ClickHouse servers to catch and correct single-bit memory errors.
  • Deploy storage with checksumming (e.g., ZFS) for an additional layer of integrity protection.
  • Monitor SMART metrics on disks and replace drives proactively when warning signs appear.
  • Ensure that antivirus, backup agents, or other software excludes the ClickHouse data directory.
  • Run CHECK TABLE periodically as part of a maintenance routine to catch corruption early.
  • Maintain multiple replicas so that a healthy copy is always available for recovery.
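In support of the last practice, replica health can be monitored from system.replicas; the columns used here exist in current ClickHouse versions, and the 300-second lag threshold is an arbitrary example value.

```sql
-- Flag replicas that are read-only or lagging. A healthy replica set
-- ensures a clean copy of every part is available for recovery.
SELECT database, table, is_readonly, absolute_delay, active_replicas, total_replicas
FROM system.replicas
WHERE is_readonly OR absolute_delay > 300;
```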

Frequently Asked Questions

Q: Can ClickHouse repair corrupted data automatically?
A: ClickHouse does not repair corrupted data in place. However, for replicated tables it can fetch a clean copy of the affected part from another replica once the corrupted one is detached.

Q: Does corruption in one part affect other parts?
A: No. Each part is self-contained with its own checksums. Corruption in one part does not spread to others.

Q: Should I be worried about data corruption happening silently?
A: ClickHouse validates checksums on every read, so silent corruption that goes undetected for long periods is unlikely during normal operations. Parts that are rarely read could harbor corruption until accessed or checked.
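For rarely read parts, newer ClickHouse releases allow checking a single named part without scanning the whole table; this is a sketch, with 'part_name_here' as a placeholder for a real part name from system.parts.

```sql
-- Check one specific part only (PART clause supported in newer releases)
CHECK TABLE db.my_table PART 'part_name_here';
```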

Q: Is CHECK TABLE safe to run on a production system?
A: Yes, but it reads every part on disk and can be I/O intensive on large tables. Consider running it during off-peak hours or on one replica at a time.
