The "DB::Exception: Cannot read from file descriptor" error in ClickHouse signals a low-level failure when the server attempts to read data from an open file descriptor. This could be a data part file, a temporary file used during query processing, or even a pipe. The error code associated with this issue is CANNOT_READ_FROM_FILE_DESCRIPTOR, and it usually points to an underlying operating system or storage problem rather than a query logic mistake.
Impact
When this error occurs, you can expect:
- Active queries that depend on the affected file to fail immediately
- Potential inability to read specific table parts, making some data temporarily inaccessible
- Degraded performance if ClickHouse retries operations or marks parts as broken
- In replicated setups, the affected replica may fall behind while it attempts recovery
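In replicated setups you can check directly whether a replica has fallen behind. A minimal sketch, assuming clickhouse-client is on the PATH and a server is running locally (the script skips quietly otherwise); system.replicas with its absolute_delay and queue_size columns is a standard ClickHouse system table:

```shell
# Hedged: requires clickhouse-client and a running local server; skip gracefully otherwise
command -v clickhouse-client >/dev/null 2>&1 || { echo "clickhouse-client not found; skipping"; exit 0; }

# Show replicas that are behind: absolute_delay is seconds of lag,
# queue_size is the number of pending replication queue entries
clickhouse-client --query "
  SELECT database, table, absolute_delay, queue_size, is_readonly
  FROM system.replicas
  WHERE absolute_delay > 0 OR queue_size > 0"
```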
Common Causes
- Corrupted or truncated data files on disk, often caused by an unclean shutdown or hardware failure
- The file descriptor limit (ulimit -n) has been exhausted, preventing new file opens or causing existing descriptors to behave unexpectedly
- Underlying storage device errors such as bad sectors, failed NVMe drives, or unresponsive network-attached storage
- A filesystem mounted in read-only mode due to detected errors (e.g., ext4 remounting as read-only after journal failure)
- File deleted or moved by an external process while ClickHouse still held it open
- Kernel-level I/O errors reported via dmesg, especially on systems with degraded RAID arrays
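Several of the causes above (read-only remounts, failing disks) surface in /proc/mounts before they surface anywhere else. A quick triage sketch, assuming a Linux host; /var/lib/clickhouse is the default data directory:

```shell
# List any mounts currently flagged read-only; an ext4 journal failure
# typically remounts the data disk as "ro" without unmounting it
awk '$4 ~ /(^|,)ro(,|$)/ {print $2, "->", $4}' /proc/mounts
```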
Troubleshooting and Resolution Steps
Check system logs for I/O errors:

```shell
dmesg | grep -i "error\|fault\|i/o"
journalctl -u clickhouse-server --since "1 hour ago"
```

Look for any disk or filesystem errors that coincide with the ClickHouse exception.
Verify file descriptor limits:

```shell
cat /proc/$(pidof clickhouse-server)/limits | grep "open files"
ls /proc/$(pidof clickhouse-server)/fd | wc -l
```

If the current count is close to the limit, raise it in /etc/security/limits.conf or in the systemd unit file:

```ini
[Service]
LimitNOFILE=500000
```

Check filesystem health:

```shell
df -h /var/lib/clickhouse
mount | grep $(df /var/lib/clickhouse --output=source | tail -1)
```

Confirm the filesystem is mounted read-write and has available space.
Inspect ClickHouse data integrity:

```sql
SELECT name, active, bytes_on_disk
FROM system.parts
WHERE database = 'your_db' AND table = 'your_table'
ORDER BY modification_time DESC;
```

Look for parts with anomalous sizes, and for broken parts logged in system.part_log.

Run a filesystem check if possible. Schedule it during a maintenance window if you suspect corruption; the filesystem must be unmounted before checking. For ext4:

```shell
sudo e2fsck -f /dev/sdX
```

Restart the ClickHouse server after resolving the underlying issue. ClickHouse will attempt to recover and re-attach healthy parts on startup.
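After the restart you can confirm what ClickHouse set aside rather than re-attached. A hedged sketch using the standard system.detached_parts table (requires clickhouse-client and a running server; the script skips quietly otherwise):

```shell
# Hedged: requires clickhouse-client and a running server; skip gracefully otherwise
command -v clickhouse-client >/dev/null 2>&1 || { echo "clickhouse-client not found; skipping"; exit 0; }

# Parts moved aside on startup appear here; "reason" explains why (e.g. broken)
clickhouse-client --query "
  SELECT database, table, name, reason
  FROM system.detached_parts"
```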
Best Practices
- Set generous file descriptor limits (at least 100,000) for the ClickHouse process, especially in production environments
- Use reliable storage with redundancy (RAID 10, replicated cloud volumes) to minimize the risk of read failures
- Monitor disk health proactively using SMART tools and alerting on early warning signs
- Enable ClickHouse's built-in data checksumming to detect corruption early
- Keep backups current so you can restore parts that become unreadable
- Avoid running external tools that modify or delete files in the ClickHouse data directory
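The descriptor-limit practice above is easy to monitor. A minimal sketch that compares a process's open-descriptor count against its soft limit via /proc; it uses the current shell ($$) as a stand-in, so substitute $(pidof clickhouse-server) on a real host:

```shell
pid=$$  # stand-in for illustration; use $(pidof clickhouse-server) in production
limit=$(awk '/Max open files/ {print $4}' /proc/$pid/limits)  # $4 is the soft limit
used=$(ls /proc/$pid/fd | wc -l)
echo "pid $pid: $used of $limit file descriptors in use"
```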
Frequently Asked Questions
Q: Does this error always mean my disk is failing?
A: Not necessarily. While disk failure is one cause, exhausted file descriptors, filesystem remounts, or external processes interfering with data files can all trigger this error. Check system logs to narrow down the root cause.
Q: Can I recover data after seeing this error?
A: In most cases, yes. If the table uses ReplicatedMergeTree, the affected replica can re-fetch the corrupted part from another replica. For non-replicated tables, you may need to restore from a backup or detach the broken part.
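For the non-replicated case, detaching a broken part looks like the following. A hedged sketch: your_db.your_table and the part name all_1_1_0 are placeholders, and the script skips quietly if clickhouse-client is unavailable; ALTER TABLE ... DETACH PART and ATTACH PART are standard ClickHouse DDL:

```shell
# Hedged: requires clickhouse-client and a running server; skip gracefully otherwise
command -v clickhouse-client >/dev/null 2>&1 || { echo "clickhouse-client not found; skipping"; exit 0; }

# Placeholder names: substitute your own database, table, and part
clickhouse-client --query "ALTER TABLE your_db.your_table DETACH PART 'all_1_1_0'"
# After restoring the part's files from backup, re-attach it:
# clickhouse-client --query "ALTER TABLE your_db.your_table ATTACH PART 'all_1_1_0'"
```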
Q: How do I find which file caused the error?
A: The ClickHouse error log typically includes the file path or descriptor number. Check /var/log/clickhouse-server/clickhouse-server.err.log for the full stack trace and file details.
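Pulling the relevant lines out of the error log can be sketched like this (the log path is the Debian/RPM package default; adjust it if you changed your logging configuration):

```shell
log=/var/log/clickhouse-server/clickhouse-server.err.log
[ -r "$log" ] || { echo "log not found at $log"; exit 0; }
# Show the last few occurrences with line numbers; the surrounding stack
# trace usually names the file path or descriptor involved
grep -n "Cannot read from file descriptor" "$log" | tail -5
```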
Q: Will increasing file descriptor limits prevent this error?
A: It will prevent the subset of cases where descriptor exhaustion is the root cause. If the problem is disk corruption or hardware failure, raising limits alone won't help.