The "DB::Exception: Cannot read from file descriptor" error in ClickHouse signals a low-level failure when the server attempts to read data from an open file descriptor. This could be a data part file, a temporary file used during query processing, or even a pipe. The error code associated with this issue is CANNOT_READ_FROM_FILE_DESCRIPTOR, and it usually points to an underlying operating system or storage problem rather than a query logic mistake.
Impact
When this error occurs, you can expect:
- Active queries that depend on the affected file to fail immediately
- Potential inability to read specific table parts, making some data temporarily inaccessible
- Degraded performance if ClickHouse retries operations or marks parts as broken
- In replicated setups, the affected replica may fall behind while it attempts recovery
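In replicated setups you can check directly whether a replica has fallen behind. A minimal sketch, assuming clickhouse-client is on the PATH and a server is running locally (the script skips quietly otherwise); system.replicas with its absolute_delay and queue_size columns is a standard ClickHouse system table:

```shell
# Hedged: requires clickhouse-client and a running local server; skip gracefully otherwise
command -v clickhouse-client >/dev/null 2>&1 || { echo "clickhouse-client not found; skipping"; exit 0; }

# Show replicas that are behind: absolute_delay is seconds of lag,
# queue_size is the number of pending replication queue entries
clickhouse-client --query "
  SELECT database, table, absolute_delay, queue_size, is_readonly
  FROM system.replicas
  WHERE absolute_delay > 0 OR queue_size > 0"
```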
Common Causes
- Corrupted or truncated data files on disk, often caused by an unclean shutdown or hardware failure
- The file descriptor limit (ulimit -n) has been exhausted, preventing new file opens or causing existing descriptors to behave unexpectedly
- Underlying storage device errors such as bad sectors, failed NVMe drives, or unresponsive network-attached storage
- A filesystem mounted in read-only mode due to detected errors (e.g., ext4 remounting as read-only after journal failure)
- File deleted or moved by an external process while ClickHouse still held it open
- Kernel-level I/O errors reported via dmesg, especially on systems with degraded RAID arrays
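Several of the causes above (read-only remounts, failing disks) surface in /proc/mounts before they surface anywhere else. A quick triage sketch, assuming a Linux host; /var/lib/clickhouse is the default data directory:

```shell
# List any mounts currently flagged read-only; an ext4 journal failure
# typically remounts the data disk as "ro" without unmounting it
awk '$4 ~ /(^|,)ro(,|$)/ {print $2, "->", $4}' /proc/mounts
```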
Troubleshooting and Resolution Steps
Check system logs for I/O errors:

```shell
dmesg | grep -i "error\|fault\|i/o"
journalctl -u clickhouse-server --since "1 hour ago"
```

Look for any disk or filesystem errors that coincide with the ClickHouse exception.
Verify file descriptor limits:

```shell
cat /proc/$(pidof clickhouse-server)/limits | grep "open files"
ls /proc/$(pidof clickhouse-server)/fd | wc -l
```

If the current count is close to the limit, raise it in /etc/security/limits.conf or in the systemd unit file:

```ini
[Service]
LimitNOFILE=500000
```

Check filesystem health:

```shell
df -h /var/lib/clickhouse
mount | grep $(df /var/lib/clickhouse --output=source | tail -1)
```

Confirm the filesystem is mounted read-write and has available space.
Inspect ClickHouse data integrity:

```sql
SELECT name, active, bytes_on_disk
FROM system.parts
WHERE database = 'your_db' AND table = 'your_table'
ORDER BY modification_time DESC;
```

Look for parts with anomalous sizes, and for broken parts logged in system.part_log.

Run a filesystem check if possible. Schedule it during a maintenance window if you suspect corruption; the filesystem must be unmounted before checking. For ext4:

```shell
sudo e2fsck -f /dev/sdX
```

Restart the ClickHouse server after resolving the underlying issue. ClickHouse will attempt to recover and re-attach healthy parts on startup.
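After the restart you can confirm what ClickHouse set aside rather than re-attached. A hedged sketch using the standard system.detached_parts table (requires clickhouse-client and a running server; the script skips quietly otherwise):

```shell
# Hedged: requires clickhouse-client and a running server; skip gracefully otherwise
command -v clickhouse-client >/dev/null 2>&1 || { echo "clickhouse-client not found; skipping"; exit 0; }

# Parts moved aside on startup appear here; "reason" explains why (e.g. broken)
clickhouse-client --query "
  SELECT database, table, name, reason
  FROM system.detached_parts"
```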
Best Practices
- Set generous file descriptor limits (at least 100,000) for the ClickHouse process, especially in production environments
- Use reliable storage with redundancy (RAID 10, replicated cloud volumes) to minimize the risk of read failures
- Monitor disk health proactively using SMART tools and alerting on early warning signs
- Enable ClickHouse's built-in data checksumming to detect corruption early
- Keep backups current so you can restore parts that become unreadable
- Avoid running external tools that modify or delete files in the ClickHouse data directory
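The descriptor-limit practice above is easy to monitor. A minimal sketch that compares a process's open-descriptor count against its soft limit via /proc; it uses the current shell ($$) as a stand-in, so substitute $(pidof clickhouse-server) on a real host:

```shell
pid=$$  # stand-in for illustration; use $(pidof clickhouse-server) in production
limit=$(awk '/Max open files/ {print $4}' /proc/$pid/limits)  # $4 is the soft limit
used=$(ls /proc/$pid/fd | wc -l)
echo "pid $pid: $used of $limit file descriptors in use"
```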
Frequently Asked Questions
Q: Does this error always mean my disk is failing?
A: Not necessarily. While disk failure is one cause, exhausted file descriptors, filesystem remounts, or external processes interfering with data files can all trigger this error. Check system logs to narrow down the root cause.
Q: Can I recover data after seeing this error?
A: In most cases, yes. If the table uses ReplicatedMergeTree, the affected replica can re-fetch the corrupted part from another replica. For non-replicated tables, you may need to restore from a backup or detach the broken part.
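For the non-replicated case, detaching a broken part looks like the following. A hedged sketch: your_db.your_table and the part name all_1_1_0 are placeholders, and the script skips quietly if clickhouse-client is unavailable; ALTER TABLE ... DETACH PART and ATTACH PART are standard ClickHouse DDL:

```shell
# Hedged: requires clickhouse-client and a running server; skip gracefully otherwise
command -v clickhouse-client >/dev/null 2>&1 || { echo "clickhouse-client not found; skipping"; exit 0; }

# Placeholder names: substitute your own database, table, and part
clickhouse-client --query "ALTER TABLE your_db.your_table DETACH PART 'all_1_1_0'"
# After restoring the part's files from backup, re-attach it:
# clickhouse-client --query "ALTER TABLE your_db.your_table ATTACH PART 'all_1_1_0'"
```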
Q: How do I find which file caused the error?
A: The ClickHouse error log typically includes the file path or descriptor number. Check /var/log/clickhouse-server/clickhouse-server.err.log for the full stack trace and file details.
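Pulling the relevant lines out of the error log can be sketched like this (the log path is the Debian/RPM package default; adjust it if you changed your logging configuration):

```shell
log=/var/log/clickhouse-server/clickhouse-server.err.log
[ -r "$log" ] || { echo "log not found at $log"; exit 0; }
# Show the last few occurrences with line numbers; the surrounding stack
# trace usually names the file path or descriptor involved
grep -n "Cannot read from file descriptor" "$log" | tail -5
```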
Q: Will increasing file descriptor limits prevent this error?
A: It will prevent the subset of cases where descriptor exhaustion is the root cause. If the problem is disk corruption or hardware failure, raising limits alone won't help.