The "DB::Exception: Cannot write to file descriptor" error in ClickHouse occurs when the server fails to write data to an open file descriptor. This is a low-level I/O error tied to the CANNOT_WRITE_TO_FILE_DESCRIPTOR error code, and it typically surfaces during inserts, merges, or other operations that produce data files on disk. The root cause almost always lies outside of ClickHouse itself -- in the storage layer, OS configuration, or filesystem state.
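You can confirm how often the server has hit this error by querying its error counters. The sketch below reads the `system.errors` table; the `last_error_time` column exists only in recent ClickHouse versions, so trim the column list on older servers. The guard makes the snippet a no-op on hosts where `clickhouse-client` is not installed.

```shell
# Inspect server-wide counters for this specific error code.
QUERY="SELECT name, code, value, last_error_time
       FROM system.errors
       WHERE name = 'CANNOT_WRITE_TO_FILE_DESCRIPTOR'"

# Run only where the client binary is available.
if command -v clickhouse-client >/dev/null 2>&1; then
    clickhouse-client --query "$QUERY"
fi
```

A non-zero `value` with a recent `last_error_time` tells you the problem is ongoing rather than historical.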
Impact
This error can have serious consequences for your ClickHouse deployment:
- Insert operations will fail, causing data ingestion pipelines to stall
- Background merges may be interrupted, leading to an accumulation of small parts
- Temporary files used for sorting or aggregation cannot be written, breaking complex queries
- If left unresolved, the server may become effectively read-only or enter a degraded state
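Interrupted merges show up as a growing count of active parts. A quick way to spot this is the `system.parts` table; the query below is a sketch (guarded so it is a no-op without `clickhouse-client`):

```shell
# Tables with the most active parts; a steadily climbing count while
# merges are stalled is a symptom of the problem described above.
QUERY="SELECT database, table, count() AS active_parts
       FROM system.parts
       WHERE active
       GROUP BY database, table
       ORDER BY active_parts DESC
       LIMIT 10"

if command -v clickhouse-client >/dev/null 2>&1; then
    clickhouse-client --query "$QUERY"
fi
```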
Common Causes
- Disk is completely full or the filesystem has run out of inodes
- The ClickHouse process lacks write permissions on the data directory
- The filesystem has been remounted as read-only due to detected errors
- Underlying storage device failure (bad disk, disconnected network volume, degraded RAID)
- Disk quota exceeded for the user running the ClickHouse process
- File descriptor limit (`ulimit -n`) reached, so the server cannot open new files for writing
- An external process or security tool (e.g., SELinux, AppArmor) blocking write access
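The file descriptor limit from the list above can be checked both for your shell and for the running server process. The `/proc` path below assumes a standard Linux install; adjust the process name if your distribution packages it differently.

```shell
# Soft file-descriptor limit of the current shell:
SOFT_LIMIT=$(ulimit -n)
echo "shell soft limit: $SOFT_LIMIT"

# Effective limits of the running server process, if one is present
# (Linux only; silently skipped when the server is not running):
PID=$(pidof clickhouse-server 2>/dev/null || true)
if [ -n "$PID" ]; then
    grep "Max open files" "/proc/$PID/limits"
fi
```

If the server's limit is low, raise it via `LimitNOFILE` in the systemd unit or `/etc/security/limits.conf`, depending on how the service is launched.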
Troubleshooting and Resolution Steps
Check available disk space and inodes:
```shell
df -h /var/lib/clickhouse
df -i /var/lib/clickhouse
```

If the disk is full, free up space by removing old backups, detaching unused tables, or expanding the volume.
Verify filesystem mount mode:
```shell
mount | grep $(df /var/lib/clickhouse --output=source | tail -1)
```

If the filesystem shows `ro` (read-only), it likely remounted after an error. Check `dmesg` for the cause and remount as read-write after fixing the issue:

```shell
sudo mount -o remount,rw /var/lib/clickhouse
```

Check file and directory permissions:

```shell
ls -la /var/lib/clickhouse/
stat /var/lib/clickhouse/data/
```

Ensure the ClickHouse user (typically `clickhouse`) owns the data directories:

```shell
sudo chown -R clickhouse:clickhouse /var/lib/clickhouse
```

Inspect disk quotas:

```shell
quota -u clickhouse
repquota /var/lib/clickhouse
```

Remove or raise quotas if they are limiting writes.
Review security policies:
```shell
# For SELinux
sudo ausearch -m avc -ts recent

# For AppArmor
sudo aa-status
```

Adjust policies to allow ClickHouse to write to its data paths.
Check for storage device errors:
```shell
dmesg | grep -i "error\|fail\|i/o"
smartctl -a /dev/sda
```

Replace failing hardware or failover to healthy storage.
Restart ClickHouse after resolving the underlying problem. The server will resume normal write operations once the obstacle is removed.
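A restart-and-verify sequence might look like the following. It assumes a standard systemd-based package install and is guarded so it does nothing on hosts without a `clickhouse-server` unit.

```shell
# Restart the service and confirm the server answers queries again.
if command -v systemctl >/dev/null 2>&1 \
   && systemctl is-enabled clickhouse-server >/dev/null 2>&1; then
    sudo systemctl restart clickhouse-server
    clickhouse-client --query "SELECT 1"   # a trivial health check
fi
CHECKED=yes
```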
Best Practices
- Configure disk space monitoring with alerts that fire well before the volume is full (e.g., at 80% capacity)
- Use separate volumes for ClickHouse data and system partitions so that OS logs or other processes cannot fill the data disk
- Set appropriate filesystem permissions during initial deployment and verify them after OS upgrades
- Maintain at least 10-15% free disk space to accommodate merges and temporary files
- Test storage failover procedures periodically in environments that use network-attached or cloud storage
- Avoid running ClickHouse under restrictive disk quotas in production
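The first practice above can be sketched as a minimal threshold check suitable for cron or a monitoring agent. The data path is an assumption (the default package location); the script falls back to `/` so it runs anywhere, and the alert action is left as a plain `echo` for you to replace.

```shell
# Alert when the ClickHouse data volume passes a usage threshold.
DATA_PATH="/var/lib/clickhouse"
[ -d "$DATA_PATH" ] || DATA_PATH="/"   # fallback so the sketch runs anywhere
THRESHOLD=80                           # alert above this usage percentage

# df -P guarantees a stable, one-line-per-filesystem output format.
USAGE=$(df -P "$DATA_PATH" | awk 'NR==2 { sub("%", "", $5); print $5 }')

if [ "$USAGE" -ge "$THRESHOLD" ]; then
    echo "ALERT: $DATA_PATH is ${USAGE}% full"
else
    echo "OK: $DATA_PATH is ${USAGE}% full"
fi
```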
Frequently Asked Questions
Q: Can this error cause data corruption?
A: ClickHouse uses atomic writes and checksums, so a failed write operation generally does not corrupt existing data. However, the data being written at the time of failure will be lost and needs to be re-inserted.
Q: My disk shows free space, but I still get this error. What else could it be?
A: Check inode usage (df -i), filesystem mount mode, SELinux/AppArmor denials, and file descriptor limits. Any of these can block writes even when raw disk space is available.
Q: How do I prevent this from happening during large inserts?
A: Monitor disk space continuously, use ClickHouse's min_free_disk_space storage policy setting to prevent writes when space is critically low, and ensure your storage can handle the write throughput your workload requires.
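One way to reserve headroom at the storage layer is the `keep_free_space_bytes` disk setting in `storage_configuration`, which tells ClickHouse not to place new parts on a disk once free space drops below the given amount. This fragment is a sketch; setting names and availability vary by version, so check the documentation for your release.

```xml
<clickhouse>
    <storage_configuration>
        <disks>
            <default>
                <!-- Refuse new parts when less than ~10 GiB would
                     remain free on this disk (10 * 1024^3 bytes). -->
                <keep_free_space_bytes>10737418240</keep_free_space_bytes>
            </default>
        </disks>
    </storage_configuration>
</clickhouse>
```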
Q: Does this affect replicated tables differently?
A: The error itself is the same, but replicated tables have the advantage of redundancy. Other replicas will continue serving reads and accepting writes. Once the affected node recovers, it can re-sync from its peers.
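To verify recovery on a replicated setup, the `system.replicas` table exposes per-replica health. The column set below matches recent ClickHouse versions, and the delay threshold of 60 seconds is an arbitrary example; the snippet is guarded so it is a no-op without `clickhouse-client`.

```shell
# Replicas that are read-only or lagging more than 60 seconds.
QUERY="SELECT database, table, is_readonly, absolute_delay, queue_size
       FROM system.replicas
       WHERE is_readonly OR absolute_delay > 60"

if command -v clickhouse-client >/dev/null 2>&1; then
    clickhouse-client --query "$QUERY"
fi
```

An empty result after the affected node restarts indicates it has re-synced with its peers.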