
ClickHouse DB::Exception: Cannot write to file

The "DB::Exception: Cannot write to file" error in ClickHouse is a general-purpose write failure that surfaces when the server is unable to write data to a file on disk. The CANNOT_WRITE_TO_FILE error code encompasses a broad range of scenarios, from full disks to hardware failures. Since nearly every ClickHouse operation involves writing files at some point -- inserts write data parts, merges produce new parts, and queries write temporary results -- this error can appear in many different contexts.
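To confirm the server is actually hitting this error, and how often, you can query the system.errors table (column names as in recent ClickHouse versions; the value counter resets on server restart):

    SELECT name, code, value, last_error_time, last_error_message
    FROM system.errors
    WHERE name = 'CANNOT_WRITE_TO_FILE';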

Impact

A file write failure has wide-reaching consequences:

  • Active inserts are aborted, stopping data ingestion
  • Background merges cannot produce output parts, leading to part accumulation
  • Queries that spill to disk for sorting or joining will fail
  • Mutations cannot write their results, leaving the ALTER operation incomplete
  • If the issue is system-wide (e.g., full disk), all write operations across all tables are affected

Common Causes

  1. Disk space exhaustion -- the most frequent cause
  2. Inode exhaustion on the filesystem
  3. The ClickHouse process lacks write permissions on the target file or directory
  4. Filesystem remounted as read-only after detecting errors
  5. Underlying storage device failure or disconnection
  6. Disk quota exceeded for the ClickHouse user
  7. File descriptor limit reached
  8. SELinux or AppArmor denying write access

Troubleshooting and Resolution Steps

  1. Check disk space first (most common cause):

    df -h /var/lib/clickhouse
    df -i /var/lib/clickhouse
    

    If the disk is full, free up space immediately:

    -- Drop old partitions
    ALTER TABLE your_db.your_table DROP PARTITION 'old_partition';
    -- Or truncate tables you no longer need
    TRUNCATE TABLE your_db.temp_table;
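    To decide what is safe to drop, it helps to see which tables consume the most disk. One way is to aggregate the system.parts table:

    -- Largest tables by on-disk size (active parts only)
    SELECT database, table,
           formatReadableSize(sum(bytes_on_disk)) AS size
    FROM system.parts
    WHERE active
    GROUP BY database, table
    ORDER BY sum(bytes_on_disk) DESC
    LIMIT 10;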
    
  2. Verify filesystem state:

    mount | grep $(df /var/lib/clickhouse --output=source | tail -1)
    

    Look for ro in the mount options, which indicates the kernel remounted the filesystem read-only after detecting an error. Fix the underlying issue and remount read-write.

  3. Check permissions:

    sudo -u clickhouse touch /var/lib/clickhouse/write_test && rm /var/lib/clickhouse/write_test
    

    Fix ownership if the test fails:

    sudo chown -R clickhouse:clickhouse /var/lib/clickhouse
    
  4. Look at system-level I/O errors:

    dmesg | grep -i "error\|i/o\|scsi"
    
  5. Review file descriptor usage:

    cat /proc/$(pidof clickhouse-server)/limits | grep "open files"
    ls /proc/$(pidof clickhouse-server)/fd | wc -l
    
  6. Check for disk quotas:

    repquota /var/lib/clickhouse 2>/dev/null
    quota -u clickhouse 2>/dev/null
    
  7. Inspect security policies:

    sudo ausearch -m avc -ts recent
    
  8. Once the root cause is resolved, restart ClickHouse to resume normal operations. Pending merges and mutations will be rescheduled automatically.
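    After the restart, you can verify that background operations are being scheduled again by checking the relevant system tables:

    -- Active merges and any mutations still waiting to complete
    SELECT count() FROM system.merges;
    SELECT count() FROM system.mutations WHERE NOT is_done;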

Best Practices

  • Set up disk space alerting at 75-80% capacity with a critical alert at 90%
  • Use ClickHouse's min_free_disk_space setting to halt writes before the disk is completely full
  • Separate the ClickHouse data volume from the OS volume to prevent system services from competing for space
  • Implement data retention policies using TTL to automatically remove old data
  • Monitor inode usage alongside disk space, especially for tables with many small parts
  • Run ClickHouse on ext4 or xfs with default mount options for reliable write behavior
  • Keep multiple replicas so that a write failure on one node does not block data ingestion entirely
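Several of these practices can be monitored from inside ClickHouse itself. For example, disk capacity as the server sees it is exposed in the system.disks table, which is a convenient source for alerting:

    SELECT name, path,
           formatReadableSize(free_space)  AS free,
           formatReadableSize(total_space) AS total,
           round(100.0 * (1 - free_space / total_space), 1) AS pct_used
    FROM system.disks;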

Frequently Asked Questions

Q: My disk was full but I freed space. Do I need to restart ClickHouse?
A: Not always. ClickHouse will retry background operations like merges automatically. However, if queries or inserts continue to fail, a restart ensures a clean state. Check that inserts succeed before relying on automatic recovery alone.

Q: How can I prevent the disk from filling up in the first place?
A: Use TTL rules on tables to expire old data, implement partition-based retention, monitor disk usage proactively, and configure min_free_disk_space in ClickHouse storage policies to reserve space for merges.
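As a sketch of a TTL-based retention rule -- the table and column names here (your_db.events, event_date) are placeholders to adapt to your schema:

    -- Expire rows 90 days after event_date
    ALTER TABLE your_db.events
        MODIFY TTL event_date + INTERVAL 90 DAY;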

Q: Is this error the same as CANNOT_WRITE_TO_FILE_DESCRIPTOR?
A: They are related but distinct. CANNOT_WRITE_TO_FILE is a higher-level error tied to ClickHouse's file I/O abstraction, while CANNOT_WRITE_TO_FILE_DESCRIPTOR is a lower-level error from the raw write() system call. The troubleshooting steps overlap significantly.
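If you are unsure which of the two your server is hitting, both counters can be compared side by side in system.errors:

    SELECT name, code, value, last_error_time
    FROM system.errors
    WHERE name IN ('CANNOT_WRITE_TO_FILE', 'CANNOT_WRITE_TO_FILE_DESCRIPTOR');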

Q: Can network filesystems cause intermittent write failures?
A: Absolutely. NFS and other network filesystems are susceptible to temporary connectivity issues that manifest as write errors. For production ClickHouse deployments, local storage is strongly recommended.
