The "DB::Exception: Cannot close file" error in ClickHouse indicates that a call to close a file descriptor failed at the OS level. Represented by the CANNOT_CLOSE_FILE error code, this is a relatively rare condition that typically points to serious underlying problems with the storage subsystem or filesystem. While closing a file might seem like a trivial operation, a failure here can signal that data was not fully flushed to disk.
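To confirm whether this error has actually been raised, ClickHouse keeps cumulative per-error counters in the `system.errors` table. A query along these lines shows the count and most recent occurrence (exact columns may vary by server version):

```sql
SELECT name, code, value, last_error_time, last_error_message
FROM system.errors
WHERE name = 'CANNOT_CLOSE_FILE';
```

A non-zero `value` means the error has occurred at least once since the server started.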
Impact
A file close failure can have the following effects:
- The data written to the file may not have been fully persisted, risking partial writes
- File descriptor leaks if the descriptor remains open after the failed close
- Ongoing merges or insert operations may be aborted
- Repeated occurrences can gradually exhaust the file descriptor pool, leading to cascading failures
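The descriptor-exhaustion risk above can be watched for with a small script. This is an illustrative sketch, not part of ClickHouse: it compares a process's open-descriptor count against its soft limit, defaulting to the current shell's PID (substitute `$(pidof clickhouse-server)` in practice):

```shell
#!/bin/sh
# Sketch: warn when a process nears its file descriptor limit (Linux /proc).
PID="${1:-$$}"
COUNT=$(ls "/proc/$PID/fd" | wc -l)
# Fourth field of the "Max open files" row is the soft limit.
LIMIT=$(awk '/Max open files/ {print $4}' "/proc/$PID/limits")
echo "fds in use: $COUNT / $LIMIT"
# Warn when usage crosses 80% of the soft limit.
if [ "$COUNT" -gt $((LIMIT * 80 / 100)) ]; then
    echo "WARNING: file descriptor usage above 80% of limit"
fi
```

Run periodically from cron or a monitoring agent, this catches a slow leak well before the pool is exhausted.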
Common Causes
- Underlying storage device errors (disk failure, NFS server disconnection, iSCSI timeout)
- Filesystem corruption that prevents proper metadata updates during close
- Kernel bugs or issues with specific filesystem drivers (especially with FUSE-based filesystems)
- Network filesystem (NFS, CIFS) losing connectivity during the close operation
- The file was already closed or the descriptor was invalidated by another thread or process
- Resource pressure causing the kernel to fail deferred write operations during close
Troubleshooting and Resolution Steps
Check kernel and system logs for storage errors:
```
dmesg | tail -50
journalctl -k --since "30 minutes ago" | grep -i "error\|fail"
```

Look for I/O errors, device timeouts, or filesystem warnings.
Verify the storage device is healthy:
```
smartctl -a /dev/sda
cat /sys/block/sda/device/state
```

For network storage, confirm the mount is still active:

```
mount | grep /var/lib/clickhouse
stat /var/lib/clickhouse
```

Check for file descriptor leaks:

```
ls /proc/$(pidof clickhouse-server)/fd | wc -l
cat /proc/$(pidof clickhouse-server)/limits | grep "open files"
```

A high count relative to the limit suggests a leak or excessive open files.
Review ClickHouse server logs:
```
grep "Cannot close" /var/log/clickhouse-server/clickhouse-server.err.log
```

Note the file path and correlate it with the affected table or operation.
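If the logged path points inside a data part directory, the `system.parts` table can map it back to its table. A sketch, where the part name `all_1_1_0` is a placeholder to be replaced with the directory name from your log:

```sql
SELECT database, table, name, path
FROM system.parts
WHERE path LIKE '%all_1_1_0%';
```

This tells you which table (and which merge or insert) was touching the file when the close failed.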
Check filesystem consistency: If you suspect corruption, schedule a filesystem check during downtime:
```
sudo umount /var/lib/clickhouse
sudo fsck -f /dev/sdX
sudo mount /var/lib/clickhouse
```

Restart ClickHouse after addressing the storage issue. The server will reopen files as needed during startup.
Best Practices
- Use local, enterprise-grade storage (SSDs with power-loss protection) rather than network filesystems for ClickHouse data when possible
- Monitor storage device health with SMART monitoring and automated alerts
- If using NFS or other network filesystems, ensure stable network connectivity and configure appropriate mount timeouts
- Keep the operating system kernel up to date to benefit from filesystem driver fixes
- Set up file descriptor monitoring to detect leaks early
- Maintain redundant replicas so that a storage failure on one node does not cause data loss
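For the NFS case, "appropriate mount timeouts" usually means a hard mount with explicit retry settings, so a brief outage blocks I/O instead of surfacing errors to ClickHouse. An illustrative `/etc/fstab` entry (server name and export path are placeholders):

```
nfs-server:/export/clickhouse /var/lib/clickhouse nfs hard,timeo=600,retrans=3,nofail 0 0
```

`timeo` is in tenths of a second and `retrans` controls how many retries occur before the client reports a major timeout; tune both to your network's failure characteristics.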
Frequently Asked Questions
Q: Is data lost when this error occurs?
A: It depends on the operation. If the close failure happens after data was successfully written and fsynced, no data is lost. However, if deferred writes were pending during the close, some data may not have reached disk. ClickHouse checksums will detect any inconsistency on the next read.
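Rather than waiting for the next read, you can proactively verify a suspect table's data with ClickHouse's `CHECK TABLE` statement (the table name here is a placeholder):

```sql
CHECK TABLE my_database.my_table;
```

For MergeTree-family tables this validates part checksums and reports whether the data is intact.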
Q: Can this error happen with cloud-managed disks (EBS, Persistent Disk)?
A: It is uncommon but possible, especially during cloud infrastructure incidents or if the disk becomes detached. Cloud provider status pages and instance logs can help confirm.
Q: Should I be concerned if this happens once?
A: A single occurrence could be a transient storage hiccup. However, you should still investigate the cause. Repeated occurrences indicate a persistent problem that needs attention before it escalates.
Q: Does this error affect all tables or just the one being operated on?
A: The error is specific to the file being closed at the time. Other tables are not directly affected unless the root cause (e.g., a failing disk) impacts all files on the same volume.
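To see which configured disks, and therefore which tables' volumes, might share the affected device, the `system.disks` table lists each disk and its mount path. A minimal query:

```sql
SELECT name, path, free_space, total_space
FROM system.disks;
```

Any table whose storage policy places parts on the failing disk's path is potentially exposed to the same root cause.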