The "Too many open files" error in ClickHouse indicates that the server process has exhausted its available file descriptors. This system-level error (EMFILE) is raised when ClickHouse attempts to open a new file, socket, or pipe but has already reached the maximum number of open file descriptors allowed by the operating system.
Impact
When file descriptors are exhausted, ClickHouse cannot open new files for data parts, create new network connections, or perform any I/O operation requiring a new file descriptor. This can cause queries to fail, inserts to be rejected, replication to stall, and in severe cases, the server to become unresponsive. The issue affects all operations on the server, not just the query that triggered the error.
Common Causes
- The system file descriptor limit (
ulimit -n) is set too low for the workload - A large number of tables or data parts, each requiring open file descriptors for their files
- Many concurrent connections consuming file descriptors for sockets
- MergeTree tables with many small parts that have not been merged
- Excessive number of concurrent queries, each opening multiple data files
- File descriptor leak due to a bug (rare, but possible in older versions)
- System-wide file descriptor limit (
/proc/sys/fs/file-max) is too low
Troubleshooting and Resolution Steps
Check the current file descriptor limits for the ClickHouse process:
cat /proc/$(pidof clickhouse-server)/limits | grep 'open files'Check how many file descriptors ClickHouse is currently using:
ls /proc/$(pidof clickhouse-server)/fd | wc -lCheck the system-wide file descriptor limit:
cat /proc/sys/fs/file-maxIncrease the file descriptor limit for the ClickHouse user. Edit
/etc/security/limits.d/clickhouse.conf:clickhouse soft nofile 1048576 clickhouse hard nofile 1048576If using systemd, set the limit in the ClickHouse service file:
sudo systemctl edit clickhouse-serverAdd:
[Service] LimitNOFILE=1048576Then restart:
sudo systemctl daemon-reload sudo systemctl restart clickhouse-serverIncrease the system-wide limit if needed:
sudo sysctl -w fs.file-max=2097152 # Make permanent in /etc/sysctl.conf: echo 'fs.file-max = 2097152' | sudo tee -a /etc/sysctl.confReduce the number of open files by merging small parts:
-- Check tables with many parts SELECT database, table, count() AS part_count FROM system.parts WHERE active GROUP BY database, table ORDER BY part_count DESC LIMIT 20; -- Force merge to reduce part count OPTIMIZE TABLE your_database.your_table FINAL;Check for excessive connections:
SELECT count() FROM system.processes; SELECT metric, value FROM system.metrics WHERE metric LIKE '%Connection%';
Best Practices
- Set the file descriptor limit to at least 1,000,000 for production ClickHouse servers.
- Monitor the number of open file descriptors relative to the limit and alert when usage exceeds 80%.
- Regularly merge small parts by tuning MergeTree settings (
parts_to_delay_insert,parts_to_throw_insert) to prevent part accumulation. - Use connection pooling in client applications to limit the number of simultaneous connections.
- Drop unused tables and databases to reduce the baseline file descriptor usage.
- Monitor
system.metricsfor theOpenFileForReadandOpenFileForWritemetrics. - Set file descriptor limits in the systemd service file rather than relying on
/etc/security/limits.conf, as systemd may not read PAM limits.
Frequently Asked Questions
Q: What is a good file descriptor limit for ClickHouse?
A: For production servers, set it to at least 1,000,000 (1M). The default Linux limit of 1024 is far too low for ClickHouse. Large installations with many tables may need even higher limits. File descriptors are cheap; setting a high limit does not consume memory.
Q: How do I check the current usage without root access?
A: From within ClickHouse, you can check metrics:
SELECT metric, value FROM system.metrics WHERE metric LIKE '%File%' OR metric LIKE '%Connection%';
SELECT value FROM system.asynchronous_metrics WHERE metric = 'MaxPartCountForPartition';
Q: Will increasing the limit fix the problem permanently?
A: It addresses the immediate issue, but if your file descriptor usage is growing unboundedly (due to part accumulation or connection leaks), you should also address the root cause. Monitor usage over time to ensure it stabilizes after increasing the limit.