The "DB::Exception: System error" in ClickHouse indicates that an OS-level system call has failed in a way that ClickHouse could not handle gracefully. The SYSTEM_ERROR code is a catch-all for low-level operating system failures that do not map to a more specific ClickHouse error. These failures can stem from filesystem issues, permission problems, resource exhaustion, or kernel-level faults.
Impact
A SYSTEM_ERROR can disrupt query execution, background merges, or even server startup, depending on which system call failed. If the underlying OS issue is persistent (for example, a failing disk or exhausted file descriptors), the error may recur across many operations and eventually render the ClickHouse instance unusable until the root cause is addressed.
Common Causes
- A filesystem has become read-only due to disk errors or corruption
- File descriptor limits have been exhausted at the OS level
- Permission denied for a file or directory that ClickHouse needs to access
- A network socket operation failed at the kernel level
- The underlying storage device is experiencing hardware failures or I/O errors
- SELinux or AppArmor policies are blocking a system call that ClickHouse requires
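Each of these causes usually shows up in the error message as a Linux errno value. The hypothetical shell function below (explain_errno is an illustration, not part of ClickHouse) maps the errno values that correspond to the causes above to their symbolic names. The numbers follow the generic Linux errno table used on x86-64 and arm64, and the hints are a rough guide rather than a definitive diagnosis.

```shell
#!/usr/bin/env bash
# Hypothetical helper: translate an errno number from a ClickHouse error
# message into its Linux symbolic name plus a likely cause.
# Numbering assumes the generic Linux errno table (x86-64, arm64).
explain_errno() {
  case "$1" in
    1)   echo "EPERM: operation not permitted (capabilities, SELinux/AppArmor)" ;;
    5)   echo "EIO: low-level I/O error (failing disk or controller)" ;;
    13)  echo "EACCES: permission denied (ownership, mode, or security policy)" ;;
    23)  echo "ENFILE: system-wide file table is full" ;;
    24)  echo "EMFILE: per-process open file limit reached" ;;
    28)  echo "ENOSPC: no space left on device" ;;
    30)  echo "EROFS: filesystem is mounted read-only" ;;
    104) echo "ECONNRESET: connection reset by peer" ;;
    111) echo "ECONNREFUSED: connection refused" ;;
    *)   echo "errno $1: not in this table (see errno(3))" ;;
  esac
}
```

For example, explain_errno 24 points at the file descriptor checks in the steps that follow.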
Troubleshooting and Resolution Steps
Check the ClickHouse error log for details:
grep -i "SYSTEM_ERROR\|System error" /var/log/clickhouse-server/clickhouse-server.err.log | tail -20
The log entry usually contains the specific errno and system call name that failed.
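When the log holds many entries, it can help to see which errno values dominate. The sketch below assumes the messages contain text of the form "errno: <number>" (adjust the pattern if your ClickHouse version words errors differently); count_errnos is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read log lines on stdin and print "<errno> <count>",
# most frequent first. Assumes messages contain "errno: <number>".
count_errnos() {
  grep -oE 'errno: [0-9]+' \
    | awk '{ count[$2]++ } END { for (e in count) print e, count[e] }' \
    | sort -k2,2nr -k1,1n
}
```

Typical use: grep -i "System error" /var/log/clickhouse-server/clickhouse-server.err.log | count_errnos. A single errno that accounts for most entries usually points to one root cause rather than several unrelated faults.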
Inspect the system journal for OS-level errors:
dmesg | tail -50
journalctl -u clickhouse-server --since "1 hour ago"
Look for I/O errors, filesystem remount events, or out-of-memory messages.
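To get a quick tally instead of reading raw kernel output, a filter like the one below can be piped from dmesg or journalctl -k. The match patterns are assumptions based on common ext4 and block-layer wording ("I/O error", "Remounting filesystem read-only") and the kernel OOM killer messages; other filesystems such as XFS phrase shutdowns differently, so extend the patterns as needed. scan_kernel_log is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count kernel log lines that match common
# disk, remount, and OOM patterns. Reads the log on stdin.
scan_kernel_log() {
  awk '
    /I\/O error/                      { io++ }
    /Remounting filesystem read-only/ { ro++ }
    /Out of memory|oom-kill/          { oom++ }
    END { printf "io_errors=%d ro_remounts=%d oom=%d\n", io, ro, oom }
  '
}
```

Typical use: sudo dmesg | scan_kernel_log, or journalctl -k --since "1 hour ago" | scan_kernel_log. Any non-zero count is worth correlating with the timestamps of the SYSTEM_ERROR entries.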
Verify filesystem health:
mount | grep "$(df --output=source /var/lib/clickhouse | tail -1)"
Confirm the filesystem is mounted read-write (rw in the mount options). If it has been remounted read-only (ro), a disk error is likely.
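The same check can be scripted so it returns a clear status. This sketch uses findmnt from util-linux to read the options of the filesystem containing a path; the function names are hypothetical. Note that matching the exact option ro matters, because an option such as errors=remount-ro describes what happens on error, not the current state.

```shell
#!/usr/bin/env bash
# Return 0 if the comma-separated option string contains the exact
# option "ro" ("errors=remount-ro" must not match).
has_ro_option() {
  case ",$1," in
    *,ro,*) return 0 ;;
    *) return 1 ;;
  esac
}

# Report whether the filesystem holding <path> is read-only.
# Returns 1 if read-only, 2 if findmnt cannot resolve the path.
check_mount_rw() {
  local path="${1:-/var/lib/clickhouse}" opts
  opts=$(findmnt -n -o OPTIONS --target "$path") || return 2
  if has_ro_option "$opts"; then
    echo "READ-ONLY: $path ($opts)"
    return 1
  fi
  echo "read-write: $path"
}
```

Running check_mount_rw with no argument checks /var/lib/clickhouse; pass another path to check additional data volumes.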
Check file descriptor usage:
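grep "open files" /proc/$(pidof clickhouse-server)/limits
ls /proc/$(pidof clickhouse-server)/fd | wc -l
Run these as root or as the clickhouse user, since /proc/<pid>/fd is readable only by the process owner. If the current count is close to the soft limit, raise it. For a service started by systemd, set LimitNOFILE= in the unit file or a drop-in; /etc/security/limits.conf is applied by PAM to login sessions and does not affect systemd services.
Review security policies: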
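ausearch -m avc -ts recent
getenforce
If SELinux is in enforcing mode, check whether it is denying ClickHouse access to required paths. On AppArmor-based systems, run aa-status to see whether a profile applies to clickhouse-server, and look for apparmor="DENIED" entries in the kernel log.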
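To narrow SELinux output down to ClickHouse, the AVC records can be filtered by process name. The sketch assumes the usual audit format ("avc:  denied { ... } ... comm=...") and relies on the fact that the kernel truncates comm to 15 characters, so clickhouse-server appears as clickhouse-serv, which still contains "clickhouse". Both function names are hypothetical, and ausearch normally requires root.

```shell
#!/usr/bin/env bash
# Count AVC denial records on stdin that mention ClickHouse.
# Always prints a number, including 0.
count_clickhouse_denials() {
  grep -E 'avc: +denied' | grep -ci 'clickhouse' || true
}

# Print the SELinux mode and the number of recent ClickHouse denials.
selinux_report() {
  if ! command -v getenforce >/dev/null 2>&1; then
    echo "getenforce not found: SELinux tools are not installed"
    return 0
  fi
  echo "SELinux mode: $(getenforce)"
  echo "recent ClickHouse AVC denials: $(ausearch -m avc -ts recent 2>/dev/null | count_clickhouse_denials)"
}
```

A non-zero count in Enforcing mode is a strong hint that the SYSTEM_ERROR is a policy denial rather than a hardware or resource problem.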
Examine disk health:
smartctl -a /dev/sda
iostat -x 1 5
Replace /dev/sda with the device that backs the ClickHouse data directory. Look for high I/O wait, disk errors, or SMART warnings that point to failing hardware.
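For routine checks, two small filters over smartctl output can turn the report into a pass/fail signal and a single attribute value. The sketch assumes the ATA wording "SMART overall-health self-assessment test result: PASSED" (or "SMART Health Status: OK" on SAS drives) and the standard ATA attribute table, where the raw value is the tenth column. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Succeed if smartctl -H output on stdin reports a healthy drive.
smart_health_ok() {
  grep -qE 'overall-health self-assessment test result: PASSED|SMART Health Status: OK'
}

# Print the raw value (column 10) of the named attribute from smartctl -A output.
smart_raw_value() {
  awk -v name="$1" '$2 == name { print $10 }'
}
```

Typical use: sudo smartctl -H /dev/sda | smart_health_ok || echo "SMART health check failed", and sudo smartctl -A /dev/sda | smart_raw_value Reallocated_Sector_Ct. A growing reallocated-sector count is a common early sign of the hardware failures listed under Common Causes.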
Restart ClickHouse after resolving the underlying OS issue:
sudo systemctl restart clickhouse-server
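After a restart it is useful to wait until the server actually accepts queries before declaring the issue fixed. The sketch below is a generic retry loop (wait_until is a hypothetical name); the commented example assumes clickhouse-client is installed and can connect with default settings.

```shell
#!/usr/bin/env bash
# Run a command once per second until it succeeds or
# <timeout_seconds> have elapsed. Returns 1 on timeout.
wait_until() {
  local timeout="$1" waited=0
  shift
  until "$@"; do
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}

# Example (not executed here):
#   sudo systemctl restart clickhouse-server
#   if wait_until 60 clickhouse-client --query "SELECT 1" >/dev/null 2>&1; then
#     echo "ClickHouse is accepting queries"
#   else
#     echo "ClickHouse did not come up; check clickhouse-server.err.log" >&2
#   fi
```

Once the server responds, re-run the error-log grep from the first step to confirm no new SYSTEM_ERROR entries appear.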
Best Practices
- Set file descriptor limits to at least 100,000 for the ClickHouse process, as the default may be too low for large deployments.
- Monitor disk health and filesystem status proactively to catch problems before they cause SYSTEM_ERROR failures.
- Use a monitoring tool to track I/O errors and filesystem remount events on all ClickHouse data volumes.
- Keep the operating system and kernel up to date, as kernel bugs can sometimes cause spurious system call failures.
- If running in a container, ensure the container runtime grants ClickHouse the necessary system capabilities.
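The file descriptor recommendation above can be checked and applied with a sketch like the following. It assumes a systemd-managed clickhouse-server on Linux; the function names, the drop-in file name 90-nofile.conf, and the example limit of 262144 are hypothetical choices, not ClickHouse defaults. pidof -s returns a single PID, so if your installation runs more than one clickhouse-server process, confirm it picked the one you expect.

```shell
#!/usr/bin/env bash
# Integer percentage of the limit in use: fd_usage_pct <used> <limit>
fd_usage_pct() {
  echo $(( $1 * 100 / $2 ))
}

# Print open-file usage for a running clickhouse-server.
# Needs root (or the clickhouse user) to list /proc/<pid>/fd.
clickhouse_fd_report() {
  local pid limit used
  pid=$(pidof -s clickhouse-server) || { echo "clickhouse-server is not running" >&2; return 1; }
  # The soft limit is the 4th field of the "Max open files" line.
  limit=$(awk '/^Max open files/ { print $4 }' "/proc/$pid/limits")
  used=$(ls "/proc/$pid/fd" | wc -l)
  if [ "$limit" = "unlimited" ]; then
    echo "pid $pid: $used open files (no limit)"
  else
    echo "pid $pid: $used of $limit open files ($(fd_usage_pct "$used" "$limit")%)"
  fi
}

# Write a systemd drop-in that sets LimitNOFILE for the service.
# Usage: write_nofile_dropin <drop-in directory> <limit>
write_nofile_dropin() {
  local dir="$1" limit="$2"
  mkdir -p "$dir"
  printf '[Service]\nLimitNOFILE=%s\n' "$limit" > "$dir/90-nofile.conf"
}
```

As root, run write_nofile_dropin /etc/systemd/system/clickhouse-server.service.d 262144, then systemctl daemon-reload and restart the service; systemctl show clickhouse-server -p LimitNOFILE confirms the new value. A drop-in keeps the override separate from the packaged unit file, so package upgrades do not overwrite it.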
Frequently Asked Questions
Q: What specific system call caused the SYSTEM_ERROR?
A: Check the ClickHouse error log for the full exception message. It typically includes the errno value and the name of the failed system call (e.g., write, open, stat), which will point you to the root cause.
Q: Can a SYSTEM_ERROR cause data corruption?
A: It depends on which operation failed. If a write system call fails mid-operation, data parts may be left in an incomplete state. ClickHouse uses checksums and atomic operations to protect against corruption, but it is important to investigate and resolve the underlying issue promptly.
Q: I see SYSTEM_ERROR after upgrading the OS kernel. What should I do?
A: Kernel upgrades can occasionally change system call behavior or introduce regressions. Check the kernel changelog for relevant changes, verify that any required kernel modules are loaded, and consider rolling back if the issue is blocking.
Q: Is SYSTEM_ERROR the same as an internal ClickHouse bug?
A: Not usually. SYSTEM_ERROR reflects an OS-level failure rather than a bug in ClickHouse itself. However, if the error persists with no apparent OS issue, it may be worth filing a bug report with ClickHouse, including the full stack trace and error log.