The "DB::Exception: Operation was aborted" error in ClickHouse signals that a running operation was terminated before completing. The ABORTED error code is raised when a query or background task is interrupted, either because a user explicitly cancelled it, the server began shutting down, or an internal timeout was reached.
Impact
Any in-progress query or mutation that encounters the ABORTED error will stop execution and produce no result. For SELECT queries, partial results are discarded. For INSERT or ALTER operations, data that had not yet been committed is rolled back, though parts already written to MergeTree may need cleanup. If the error occurs during a server shutdown, it is generally expected and does not indicate a problem.
Common Causes
- A user or application explicitly killed the query using `KILL QUERY`
- The ClickHouse server is shutting down gracefully and cancels all in-flight queries
- A query exceeded `max_execution_time` or another timeout setting, leading to cancellation
- A client disconnected while the query was still running and `cancel_http_readonly_queries_on_client_close` is enabled
- A distributed query was cancelled because one of the participating nodes failed or was unavailable
- Background merges or mutations were interrupted by a server restart
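For the first cause, `KILL QUERY` takes a `WHERE` clause that is matched against `system.processes`. As a minimal sketch, the statement can be built and sent over ClickHouse's HTTP interface (default port 8123); the helper names below are illustrative, not part of any client library:

```python
from urllib.parse import urlencode

def build_kill_query(query_id: str) -> str:
    """Build a KILL QUERY statement targeting a single query_id.
    Single quotes are escaped so the statement remains valid SQL."""
    escaped = query_id.replace("'", "\\'")
    return f"KILL QUERY WHERE query_id = '{escaped}'"

def build_kill_url(host: str, query_id: str) -> str:
    """Wrap the statement for ClickHouse's HTTP interface (default port 8123)."""
    return f"http://{host}:8123/?" + urlencode({"query": build_kill_query(query_id)})

print(build_kill_query("abc-123"))
# KILL QUERY WHERE query_id = 'abc-123'
```

The query that was killed this way will terminate with the ABORTED error described above.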
Troubleshooting and Resolution Steps
Check whether the query was explicitly killed by reviewing the query log:
```sql
SELECT query_id, exception_code, exception, event_time
FROM system.query_log
WHERE exception_code = 236
ORDER BY event_time DESC
LIMIT 10;
```

If the abort happened during a server shutdown, check the server logs for shutdown messages:

```bash
grep -i 'shutdown\|terminating' /var/log/clickhouse-server/clickhouse-server.log | tail -20
```

Review timeout settings if queries are being aborted unexpectedly:

```sql
SELECT name, value
FROM system.settings
WHERE name IN ('max_execution_time', 'receive_timeout', 'send_timeout', 'connect_timeout');
```

Increase `max_execution_time` if legitimate long-running queries are being cut short:

```sql
SET max_execution_time = 300;
```

If clients are disconnecting and causing aborts, check the client-side timeout configuration. For HTTP clients, ensure the read timeout is long enough for the expected query duration.
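When using the HTTP interface, `max_execution_time` can also be passed as a URL parameter per request, which keeps the server-side limit and the client read timeout aligned. A minimal sketch using only the Python standard library (host and values are placeholders):

```python
from urllib.parse import urlencode
from urllib.request import Request

def make_query_request(host: str, sql: str, max_execution_time: int) -> Request:
    """Build an HTTP request carrying max_execution_time as a URL parameter,
    so the server cancels the query at the same limit the client expects."""
    params = urlencode({"query": sql, "max_execution_time": max_execution_time})
    return Request(f"http://{host}:8123/?{params}")

req = make_query_request("localhost", "SELECT 1", 300)
# Execute with a client read timeout longer than the server limit, e.g.:
# urlopen(req, timeout=330)
```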
For distributed queries that abort due to replica failures, check the health of all cluster nodes:
```sql
SELECT * FROM system.clusters WHERE cluster = 'your_cluster';
```

If mutations were aborted, check their status and re-submit if needed:

```sql
SELECT database, table, mutation_id, is_done, latest_fail_reason
FROM system.mutations
WHERE is_done = 0;
```
Best Practices
- Set `max_execution_time` appropriately per user profile so that runaway queries are cancelled without affecting legitimate workloads.
- Implement retry logic in your application for operations that may be aborted by transient issues like server restarts.
- Use query IDs so you can trace aborted queries back to the originating application or user.
- Schedule maintenance restarts during low-traffic windows to minimize the number of queries that are interrupted.
- Monitor `system.query_log` for frequent ABORTED errors, which may indicate instability or misconfigured timeouts.
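The retry recommendation above can be sketched as a small wrapper. This is a generic pattern, not a ClickHouse API: `run_query` stands in for whatever client call your application uses, and error code 236 (ABORTED) is treated as retryable:

```python
import time

RETRYABLE_CODES = {236}  # ABORTED; extend with other transient codes as needed

class ClickHouseError(Exception):
    """Stand-in for the exception type your client library raises."""
    def __init__(self, code: int, message: str):
        super().__init__(message)
        self.code = code

def run_with_retry(run_query, sql, attempts=3, base_delay=0.5):
    """Retry a query on ABORTED errors with exponential backoff.

    run_query: callable taking the SQL text and returning the result,
    raising ClickHouseError on failure.
    """
    for attempt in range(attempts):
        try:
            return run_query(sql)
        except ClickHouseError as e:
            # Re-raise immediately on non-transient errors or the last attempt.
            if e.code not in RETRYABLE_CODES or attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))
```

Pair this with deterministic query IDs for INSERTs so a retried write can be deduplicated rather than applied twice.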
Frequently Asked Questions
Q: Is data lost when an INSERT is aborted?
A: ClickHouse uses atomic inserts for MergeTree tables. If the insert was not finalized, the data is not visible to readers. However, partially written temporary parts may remain on disk and are cleaned up automatically.
Q: Can I prevent queries from being cancelled during a graceful shutdown?
A: ClickHouse waits for a configurable period (controlled by shutdown_wait_unfinished in the server config) before forcibly terminating queries. You can increase this value to give long-running queries more time to finish.
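For example, a server config override might look like the fragment below; the file path is illustrative, and you should confirm the exact setting name and default for your ClickHouse version:

```xml
<!-- e.g. /etc/clickhouse-server/config.d/shutdown.xml (illustrative path) -->
<clickhouse>
    <!-- seconds to wait for running queries to finish during shutdown -->
    <shutdown_wait_unfinished>60</shutdown_wait_unfinished>
</clickhouse>
```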
Q: How do I distinguish between a user-initiated cancel and a system abort?
A: Check the system.query_log table. User-initiated kills are logged with type = 'ExceptionWhileProcessing' and an exception message mentioning KILL QUERY. System-level aborts typically reference shutdown or timeout in the exception text.
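That heuristic can be wrapped in a small helper that inspects the exception text from system.query_log. The keyword matching below is an assumption about typical message wording, not a guaranteed format, so treat the result as a hint rather than ground truth:

```python
def classify_abort(exception_text: str) -> str:
    """Rough classification of an ABORTED error by its exception message."""
    text = exception_text.lower()
    if "kill query" in text:
        return "user_cancel"
    if "shutdown" in text or "shutting down" in text or "terminat" in text:
        return "server_shutdown"
    if "timeout" in text or "max_execution_time" in text:
        return "timeout"
    return "unknown"

print(classify_abort("Query was cancelled by KILL QUERY"))  # user_cancel
```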