The "DB::Exception: Query is too slow" error in ClickHouse occurs when a query's execution time exceeds the configured max_execution_time limit. The TOO_SLOW error code is how ClickHouse terminates long-running queries before they consume excessive server resources.
Impact
The query is killed mid-execution, and any partial results are discarded (unless timeout_overflow_mode is set to break). This can affect complex analytical queries, large data exports, and multi-stage operations that take longer than expected. In production environments, repeated timeouts may indicate underlying performance issues that affect the entire cluster.
Common Causes
- `max_execution_time` set too low for the complexity of the workload
- Queries scanning large volumes of data without effective filtering or indexing
- Resource contention from concurrent queries, merges, or mutations slowing down execution
- Inefficient JOIN operations, especially with large right-hand tables
- Disk I/O bottlenecks causing slow data reads
- Network latency in distributed queries across multiple shards
- Suboptimal query plans due to missing statistics or outdated table metadata
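One way to spot queries that scan far more data than they return, which usually points to missing filters or ineffective partition pruning, is to compare rows read against rows returned in the query log. A sketch, assuming query logging is enabled:

```sql
-- Queries whose scanned rows dwarf their result size are candidates
-- for better partition pruning, indexing, or PREWHERE filters.
SELECT
    query,
    read_rows,
    result_rows,
    round(read_rows / greatest(result_rows, 1)) AS scan_ratio
FROM system.query_log
WHERE type = 'QueryFinish'
  AND event_time > now() - INTERVAL 1 DAY
ORDER BY scan_ratio DESC
LIMIT 10;
```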
Troubleshooting and Resolution Steps
Check the current execution time limit:
```sql
SELECT name, value FROM system.settings WHERE name = 'max_execution_time';
```
Increase the limit for specific queries that legitimately need more time:
```sql
SET max_execution_time = 300; -- 5 minutes
SELECT ...;
```
Analyze query performance to find bottlenecks:
```sql
-- Check the query log for execution details
SELECT query, query_duration_ms, read_rows, read_bytes, memory_usage
FROM system.query_log
WHERE type = 'QueryFinish' OR type = 'ExceptionWhileProcessing'
ORDER BY event_time DESC
LIMIT 20;
```
Optimize the query:
- Add partition key columns to WHERE clauses
- Use PREWHERE for selective filters on large tables
- Reduce the number of columns in the SELECT
- Replace expensive JOINs with dictionary lookups where appropriate:
```sql
-- Instead of:
SELECT t.*, d.name
FROM transactions t
JOIN dim_users d ON t.user_id = d.id;

-- Use a dictionary:
SELECT *, dictGet('dim_users_dict', 'name', user_id)
FROM transactions;
```
Check for resource contention:
```sql
-- Active queries
SELECT query_id, elapsed, read_rows, memory_usage FROM system.processes;

-- Active merges
SELECT * FROM system.merges;
```
Use `timeout_overflow_mode` to get partial results instead of an error:
```sql
SET max_execution_time = 30;
SET timeout_overflow_mode = 'break';
SELECT ...;
```
Consider query-level parallelism settings to speed up execution:
```sql
SET max_threads = 16;
SELECT ...;
```
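The dictGet() example above assumes a dictionary named `dim_users_dict` already exists. A minimal sketch of such a definition, with the source table and column names carried over from the illustration:

```sql
-- Hashed in-memory dictionary backed by the dim_users table;
-- refreshed every 5-10 minutes per the LIFETIME range below.
CREATE DICTIONARY dim_users_dict
(
    id UInt64,
    name String
)
PRIMARY KEY id
SOURCE(CLICKHOUSE(TABLE 'dim_users'))
LAYOUT(HASHED())
LIFETIME(MIN 300 MAX 600);
```

A hashed layout keeps the whole table in memory, so it suits dimension tables small enough to fit in RAM; larger dimensions may warrant a cached layout instead.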
Best Practices
- Set `max_execution_time` at the user profile level, with different values for interactive users (short timeout) and batch jobs (longer timeout).
- Monitor slow queries regularly using `system.query_log` and set up alerts for queries approaching time limits.
- Design tables with partition keys and primary keys that support your most common query patterns.
- Use materialized views to pre-compute expensive aggregations rather than running them on demand.
- Test query performance during development to catch slow queries before they reach production.
- Consider using `max_execution_time_leaf` for distributed queries to set per-shard time limits.
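As a sketch of that last practice, with illustrative values:

```sql
-- Each shard may spend up to 60 s on its local part of the query,
-- while the initiating node enforces a 300 s overall limit.
SET max_execution_time = 300;
SET max_execution_time_leaf = 60;
SELECT ...;
```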
Frequently Asked Questions
Q: What is the default value of max_execution_time?
A: The default is 0, meaning no time limit. If you encounter this error, the limit has been explicitly configured in your server settings, user profile, or session.
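To locate an explicitly configured limit, you can filter `system.settings` on its `changed` flag (a sketch):

```sql
-- changed = 1 means the value differs from the compiled-in default,
-- i.e. it was set in the config, a profile, or the current session.
SELECT name, value, changed
FROM system.settings
WHERE name LIKE 'max_execution_time%';
```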
Q: Does max_execution_time include time spent waiting for locks or resources?
A: Yes, it measures wall-clock time from when the query starts executing. Time spent waiting for locks, network I/O, or disk reads all count toward the limit.
Q: Can I set different timeouts for different types of queries?
A: Not directly by query type, but you can assign different user profiles with different max_execution_time values. Interactive users might have a 30-second limit, while ETL service accounts could have a 10-minute limit.
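A sketch of such a split in the server's users.xml; the profile names and values here are illustrative:

```xml
<!-- Assign users to these profiles via their <profile> element. -->
<profiles>
    <interactive>
        <max_execution_time>30</max_execution_time>
    </interactive>
    <etl>
        <max_execution_time>600</max_execution_time>
    </etl>
</profiles>
```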
Q: Will a timed-out query still consume resources after the error?
A: ClickHouse terminates the query and releases its resources promptly after the timeout. However, any mutations or DDL operations triggered before the timeout will continue independently.