The "DB::Exception: Query is too slow" error in ClickHouse occurs when a query's execution time exceeds the configured max_execution_time limit. The TOO_SLOW error code is how ClickHouse terminates long-running queries before they consume excessive server resources.
Impact
The query is killed mid-execution, and any partial results are discarded (unless timeout_overflow_mode is set to break). This can affect complex analytical queries, large data exports, and multi-stage operations that take longer than expected. In production environments, repeated timeouts may indicate underlying performance issues that affect the entire cluster.
Common Causes
- `max_execution_time` set too low for the complexity of the workload
- Queries scanning large volumes of data without effective filtering or indexing
- Resource contention from concurrent queries, merges, or mutations slowing down execution
- Inefficient JOIN operations, especially with large right-hand tables
- Disk I/O bottlenecks causing slow data reads
- Network latency in distributed queries across multiple shards
- Suboptimal query plans due to missing statistics or outdated table metadata
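One way to spot queries that scan far more data than they return, which usually points to missing filters or ineffective partition pruning, is to compare rows read against rows returned in the query log. A sketch, assuming query logging is enabled:

```sql
-- Queries whose scanned rows dwarf their result size are candidates
-- for better partition pruning, indexing, or PREWHERE filters.
SELECT
    query,
    read_rows,
    result_rows,
    round(read_rows / greatest(result_rows, 1)) AS scan_ratio
FROM system.query_log
WHERE type = 'QueryFinish'
  AND event_time > now() - INTERVAL 1 DAY
ORDER BY scan_ratio DESC
LIMIT 10;
```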
Troubleshooting and Resolution Steps
Check the current execution time limit:
```sql
SELECT name, value FROM system.settings WHERE name = 'max_execution_time';
```
Increase the limit for specific queries that legitimately need more time:
```sql
SET max_execution_time = 300; -- 5 minutes
SELECT ...;
```
Analyze query performance to find bottlenecks:
```sql
-- Check the query log for execution details
SELECT query, query_duration_ms, read_rows, read_bytes, memory_usage
FROM system.query_log
WHERE type = 'QueryFinish' OR type = 'ExceptionWhileProcessing'
ORDER BY event_time DESC
LIMIT 20;
```
Optimize the query:
- Add partition key columns to WHERE clauses
- Use PREWHERE for selective filters on large tables
- Reduce the number of columns in the SELECT
- Replace expensive JOINs with dictionary lookups where appropriate:
```sql
-- Instead of:
SELECT t.*, d.name
FROM transactions t
JOIN dim_users d ON t.user_id = d.id;

-- Use a dictionary:
SELECT *, dictGet('dim_users_dict', 'name', user_id)
FROM transactions;
```
Check for resource contention:
```sql
-- Active queries
SELECT query_id, elapsed, read_rows, memory_usage FROM system.processes;

-- Active merges
SELECT * FROM system.merges;
```
Use `timeout_overflow_mode` to get partial results instead of an error:
```sql
SET max_execution_time = 30;
SET timeout_overflow_mode = 'break';
SELECT ...;
```
Consider query-level parallelism settings to speed up execution:
```sql
SET max_threads = 16;
SELECT ...;
```
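The dictGet() example above assumes a dictionary named `dim_users_dict` already exists. A minimal sketch of such a definition, with the source table and column names carried over from the illustration:

```sql
-- Hashed in-memory dictionary backed by the dim_users table;
-- refreshed every 5-10 minutes per the LIFETIME range below.
CREATE DICTIONARY dim_users_dict
(
    id UInt64,
    name String
)
PRIMARY KEY id
SOURCE(CLICKHOUSE(TABLE 'dim_users'))
LAYOUT(HASHED())
LIFETIME(MIN 300 MAX 600);
```

A hashed layout keeps the whole table in memory, so it suits dimension tables small enough to fit in RAM; larger dimensions may warrant a cached layout instead.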
Best Practices
- Set `max_execution_time` at the user profile level, with different values for interactive users (short timeout) and batch jobs (longer timeout).
- Monitor slow queries regularly using `system.query_log` and set up alerts for queries approaching time limits.
- Design tables with partition keys and primary keys that support your most common query patterns.
- Use materialized views to pre-compute expensive aggregations rather than running them on demand.
- Test query performance during development to catch slow queries before they reach production.
- Consider using `max_execution_time_leaf` for distributed queries to set per-shard time limits.
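As a sketch of that last practice, with illustrative values:

```sql
-- Each shard may spend up to 60 s on its local part of the query,
-- while the initiating node enforces a 300 s overall limit.
SET max_execution_time = 300;
SET max_execution_time_leaf = 60;
SELECT ...;
```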
Frequently Asked Questions
Q: What is the default value of max_execution_time?
A: The default is 0, meaning no time limit. If you encounter this error, the limit has been explicitly configured in your server settings, user profile, or session.
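To locate an explicitly configured limit, you can filter `system.settings` on its `changed` flag (a sketch):

```sql
-- changed = 1 means the value differs from the compiled-in default,
-- i.e. it was set in the config, a profile, or the current session.
SELECT name, value, changed
FROM system.settings
WHERE name LIKE 'max_execution_time%';
```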
Q: Does max_execution_time include time spent waiting for locks or resources?
A: Yes, it measures wall-clock time from when the query starts executing. Time spent waiting for locks, network I/O, or disk reads all count toward the limit.
Q: Can I set different timeouts for different types of queries?
A: Not directly by query type, but you can assign different user profiles with different max_execution_time values. Interactive users might have a 30-second limit, while ETL service accounts could have a 10-minute limit.
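A sketch of such a split in the server's users.xml; the profile names and values here are illustrative:

```xml
<!-- Assign users to these profiles via their <profile> element. -->
<profiles>
    <interactive>
        <max_execution_time>30</max_execution_time>
    </interactive>
    <etl>
        <max_execution_time>600</max_execution_time>
    </etl>
</profiles>
```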
Q: Will a timed-out query still consume resources after the error?
A: ClickHouse terminates the query and releases its resources promptly after the timeout. However, any mutations or DDL operations triggered before the timeout will continue independently.