After a deploy, a schema change, a new index, or a setting flip, the fastest way to confirm whether anything regressed is to diff system.query_log between two windows. Comparing raw query text is noisy because of literals and parameters, so the trick is to group by normalized_query_hash, which is a stable hash of the query shape with literals stripped. The two recipes below cover the two common shapes of this question: comparing two specific time windows, or comparing the period before and after a known incident.
Comparing two specific time windows
Use this when you have exact intervals such as a deploy at 13:00 and you want to compare the 5-minute windows at 13:00 and 15:00. The query buckets rows into before and after flags and aggregates ProfileEvents for each side.
WITH
toStartOfInterval(event_time, INTERVAL 5 MINUTE) = '2023-06-30 13:00:00' as before,
toStartOfInterval(event_time, INTERVAL 5 MINUTE) = '2023-06-30 15:00:00' as after
SELECT
normalized_query_hash,
anyIf(query, before) AS QueryBefore,
anyIf(query, after) AS QueryAfter,
countIf(before) AS cnt_before,
countIf(after) AS cnt_after,
sumIf(query_duration_ms, before) AS duration_before,
sumIf(query_duration_ms, after) AS duration_after,
sumIf(ProfileEvents['RealTimeMicroseconds'], before) / 1000000 AS RealTimeBefore,
sumIf(ProfileEvents['RealTimeMicroseconds'], after) / 1000000 AS RealTimeAfter,
sumIf(ProfileEvents['UserTimeMicroseconds'], before) / 1000000 AS UserTimeBefore,
sumIf(ProfileEvents['UserTimeMicroseconds'], after) / 1000000 AS UserTimeAfter,
sumIf(ProfileEvents['SystemTimeMicroseconds'], before) / 1000000 AS SystemTimeBefore,
sumIf(ProfileEvents['SystemTimeMicroseconds'], after) / 1000000 AS SystemTimeAfter,
sumIf(ProfileEvents['DiskReadElapsedMicroseconds'], before) / 1000000 AS DiskReadBefore,
sumIf(ProfileEvents['DiskReadElapsedMicroseconds'], after) / 1000000 AS DiskReadAfter,
sumIf(ProfileEvents['NetworkSendElapsedMicroseconds'], before) / 1000000 AS NetSendBefore,
sumIf(ProfileEvents['NetworkSendElapsedMicroseconds'], after) / 1000000 AS NetSendAfter,
sumIf(ProfileEvents['ZooKeeperWaitMicroseconds'], before) / 1000000 AS ZkWaitBefore,
sumIf(ProfileEvents['ZooKeeperWaitMicroseconds'], after) / 1000000 AS ZkWaitAfter,
sumIf(read_bytes, before) AS ReadBytesBefore,
sumIf(read_bytes, after) AS ReadBytesAfter,
sumIf(read_rows, before) AS ReadRowsBefore,
sumIf(read_rows, after) AS ReadRowsAfter
FROM system.query_log
WHERE event_date = '2023-06-30'
AND type IN (2, 4)
GROUP BY normalized_query_hash
ORDER BY duration_after - duration_before DESC
FORMAT Vertical;
type IN (2, 4) keeps only QueryFinish and ExceptionWhileProcessing rows so you do not double-count the QueryStart event. Adjust the two toStartOfInterval values to the windows you want to compare.
Comparing two ranges around an incident timestamp
Use this when you know roughly when a problem started and want to compare a wider window on either side. This version splits rows by a single timestamp_of_issue boundary and surfaces only queries whose count, read bytes, or user time grew meaningfully.
WITH
toDateTime('2024-02-09 00:00:00') as timestamp_of_issue,
event_time < timestamp_of_issue as before,
event_time >= timestamp_of_issue as after
SELECT
normalized_query_hash as h,
any(query) as query_sample,
round(quantileIf(0.9)(query_duration_ms, before)) as duration_q90_before,
round(quantileIf(0.9)(query_duration_ms, after)) as duration_q90_after,
countIf(before) as cnt_before,
countIf(after) as cnt_after,
sumIf(query_duration_ms, before) as duration_sum_before,
sumIf(query_duration_ms, after) as duration_sum_after,
sumIf(ProfileEvents['UserTimeMicroseconds'], before) as usertime_sum_before,
sumIf(ProfileEvents['UserTimeMicroseconds'], after) as usertime_sum_after,
sumIf(read_bytes, before) as sum_read_bytes_before,
sumIf(read_bytes, after) as sum_read_bytes_after
FROM system.query_log
WHERE event_time BETWEEN timestamp_of_issue - INTERVAL 3 DAY AND timestamp_of_issue + INTERVAL 3 DAY
GROUP BY h
HAVING cnt_after > 1.1 * cnt_before
OR sum_read_bytes_after > 1.2 * sum_read_bytes_before
OR usertime_sum_after > 1.2 * usertime_sum_before
ORDER BY sum_read_bytes_after - sum_read_bytes_before
FORMAT Vertical;
The HAVING clause filters out noise: queries appear only when they grew at least 10% in count, 20% in read bytes, or 20% in user CPU time. Tune the thresholds to your tolerance.
How to read the output
The most useful columns are usually the deltas. Compute duration_after - duration_before, ReadBytesAfter / ReadBytesBefore, or cnt_after - cnt_before. A query that doubled in read bytes but ran the same number of times is likely missing an index or hitting a new partition. A query whose count exploded but duration stayed flat is a client-side change, not a server regression.
UserTimeMicroseconds is the CPU time on the executing thread. A jump there with no jump in RealTimeMicroseconds points to a more expensive code path, not contention. The opposite (real time up, user time flat) points to waits: ZooKeeper, disk, network.
Running cluster-wide
Both queries assume a single node. To diff across an entire cluster:
FROM clusterAllReplicas('{cluster}', system.query_log)
Add hostName() to the GROUP BY if you want to see which node regressed. Drop it to get a cluster-wide aggregate per query shape.
Common Pitfalls
- Forgetting
type IN (2, 4)doubles the row count and skews every aggregate. normalized_query_hashis computed differently across major versions. Comparing logs from a node before and after an upgrade can produce false positives.event_timehas 1-second resolution. For sub-second incidents, useevent_time_microseconds.- If
query_loghas TTL shorter than the window you want to compare, rows are silently absent. CheckSHOW CREATE TABLE system.query_logfor the TTL clause. - The
event_datepredicate enables partition pruning. Use it whenever possible, even when your real filter isevent_time.
Frequently Asked Questions
Q: What is normalized_query_hash?
A: It is a 64-bit hash of the query AST with literals replaced by placeholders. SELECT 1 and SELECT 2 produce the same hash. It is the right key for grouping logically-identical queries that differ only in parameters.
Q: Why use type IN (2, 4) instead of type = 'QueryFinish'?
A: type is an Enum8. The numeric form avoids a string compare on every row of query_log. Code 2 is QueryFinish, code 4 is ExceptionWhileProcessing. Both have populated query_duration_ms and ProfileEvents, which is what the aggregates need.
Q: How do I add memory comparisons?
A: Add sumIf(memory_usage, before) AS MemBefore, sumIf(memory_usage, after) AS MemAfter to the SELECT. memory_usage is the peak memory of the query in bytes.
Q: Can I compare two different ClickHouse clusters?
A: Not directly with one query. Export each cluster's aggregated query_log to a shared table, then join on normalized_query_hash. The hash is deterministic given the same server version.
Q: My query returns zero rows. What is wrong?
A: Either your interval has no matching toStartOfInterval boundary, or query_log is disabled, or rows have already expired due to TTL. Confirm with SELECT count() FROM system.query_log WHERE event_date = today().