NEW

Pulse 2025 Product Roundup: From Monitoring to AI-Native Control Plane

ClickHouse: Compare query_log Between Two Time Intervals

After a deploy, a schema change, a new index, or a setting flip, the fastest way to confirm whether anything regressed is to diff system.query_log between two windows. Comparing raw query text is noisy because of literals and parameters, so the trick is to group by normalized_query_hash, which is a stable hash of the query shape with literals stripped. The two recipes below cover the two common shapes of this question: comparing two specific time windows, or comparing the period before and after a known incident.

Comparing two specific time windows

Use this when you have exact intervals such as a deploy at 13:00 and you want to compare the 5-minute windows at 13:00 and 15:00. The query buckets rows into before and after flags and aggregates ProfileEvents for each side.

WITH
    toStartOfInterval(event_time, INTERVAL 5 MINUTE) = '2023-06-30 13:00:00' as before,
    toStartOfInterval(event_time, INTERVAL 5 MINUTE) = '2023-06-30 15:00:00' as after
SELECT
    normalized_query_hash,
    anyIf(query, before) AS QueryBefore,
    anyIf(query, after)  AS QueryAfter,
    countIf(before) AS cnt_before,
    countIf(after)  AS cnt_after,
    sumIf(query_duration_ms, before) AS duration_before,
    sumIf(query_duration_ms, after)  AS duration_after,
    sumIf(ProfileEvents['RealTimeMicroseconds'],   before) / 1000000 AS RealTimeBefore,
    sumIf(ProfileEvents['RealTimeMicroseconds'],   after)  / 1000000 AS RealTimeAfter,
    sumIf(ProfileEvents['UserTimeMicroseconds'],   before) / 1000000 AS UserTimeBefore,
    sumIf(ProfileEvents['UserTimeMicroseconds'],   after)  / 1000000 AS UserTimeAfter,
    sumIf(ProfileEvents['SystemTimeMicroseconds'], before) / 1000000 AS SystemTimeBefore,
    sumIf(ProfileEvents['SystemTimeMicroseconds'], after)  / 1000000 AS SystemTimeAfter,
    sumIf(ProfileEvents['DiskReadElapsedMicroseconds'],     before) / 1000000 AS DiskReadBefore,
    sumIf(ProfileEvents['DiskReadElapsedMicroseconds'],     after)  / 1000000 AS DiskReadAfter,
    sumIf(ProfileEvents['NetworkSendElapsedMicroseconds'],  before) / 1000000 AS NetSendBefore,
    sumIf(ProfileEvents['NetworkSendElapsedMicroseconds'],  after)  / 1000000 AS NetSendAfter,
    sumIf(ProfileEvents['ZooKeeperWaitMicroseconds'],       before) / 1000000 AS ZkWaitBefore,
    sumIf(ProfileEvents['ZooKeeperWaitMicroseconds'],       after)  / 1000000 AS ZkWaitAfter,
    sumIf(read_bytes,  before) AS ReadBytesBefore,
    sumIf(read_bytes,  after)  AS ReadBytesAfter,
    sumIf(read_rows,   before) AS ReadRowsBefore,
    sumIf(read_rows,   after)  AS ReadRowsAfter
FROM system.query_log
WHERE event_date = '2023-06-30'
  AND type IN (2, 4)
GROUP BY normalized_query_hash
ORDER BY duration_after - duration_before DESC
FORMAT Vertical;

type IN (2, 4) keeps only QueryFinish and ExceptionWhileProcessing rows so you do not double-count the QueryStart event. Adjust the two toStartOfInterval values to the windows you want to compare.

Comparing two ranges around an incident timestamp

Use this when you know roughly when a problem started and want to compare a wider window on either side. This version splits rows by a single timestamp_of_issue boundary and surfaces only queries whose count, read bytes, or user time grew meaningfully.

WITH
    toDateTime('2024-02-09 00:00:00') as timestamp_of_issue,
    event_time < timestamp_of_issue as before,
    event_time >= timestamp_of_issue as after
SELECT
    normalized_query_hash as h,
    any(query) as query_sample,
    round(quantileIf(0.9)(query_duration_ms, before)) as duration_q90_before,
    round(quantileIf(0.9)(query_duration_ms, after))  as duration_q90_after,
    countIf(before) as cnt_before,
    countIf(after)  as cnt_after,
    sumIf(query_duration_ms, before) as duration_sum_before,
    sumIf(query_duration_ms, after)  as duration_sum_after,
    sumIf(ProfileEvents['UserTimeMicroseconds'], before) as usertime_sum_before,
    sumIf(ProfileEvents['UserTimeMicroseconds'], after)  as usertime_sum_after,
    sumIf(read_bytes, before) as sum_read_bytes_before,
    sumIf(read_bytes, after)  as sum_read_bytes_after
FROM system.query_log
WHERE event_time BETWEEN timestamp_of_issue - INTERVAL 3 DAY AND timestamp_of_issue + INTERVAL 3 DAY
GROUP BY h
HAVING cnt_after > 1.1 * cnt_before
    OR sum_read_bytes_after > 1.2 * sum_read_bytes_before
    OR usertime_sum_after  > 1.2 * usertime_sum_before
ORDER BY sum_read_bytes_after - sum_read_bytes_before
FORMAT Vertical;

The HAVING clause filters out noise: queries appear only when they grew at least 10% in count, 20% in read bytes, or 20% in user CPU time. Tune the thresholds to your tolerance.

How to read the output

The most useful columns are usually the deltas. Compute duration_after - duration_before, ReadBytesAfter / ReadBytesBefore, or cnt_after - cnt_before. A query that doubled in read bytes but ran the same number of times is likely missing an index or hitting a new partition. A query whose count exploded but duration stayed flat is a client-side change, not a server regression.

UserTimeMicroseconds is the CPU time on the executing thread. A jump there with no jump in RealTimeMicroseconds points to a more expensive code path, not contention. The opposite (real time up, user time flat) points to waits: ZooKeeper, disk, network.

Running cluster-wide

Both queries assume a single node. To diff across an entire cluster:

FROM clusterAllReplicas('{cluster}', system.query_log)

Add hostName() to the GROUP BY if you want to see which node regressed. Drop it to get a cluster-wide aggregate per query shape.

Common Pitfalls

  • Forgetting type IN (2, 4) doubles the row count and skews every aggregate.
  • normalized_query_hash is computed differently across major versions. Comparing logs from a node before and after an upgrade can produce false positives.
  • event_time has 1-second resolution. For sub-second incidents, use event_time_microseconds.
  • If query_log has TTL shorter than the window you want to compare, rows are silently absent. Check SHOW CREATE TABLE system.query_log for the TTL clause.
  • The event_date predicate enables partition pruning. Use it whenever possible, even when your real filter is event_time.

Frequently Asked Questions

Q: What is normalized_query_hash? A: It is a 64-bit hash of the query AST with literals replaced by placeholders. SELECT 1 and SELECT 2 produce the same hash. It is the right key for grouping logically-identical queries that differ only in parameters.

Q: Why use type IN (2, 4) instead of type = 'QueryFinish'? A: type is an Enum8. The numeric form avoids a string compare on every row of query_log. Code 2 is QueryFinish, code 4 is ExceptionWhileProcessing. Both have populated query_duration_ms and ProfileEvents, which is what the aggregates need.

Q: How do I add memory comparisons? A: Add sumIf(memory_usage, before) AS MemBefore, sumIf(memory_usage, after) AS MemAfter to the SELECT. memory_usage is the peak memory of the query in bytes.

Q: Can I compare two different ClickHouse clusters? A: Not directly with one query. Export each cluster's aggregated query_log to a shared table, then join on normalized_query_hash. The hash is deterministic given the same server version.

Q: My query returns zero rows. What is wrong? A: Either your interval has no matching toStartOfInterval boundary, or query_log is disabled, or rows have already expired due to TTL. Confirm with SELECT count() FROM system.query_log WHERE event_date = today().

Subscribe to the Pulse Newsletter

Get early access to new Pulse features, insightful blogs & exclusive events , webinars, and workshops.

We use cookies to provide an optimized user experience and understand our traffic. To learn more, read our use of cookies; otherwise, please choose 'Accept Cookies' to continue using our website.