ClickHouse query_log Useful Queries: Slow Queries, Errors, Top Users

Q: Why group by normalized_query_hash and not by query?

query includes literals, parameters, and timestamps, so the same query shape produces a different string every time it runs. normalized_query_hash strips literals and produces a stable identifier for the query template.

system.query_log is the single richest source of operational data in ClickHouse. It records every executed query along with a full ProfileEvents map: CPU time, disk I/O, network, ZooKeeper waits, memory peak, parts scanned, marks selected, and the normalized query hash. The recipes below are reference queries kept in production runbooks for tracking down expensive workloads, surfacing recurring errors, and feeding index decisions. All examples assume query_log is enabled and target ClickHouse 22.4 or newer.

Most resource-intensive queries

Group by normalized_query_hash so query shapes that differ only in literals collapse into one row. The result is the heaviest workloads on the cluster.

SELECT
    hostName() AS host,
    normalized_query_hash,
    min(event_time),
    max(event_time),
    replace(substr(argMax(query, utime), 1, 80), '\n', ' ') AS query,
    argMax(query_id, utime) AS sample_query_id,
    count(),
    sum(query_duration_ms) / 1000 AS QueriesDuration,
    sum(ProfileEvents['RealTimeMicroseconds'])            / 1000000 AS RealTime,
    sum(ProfileEvents['UserTimeMicroseconds']   as utime) / 1000000 AS UserTime,
    sum(ProfileEvents['SystemTimeMicroseconds']) / 1000000 AS SystemTime,
    sum(ProfileEvents['DiskReadElapsedMicroseconds'])     / 1000000 AS DiskReadTime,
    sum(ProfileEvents['DiskWriteElapsedMicroseconds'])    / 1000000 AS DiskWriteTime,
    sum(ProfileEvents['NetworkSendElapsedMicroseconds'])  / 1000000 AS NetworkSendTime,
    sum(ProfileEvents['NetworkReceiveElapsedMicroseconds']) / 1000000 AS NetworkReceiveTime,
    sum(ProfileEvents['ZooKeeperWaitMicroseconds'])       / 1000000 AS ZooKeeperWaitTime,
    sum(read_rows) AS ReadRows,
    formatReadableSize(sum(read_bytes))     AS ReadBytes,
    sum(written_rows) AS WrittenRows,
    formatReadableSize(sum(written_bytes))  AS WrittenBytes,
    sum(result_rows)  AS ResultRows,
    formatReadableSize(sum(result_bytes))   AS ResultBytes
FROM clusterAllReplicas('{cluster}', system.query_log)
WHERE event_date >= today() AND type IN (2, 4)
GROUP BY normalized_query_hash, host
ORDER BY UserTime DESC
LIMIT 30
FORMAT Vertical;

type IN (2, 4) keeps QueryFinish and ExceptionWhileProcessing rows, both of which have populated ProfileEvents. argMax(query, utime) returns the longest-running instance of each query shape, which is usually the most informative sample.

Long-running or incomplete queries

Queries that crashed, were killed, or simply ran past a threshold:

SELECT
    query_id,
    min(event_time) AS t,
    any(query)
FROM system.query_log
WHERE event_date = today() AND event_time > '2021-11-25 02:29:12'
GROUP BY query_id
HAVING countIf(type = 'QueryFinish') = 0
    OR sum(query_duration_ms) > 100000
ORDER BY t;

A query_id with no QueryFinish row is a query that ended in QueryStart only, meaning the server crashed, the connection dropped, or KILL QUERY ran before completion. The 100,000 ms threshold catches anything that took longer than 100 seconds.

Top columns referenced in WHERE clauses

This is the index-design query. It extracts the predicate of every recent SELECT against a given table, matches it to the table's column list from system.query_log, and ranks columns by frequency. A column that appears in nearly every WHERE clause but is not in the primary key is a candidate for a skip index or a reordered key.

WITH
    any(query) AS q,
    any(tables) AS _tables,
    arrayJoin(extractAll(query, '\\b(?:PRE)?WHERE\\s+(.*?)\\s+(?:GROUP BY|ORDER BY|UNION|SETTINGS|FORMAT$)')) AS w,
    any(columns) AS cols,
    arrayFilter(x -> (position(w, extract(x, '\\.(`[^`]+`|[^\\.]+)$')) > 0), columns) AS c,
    arrayJoin(c) AS c2
SELECT
    c2,
    count()
FROM system.query_log
WHERE (event_time >= (now() - toIntervalDay(1)))
  AND arrayExists(x -> (x LIKE '%target_table%'), tables)
  AND (query ILIKE 'SELECT%')
GROUP BY c2
ORDER BY count() ASC;

Replace target_table with the table whose access patterns you want to study.

Most selected columns

columns is a populated array on query_log rows starting in modern versions. Each entry is database.table.column. Counting unique entries tells you which columns are projected most often. Wide tables benefit from putting hot columns earlier in storage and considering column-level codecs.

SELECT
    col AS column,
    count() AS hits
FROM system.query_log
ARRAY JOIN columns AS col
WHERE type = 'QueryFinish'
  AND query_kind = 'Select'
  AND event_time >= now() - INTERVAL 7 DAY
  AND notEmpty(columns)
GROUP BY col
ORDER BY hits DESC
LIMIT 50;

Most used functions

used_functions is also a populated array. Ranking functions tells you whether your workload is dominated by sum, by string parsing, by JSON extraction, or by something obscure that deserves a materialized view.

SELECT
    f AS function,
    count() AS hits
FROM system.query_log
ARRAY JOIN used_functions AS f
WHERE type = 'QueryFinish'
  AND event_time >= now() - INTERVAL 7 DAY
  AND notEmpty(used_functions)
GROUP BY f
ORDER BY hits DESC
LIMIT 50;

Top users and clients

Quick attribution of who is consuming the cluster:

SELECT
    user,
    client_name,
    count() AS queries,
    sum(query_duration_ms) / 1000 AS total_seconds,
    sum(read_bytes) AS total_read_bytes,
    sum(memory_usage) AS total_mem
FROM system.query_log
WHERE event_date = today() AND type IN (2, 4)
GROUP BY user, client_name
ORDER BY total_seconds DESC;

Common Pitfalls

type = 'QueryStart' rows do not have populated query_duration_ms or ProfileEvents. Filter to type IN (2, 4) for resource analysis.
normalized_query_hash is stable within a major version but can change across upgrades. Do not compare hashes across upgrades.
The columns and used_functions arrays are populated only for recent ClickHouse versions and require log_queries = 1. They are not present in old query_log rows.
Running these queries against a busy query_log is itself a heavy read. Add event_date predicates to enable partition pruning.
clusterAllReplicas('{cluster}', system.query_log) works only if the cluster is defined and remote query_log is reachable. On standalone servers, drop the cluster wrapper.
memory_usage is the peak memory for the query on the executing node. For distributed queries, sum across hosts for the true peak.

Frequently Asked Questions

Q: Why group by normalized_query_hash and not by query? A: query includes literals, parameters, and timestamps, so the same query shape produces a different string every time it runs. normalized_query_hash strips literals and produces a stable identifier for the query template.

Q: My ProfileEvents lookups return zero. A: Either the ProfileEvent name is misspelled, or the version does not produce that event. Check available keys with SELECT arrayJoin(mapKeys(ProfileEvents)) FROM system.query_log WHERE type = 'QueryFinish' LIMIT 1.

Q: How long does query_log keep data? A: It depends on the TTL set in the server config. The default in recent versions is around 30 days. Check with SHOW CREATE TABLE system.query_log.

Q: Should I always use clusterAllReplicas? A: Use it when you want a cluster-wide picture. For per-node debugging, query the local system.query_log directly to avoid network overhead and to keep stack traces simple.

Q: How do I find the user who ran a specific query_id? A: SELECT user, client_name, address FROM system.query_log WHERE query_id = '...' LIMIT 1. The row includes the originating address and the client.