ClickHouse Debugging Hung Queries and Stuck Server: Query Recipes

Q: How is system.stack_trace different from a CPU profiler?

system.stack_trace is a single snapshot. A CPU profiler like clickhouse-server built-in sampling collects many snapshots over time. For a hang, a single snapshot is usually enough because all threads are stuck in roughly the same place.

Q: Can I run this on a managed ClickHouse service?

system.stack_trace works on managed services that expose it. The signal-based dumps require shell access to the node, which managed services usually do not provide. Open a support ticket instead.

A ClickHouse server that stops responding usually has one of three shapes: a query is in a tight loop on CPU, a query is blocked on a lock or disk, or the whole server is wedged and no SQL gets through. The first two are diagnosable in-band by reading system.stack_trace. The third needs help from the OS, typically by sending signals that force every thread to write its stack to the log. This article walks through both approaches.

In-band diagnosis with system.stack_trace

system.stack_trace collects a synthetic stack trace from every running thread at the moment you query it. Combined with addressToSymbol and demangle, you get a profile of where the server is spending time right now.

SELECT
    arrayStringConcat(arrayMap(x -> concat('0x', lower(hex(x)), '\t', demangle(addressToSymbol(x))), trace), '\n') AS trace_functions,
    count()
FROM system.stack_trace
GROUP BY trace_functions
ORDER BY count() DESC
SETTINGS allow_introspection_functions = 1
FORMAT Vertical;

The result is one row per unique stack, with a count of threads parked there. The stack at the top is where the most threads currently sit. If it points into a merge function, a heavy aggregation, or a lock primitive, you have your answer.

What the columns tell you

Symbol pattern	Likely cause
`pthread_cond_wait`, `futex_wait`	Threads idle, waiting for work or for a lock
`MergeTreeRangeReader::read`, `MergeTreeReaderCompact`	Large scans, missing indexes
`Aggregator::merge`, `HashJoin::insertFromBlock`	Heavy GROUP BY or JOIN
`ZooKeeper::receiveEvent`	Stuck waiting on Keeper / ZooKeeper
`MergeTask::execute`, `MergeTreeDataMergerMutator::mergePartsToTemporaryPart`	Background merges holding resources
`mmap`, `madvise`	Page cache or memory map operations under pressure

Narrowing to one query

To isolate a single hanging query, join system.stack_trace with system.processes:

SELECT
    p.query_id,
    p.elapsed,
    arrayStringConcat(arrayMap(x -> concat('0x', lower(hex(x)), '\t', demangle(addressToSymbol(x))), s.trace), '\n') AS trace_functions
FROM system.stack_trace s
JOIN system.processes p ON s.thread_id = p.thread_ids[1]
SETTINGS allow_introspection_functions = 1
FORMAT Vertical;

system.processes only shows queries that are currently running. If a query is gone from processes but the server is still slow, the cost is elsewhere: background merges, mutations, or replication.

When SQL no longer responds

If clickhouse-client cannot connect or every query times out, you cannot query system.stack_trace. The next step is OS-level. The commands below dump per-thread stacks to the server log so you can read them after the fact.

Newer versions (SIGTSTP)

for i in $(ls -1 /proc/$(pidof clickhouse-server)/task/); do
    kill -TSTP $i
done

On modern ClickHouse builds, SIGTSTP is intercepted by the server and triggers a stack dump for each thread to the log. The server is not actually stopped; the signal is repurposed.

Older versions (SIGPROF)

for i in $(ls -1 /proc/$(pidof clickhouse-server)/task/); do
    kill -SIGPROF $i
done

On older builds, SIGPROF is the signal that asks the profiler to record a sample. Sending it to every thread produces a profile snapshot in the log.

Both commands need root or the same user that runs clickhouse-server. After running, tail the server log:

tail -F /var/log/clickhouse-server/clickhouse-server.log

You will see thread IDs followed by symbolized stack traces. Demangle them with c++filt if they are not already readable.

When to use which approach

Symptom	First step
One query is slow, others run fine	`system.stack_trace` + `system.processes` join
All queries are slow, server responds	`system.stack_trace` grouped by stack
`KILL QUERY` does nothing	`system.stack_trace` to find the lock holder
`clickhouse-client` cannot connect	`kill -TSTP` loop, then read the log
Server crashed already	Look at `clickhouse-server.err.log` for the saved trace

Common Pitfalls

allow_introspection_functions = 1 must be enabled in the session, profile, or settings. Without it, addressToSymbol and demangle return empty strings.
Without debug symbols installed, demangle(addressToSymbol(x)) returns raw mangled names. Install the clickhouse-common-static-dbg package or the debug symbols matching your build.
Querying system.stack_trace is itself expensive. Every thread is briefly paused to capture its trace. Do not loop this query at high frequency.
The signal-based dump works only on Linux. The /proc/$pid/task/ enumeration does not exist on other operating systems.
KILL QUERY only signals the query to cooperate. If the thread is in a non-interruptible syscall, the query keeps running until that syscall returns. The stack trace tells you which one.
If you see threads stuck in ZooKeeper::receiveEvent for long periods, the problem is most likely Keeper, not ClickHouse.

Frequently Asked Questions

Q: Will SIGTSTP suspend the server? A: No, in modern ClickHouse the signal is reassigned to a stack dump handler. The server keeps running. Use a recent build to be safe; on very old versions, SIGTSTP does suspend processes by default, so prefer SIGPROF there.

Q: How is system.stack_trace different from a CPU profiler? A: system.stack_trace is a single snapshot. A CPU profiler like clickhouse-server built-in sampling collects many snapshots over time. For a hang, a single snapshot is usually enough because all threads are stuck in roughly the same place.

Q: I cannot read the symbols, I see only hex addresses. A: Install the debug symbols package matching your ClickHouse version, then restart your client session. addressToSymbol reads the running binary's debug info at query time.

Q: KILL QUERY ASYNC SYNC TIMEOUT all return nothing. A: The query is in an uninterruptible state. Get its stack trace and identify the syscall or lock. Common culprits: a stuck NFS mount, a Keeper connection waiting, or a deadlock on mutations metadata.

Q: Can I run this on a managed ClickHouse service? A: system.stack_trace works on managed services that expose it. The signal-based dumps require shell access to the node, which managed services usually do not provide. Open a support ticket instead.