A ClickHouse server that stops responding usually has one of three shapes: a query is in a tight loop on CPU, a query is blocked on a lock or disk, or the whole server is wedged and no SQL gets through. The first two are diagnosable in-band by reading system.stack_trace. The third needs help from the OS, typically by sending signals that force every thread to write its stack to the log. This article walks through both approaches.
In-band diagnosis with system.stack_trace
system.stack_trace collects a synthetic stack trace from every running thread at the moment you query it. Combined with addressToSymbol and demangle, you get a profile of where the server is spending time right now.
SELECT
arrayStringConcat(arrayMap(x -> concat('0x', lower(hex(x)), '\t', demangle(addressToSymbol(x))), trace), '\n') AS trace_functions,
count()
FROM system.stack_trace
GROUP BY trace_functions
ORDER BY count() DESC
SETTINGS allow_introspection_functions = 1
FORMAT Vertical;
The result is one row per unique stack, with a count of threads parked there. The stack at the top is where the most threads currently sit. If it points into a merge function, a heavy aggregation, or a lock primitive, you have your answer.
What the columns tell you
| Symbol pattern | Likely cause |
|---|---|
pthread_cond_wait, futex_wait |
Threads idle, waiting for work or for a lock |
MergeTreeRangeReader::read, MergeTreeReaderCompact |
Large scans, missing indexes |
Aggregator::merge, HashJoin::insertFromBlock |
Heavy GROUP BY or JOIN |
ZooKeeper::receiveEvent |
Stuck waiting on Keeper / ZooKeeper |
MergeTask::execute, MergeTreeDataMergerMutator::mergePartsToTemporaryPart |
Background merges holding resources |
mmap, madvise |
Page cache or memory map operations under pressure |
Narrowing to one query
To isolate a single hanging query, join system.stack_trace with system.processes:
SELECT
p.query_id,
p.elapsed,
arrayStringConcat(arrayMap(x -> concat('0x', lower(hex(x)), '\t', demangle(addressToSymbol(x))), s.trace), '\n') AS trace_functions
FROM system.stack_trace s
JOIN system.processes p ON s.thread_id = p.thread_ids[1]
SETTINGS allow_introspection_functions = 1
FORMAT Vertical;
system.processes only shows queries that are currently running. If a query is gone from processes but the server is still slow, the cost is elsewhere: background merges, mutations, or replication.
When SQL no longer responds
If clickhouse-client cannot connect or every query times out, you cannot query system.stack_trace. The next step is OS-level. The commands below dump per-thread stacks to the server log so you can read them after the fact.
Newer versions (SIGTSTP)
for i in $(ls -1 /proc/$(pidof clickhouse-server)/task/); do
kill -TSTP $i
done
On modern ClickHouse builds, SIGTSTP is intercepted by the server and triggers a stack dump for each thread to the log. The server is not actually stopped; the signal is repurposed.
Older versions (SIGPROF)
for i in $(ls -1 /proc/$(pidof clickhouse-server)/task/); do
kill -SIGPROF $i
done
On older builds, SIGPROF is the signal that asks the profiler to record a sample. Sending it to every thread produces a profile snapshot in the log.
Both commands need root or the same user that runs clickhouse-server. After running, tail the server log:
tail -F /var/log/clickhouse-server/clickhouse-server.log
You will see thread IDs followed by symbolized stack traces. Demangle them with c++filt if they are not already readable.
When to use which approach
| Symptom | First step |
|---|---|
| One query is slow, others run fine | system.stack_trace + system.processes join |
| All queries are slow, server responds | system.stack_trace grouped by stack |
KILL QUERY does nothing |
system.stack_trace to find the lock holder |
clickhouse-client cannot connect |
kill -TSTP loop, then read the log |
| Server crashed already | Look at clickhouse-server.err.log for the saved trace |
Common Pitfalls
allow_introspection_functions = 1must be enabled in the session, profile, or settings. Without it,addressToSymbolanddemanglereturn empty strings.- Without debug symbols installed,
demangle(addressToSymbol(x))returns raw mangled names. Install theclickhouse-common-static-dbgpackage or the debug symbols matching your build. - Querying
system.stack_traceis itself expensive. Every thread is briefly paused to capture its trace. Do not loop this query at high frequency. - The signal-based dump works only on Linux. The
/proc/$pid/task/enumeration does not exist on other operating systems. KILL QUERYonly signals the query to cooperate. If the thread is in a non-interruptible syscall, the query keeps running until that syscall returns. The stack trace tells you which one.- If you see threads stuck in
ZooKeeper::receiveEventfor long periods, the problem is most likely Keeper, not ClickHouse.
Frequently Asked Questions
Q: Will SIGTSTP suspend the server?
A: No, in modern ClickHouse the signal is reassigned to a stack dump handler. The server keeps running. Use a recent build to be safe; on very old versions, SIGTSTP does suspend processes by default, so prefer SIGPROF there.
Q: How is system.stack_trace different from a CPU profiler?
A: system.stack_trace is a single snapshot. A CPU profiler like clickhouse-server built-in sampling collects many snapshots over time. For a hang, a single snapshot is usually enough because all threads are stuck in roughly the same place.
Q: I cannot read the symbols, I see only hex addresses.
A: Install the debug symbols package matching your ClickHouse version, then restart your client session. addressToSymbol reads the running binary's debug info at query time.
Q: KILL QUERY ASYNC SYNC TIMEOUT all return nothing.
A: The query is in an uninterruptible state. Get its stack trace and identify the syscall or lock. Common culprits: a stuck NFS mount, a Keeper connection waiting, or a deadlock on mutations metadata.
Q: Can I run this on a managed ClickHouse service?
A: system.stack_trace works on managed services that expose it. The signal-based dumps require shell access to the node, which managed services usually do not provide. Open a support ticket instead.