ClickHouse EXPLAIN: Query Plan Analysis and Optimization

Q: Does ClickHouse have an equivalent to PostgreSQL's `EXPLAIN ANALYZE`?

Not directly. ClickHouse EXPLAIN only plans the query; it doesn't execute it. To get actual rows read, bytes read, and timing, run the query and inspect system.query_log (or the client's progress/profiling output). EXPLAIN ESTIMATE gives a pre-execution estimate of rows and marks.

Q: Which EXPLAIN type tells me if my primary key is being used?

EXPLAIN indexes = 1 . Look at the ReadFromMergeTree step's Granules: selected/total ratio. A small selected count means the primary key is pruning effectively; a count equal to the total means a full scan.

Q: Can I get EXPLAIN output as JSON?

Yes. EXPLAIN json = 1 SELECT ... emits the plan as structured JSON, which is convenient for tooling. You can also append FORMAT JSON or other output formats.

Q: Does `EXPLAIN` work on `INSERT` or DDL statements?

EXPLAIN AST supports all statement types. The plan-oriented types ( PLAN , PIPELINE , ESTIMATE ) are aimed at SELECT and read-path analysis.

EXPLAIN is ClickHouse's primary tool for understanding how a query will run before it runs — which indexes apply, how many granules and parts get read, where aggregation and sorting happen, and how work is parallelized across threads. Reading an EXPLAIN plan is the fastest way to diagnose why a query is slow without resorting to guesswork.

Unlike PostgreSQL's EXPLAIN ANALYZE, ClickHouse's EXPLAIN does not execute the query — it shows the planned execution. To see actual timings and row counts you run the query and inspect system.query_log or use the client's profiling output. This guide covers each EXPLAIN type and how to read the parts that matter most for optimization.

EXPLAIN Syntax

The general form is:

EXPLAIN [TYPE] [setting = value, ...] SELECT ...

If you omit the type, EXPLAIN PLAN is the default. Settings are passed inline as key = value pairs and control how much detail is emitted.

The EXPLAIN Types

ClickHouse supports several EXPLAIN types, each exposing a different stage of query processing:

Type	What it shows	Most useful for
`AST`	Abstract syntax tree (raw parse)	Debugging parser/AST-size errors
`SYNTAX`	Query after syntactic rewrites and optimization	Seeing how ClickHouse rewrote your SQL
`QUERY TREE`	Query tree after analyzer passes	Understanding analyzer-based optimizations
`PLAN`	Logical query plan steps (default)	Index/granule usage, plan structure
`PIPELINE`	Physical execution pipeline of processors	Parallelism, thread counts, processor chains
`ESTIMATE`	Estimated parts, rows, and marks to read	Quick cost pre-check on MergeTree tables
`TABLE OVERRIDE`	Validated schema overrides for table functions	Debugging table-function schemas

For day-to-day query tuning, EXPLAIN PLAN indexes = 1 and EXPLAIN PIPELINE do most of the work. The other types are for narrower debugging tasks.

EXPLAIN PLAN: The Workhorse

EXPLAIN PLAN dumps the logical query plan as a tree of steps. Read it from bottom to top: data enters at the leaves (reads from storage) and flows up through filters, aggregations, sorts, and projections to the output.

EXPLAIN
SELECT count()
FROM trips
WHERE pickup_datetime BETWEEN '2009-01-01' AND '2009-04-01';

A typical plan looks like:

Expression ((Projection + Before ORDER BY))
  Aggregating
    Expression (Before GROUP BY)
      Filter (WHERE)
        ReadFromMergeTree (nyc_taxi.trips)

Each step corresponds to a stage of execution. The leaf ReadFromMergeTree is where the actual data scan happens — and where most optimization wins or losses are decided.

Useful PLAN settings

-- Show index and granule analysis (the most important one)
EXPLAIN indexes = 1 SELECT ... ;

-- Show the column headers produced by each step
EXPLAIN header = 1 SELECT ... ;

-- Show detailed actions (functions, expressions) per step
EXPLAIN actions = 1 SELECT ... ;

-- Show whether projections were considered
EXPLAIN projections = 1 SELECT ... ;

-- Emit the plan as JSON for programmatic parsing
EXPLAIN json = 1 SELECT ... ;

By default description = 1 (step descriptions on) and optimize = 1 (plan optimizations applied). Setting optimize = 0 shows the plan before optimizations, which is occasionally useful when you suspect an optimization is misbehaving.

Reading Index and Granule Usage

The single most valuable EXPLAIN invocation for performance work is EXPLAIN indexes = 1. It expands the ReadFromMergeTree step to show exactly which indexes were applied and how much data survived pruning.

EXPLAIN indexes = 1
SELECT count()
FROM trips
WHERE pickup_datetime BETWEEN '2009-01-01' AND '2009-04-01';

ReadFromMergeTree (nyc_taxi.trips)
Indexes:
  PrimaryKey
    Keys: pickup_datetime
    Condition: and((pickup_datetime in (-Inf, 1238543999]),
                   (pickup_datetime in [1230768000, +Inf]))
    Parts: 9/9
    Granules: 5061/40167

How to read this:

Keys — the primary-key columns ClickHouse could use against your WHERE.
Condition — the predicate translated into key ranges.
Parts: 9/9 — parts selected versus total. Here all parts contain matching rows.
Granules: 5061/40167 — granules to read versus total. Only 5,061 of 40,167 survive pruning.

A granule is the smallest unit ClickHouse reads, holding index_granularity rows (8,192 by default). The ratio selected / total granules is your headline metric: the lower it is, the more effective your primary key and skip indexes are for this query.

Warning sign: if Granules shows the full count (e.g. 40167/40167), the query is doing a full scan and the primary key is not helping. That usually means the WHERE columns don't lead the ORDER BY key, or a function is wrapping the key column and blocking range analysis.

If you've defined data-skipping indexes or projections, they appear as additional entries under Indexes: (e.g. Skip with the index name, or a ReadFromMergeTree (Projection: ...) step). See the ClickHouse projections guide for verifying projection usage with EXPLAIN.

EXPLAIN ESTIMATE: Quick Cost Check

EXPLAIN ESTIMATE returns a compact table of how much MergeTree data a query is expected to touch, without the full plan tree:

EXPLAIN ESTIMATE
SELECT count()
FROM trips
WHERE pickup_datetime BETWEEN '2009-01-01' AND '2009-04-01';

┌─database─┬─table─┬─parts─┬───rows─┬─marks─┐
│ nyc_taxi │ trips │     9 │ 41459… │  5061 │
└──────────┴───────┴───────┴────────┴───────┘

The columns are database, table, parts, rows, and marks (marks correspond to selected granules). It's a fast way to compare two query variants — lower rows and marks mean less work. It only covers MergeTree-family tables and reports estimated, not exact, figures.

EXPLAIN PIPELINE: Parallelism and Processors

EXPLAIN PIPELINE shows the physical pipeline of processors that actually move and transform data, including how many parallel streams ClickHouse will spin up.

EXPLAIN PIPELINE
SELECT toStartOfHour(pickup_datetime) AS h, count()
FROM trips
GROUP BY h;

This reveals where work is parallelized (the number of concurrent streams, often tied to max_threads) and where streams are resized or merged. Use it when a query isn't using the cores you expect, or when you want to see whether an aggregation is running single-threaded. Add graph = 1 (optionally with compact = 0) to emit a DOT graph you can render with Graphviz.

EXPLAIN SYNTAX and AST

EXPLAIN SYNTAX shows your query after ClickHouse's syntactic rewrites — useful for confirming how predicates were pushed down, how IN subqueries were transformed, or why an optimization didn't fire:

EXPLAIN SYNTAX
SELECT * FROM trips WHERE pickup_datetime > now() - INTERVAL 1 DAY;

EXPLAIN AST dumps the raw parse tree and works on any statement type, not just SELECT. It's mainly a debugging aid — for example when a generated query trips the AST-size limit. See ClickHouse "Too big AST" errors for that scenario.

EXPLAIN QUERY TREE (the analyzer's representation) is a more modern counterpart to SYNTAX and is helpful when investigating analyzer-specific behavior on recent versions. Both run_passes and dump_passes settings let you inspect individual analyzer passes.

A Practical Optimization Workflow

Reproduce the slow query and run EXPLAIN indexes = 1 on it. Note the Granules: selected/total ratio.
If granules ≈ total, the primary key isn't pruning. Check that your filter columns lead the table's ORDER BY, and that no function wraps a key column in the WHERE.
Adjust — reorder the primary key, add a data-skipping index, or add a projection for an alternative sort order.
Re-run EXPLAIN indexes = 1 and confirm the granule count dropped. A before/after granule comparison is the most reliable signal that an optimization worked.
For parallelism issues, use EXPLAIN PIPELINE to confirm the query uses multiple streams.
For join-heavy queries, EXPLAIN shows the join step but not the chosen algorithm's memory profile — pair it with the ClickHouse join performance guide.

Common Issues

EXPLAIN doesn't show timings. That's expected — it plans but doesn't execute. For real timings, run the query and read system.query_log, or check read_rows / read_bytes after execution.
Plan looks fine but the query is still slow. EXPLAIN reflects the plan, not runtime contention (disk I/O, memory pressure, concurrent load). If a query hangs or stalls despite a clean plan, see debugging a hanging ClickHouse server or query.
indexes = 1 shows full granule scan unexpectedly. A function on the key column (e.g. toDate(ts) = ... instead of a range on ts) prevents primary-key range analysis. Rewrite the predicate to compare the raw key column.
Projection not picked up. Confirm with EXPLAIN projections = 1; the query's GROUP BY/ORDER BY must match the projection definition and the projection must be materialized for existing parts.

How Pulse Helps

Reading EXPLAIN plans is essential, but doing it query-by-query across a busy cluster is slow. Pulse continuously monitors ClickHouse query patterns, surfaces the queries scanning the most granules and parts, and flags tables whose primary keys or projections don't match real workloads — so you know which queries are worth running EXPLAIN on in the first place. When an optimization regresses after a schema change or version upgrade, Pulse highlights the shift so you can act before it reaches production dashboards. It's run by engineers who operate ClickHouse, Elasticsearch, and other data systems in production every day.

Frequently Asked Questions

Q: Does ClickHouse have an equivalent to PostgreSQL's EXPLAIN ANALYZE?

Not directly. ClickHouse EXPLAIN only plans the query; it doesn't execute it. To get actual rows read, bytes read, and timing, run the query and inspect system.query_log (or the client's progress/profiling output). EXPLAIN ESTIMATE gives a pre-execution estimate of rows and marks.

Q: Which EXPLAIN type tells me if my primary key is being used?

EXPLAIN indexes = 1. Look at the ReadFromMergeTree step's Granules: selected/total ratio. A small selected count means the primary key is pruning effectively; a count equal to the total means a full scan.

Q: What's the difference between granules, marks, and parts in the output?

Parts are physical on-disk data segments. A granule is a block of rows (index_granularity, default 8,192) — the smallest unit ClickHouse reads. Marks index granules, so the mark count in EXPLAIN ESTIMATE equals the selected-granule count in EXPLAIN indexes = 1.

Q: Can I get EXPLAIN output as JSON?

Yes. EXPLAIN json = 1 SELECT ... emits the plan as structured JSON, which is convenient for tooling. You can also append FORMAT JSON or other output formats.

Q: Why does my plan show a full granule scan even though I filter on an indexed column?

The most common cause is wrapping the key column in a function (for example toDate(event_time) = today()), which prevents range analysis. Rewrite the filter as a range on the raw column (event_time >= ... AND event_time < ...) so the primary key can prune granules.

Q: Does EXPLAIN work on INSERT or DDL statements?

EXPLAIN AST supports all statement types. The plan-oriented types (PLAN, PIPELINE, ESTIMATE) are aimed at SELECT and read-path analysis.