ClickHouse Aggregate Function Combinators: -State, -Merge, -If, -Map, and More

Combinators are suffixes you append to any ClickHouse aggregate function to change its behavior — sum becomes sumIf, sumState, sumMap, or sumResample without ClickHouse defining a separate function for each variant. They compose, so a single function can carry several combinators at once. This guide walks through the full family, with a focus on -State/-Merge, which underpin incremental aggregation in AggregatingMergeTree and materialized views.

A combinator works with essentially any aggregate function (sum, count, avg, uniq, quantile, groupArray, …). Instead of memorizing hundreds of function names, you learn the base functions and the combinator suffixes, then combine them.

The Combinator Family at a Glance

Combinator What it does Example
-If Aggregates only rows matching a condition sumIf(amount, status = 'paid')
-Array Applies the function across all elements of array arguments sumArray(values)
-Map Aggregates Map values per key sumMap(counters)
-State Returns the intermediate aggregation state, not the final value uniqState(user_id)
-Merge Combines intermediate states into the final value uniqMerge(state)
-MergeState Merges states but returns a new state, not the final value uniqMergeState(state)
-SimpleState Returns a SimpleAggregateFunction state (lighter than -State) sumSimpleState(amount)
-ForEach Aggregates corresponding elements across rows of arrays sumForEach(arr)
-Resample Buckets rows by a key into intervals and aggregates each bucket groupArrayResample(0, 100, 10)(name, age)
-Distinct Aggregates only distinct argument combinations sumDistinct(x)
-OrNull Returns NULL instead of a default when there is no input sumOrNull(x)
-OrDefault Returns the type's default when there is no input avgOrDefault(x)

These compose. uniqArrayIf(arr, cond), sumStateIf(x, cond), and quantilesTimingMerge(...) are all valid.

-If: Conditional Aggregation

-If adds a final UInt8 condition argument and aggregates only the rows where the condition is true. It is the idiomatic way to compute several conditional metrics in a single pass over the table:

SELECT
    countIf(status = 'error')                AS errors,
    countIf(status = 'ok')                   AS ok,
    sumIf(bytes, method = 'POST')            AS post_bytes,
    avgIf(latency_ms, latency_ms > 0)        AS avg_nonzero_latency
FROM requests;

If the condition never matches, the function returns its default (zero for sum/count, an empty value for groupArray, and so on). This is far cheaper than running the same aggregation in multiple subqueries or with multiple GROUP BY passes.

-Array, -ForEach, and -Map: Aggregating Nested Data

-Array treats array arguments as a flattened stream of elements. sumArray(arr) sums every element of every array in the group:

SELECT sumArray(scores) FROM games;   -- total of all scores across all arrays

-ForEach instead aggregates positionally — element 0 with element 0, element 1 with element 1, and so on — across rows. Given arrays [1, 2], [3, 4, 5], [6, 7], sumForEach returns [10, 13, 5]. It is the right tool for per-slot aggregation, such as summing hourly buckets stored as arrays.

-Map aggregates a Map column by key. Keys present in any row are summed independently:

SELECT sumMap(tag_counts) FROM events;
-- input rows: {'a':1,'b':2}, {'b':3,'c':4}
-- result:     {'a':1,'b':5,'c':4}

minMap, maxMap, and avgMap work the same way per key.

-State and -Merge: Incremental Aggregation

This is the most important pair. A -State function does not return a number — it returns an opaque intermediate aggregation state (an AggregateFunction value). A -Merge function takes those states and produces the final result. This split is what makes incremental, pre-aggregated rollups possible.

The classic pattern: a source table, an AggregatingMergeTree target storing states, and a materialized view that writes states on insert.

-- Target table stores states, not final values
CREATE TABLE agg_visits
(
    StartDate DateTime64 NOT NULL,
    CounterID UInt64,
    Visits AggregateFunction(sum, Nullable(Int32)),
    Users  AggregateFunction(uniq, Nullable(Int32))
)
ENGINE = AggregatingMergeTree()
ORDER BY (StartDate, CounterID);

-- Materialized view emits states on each insert into the source table
CREATE MATERIALIZED VIEW visits_mv TO agg_visits AS
SELECT
    StartDate,
    CounterID,
    sumState(Sign)    AS Visits,
    uniqState(UserID) AS Users
FROM visits
GROUP BY StartDate, CounterID;

When you read the table you must finalize the states with -Merge, and aggregate again so that states from different parts are combined:

SELECT
    StartDate,
    sumMerge(Visits) AS Visits,
    uniqMerge(Users) AS Users
FROM agg_visits
GROUP BY StartDate
ORDER BY StartDate;

The key rule: a column of type AggregateFunction(...) is meaningless until you apply the matching -Merge. Selecting Visits directly returns the binary state, not a number. The base function in -State and -Merge must match (sumState pairs with sumMerge, never avgMerge), and the argument types must match too — mixing incompatible states raises Cannot add different aggregate states. For more on ingesting states directly into such tables, see ingesting AggregateFunction states.

-MergeState

-MergeState merges states like -Merge but returns a new state instead of the final value. It is used when you want to roll states up through another layer (for example, a second materialized view that aggregates an already-aggregated table) without finalizing prematurely.

finalizeAggregation, initializeAggregation, runningAccumulate

Three helper functions work with states outside the -Merge flow:

-- Turn a single value into a state
SELECT initializeAggregation('sumState', number) FROM numbers(5);

-- Finalize a state into its value
SELECT finalizeAggregation(arrayReduce('maxState', [1, 2, 3]));   -- 3

-- Accumulate states row by row (ordered, single-threaded)
WITH initializeAggregation('sumState', number) AS state
SELECT number, runningAccumulate(state) AS cumulative
FROM numbers(5);   -- 0, 1, 3, 6, 10

finalizeAggregation is handy when you need the final value but cannot use GROUP BY (it does not merge states across rows). runningAccumulate produces running totals but resets between blocks, so use ordinary window functions for reliable cumulative sums in production. See the cumulative aggregations guide for the window-function approach.

-SimpleState vs -State: Choosing the State Type

-SimpleState returns a SimpleAggregateFunction value rather than a full AggregateFunction. The difference matters for both storage and ergonomics.

A SimpleAggregateFunction is only available for functions whose partial state has the same type as the final result — so merging is a single trivial operation. The supported functions include any, anyLast, min, max, sum, sumWithOverflow, groupBitAnd/groupBitOr/groupBitXor, groupArrayArray, groupUniqArrayArray, sumMap, minMap, and maxMap.

The big ergonomic win: because the stored type already equals the result type, you do not need -Merge to read it. The ClickHouse docs put it directly: values of SimpleAggregateFunction "have the same Type, so unlike with the AggregateFunction type there is no need to apply -Merge/-State combinators."

CREATE TABLE agg_simple
(
    id  UInt64,
    val SimpleAggregateFunction(sum, Double)
)
ENGINE = AggregatingMergeTree
ORDER BY id;

-- Materialized view: just use the value, or -SimpleState explicitly
CREATE MATERIALIZED VIEW agg_simple_mv TO agg_simple AS
SELECT id, sumSimpleState(amount) AS val
FROM source
GROUP BY id;

-- Reading is plain SQL — no sumMerge needed, but still GROUP BY to combine parts
SELECT id, sum(val) AS total FROM agg_simple GROUP BY id;
Aspect -State / AggregateFunction -SimpleState / SimpleAggregateFunction
Reading the column Requires matching -Merge Read directly (still GROUP BY to merge parts)
Supported functions Almost all aggregates (uniq, quantile, argMax, …) Only functions whose state type equals result type
Storage size Larger (full serialized state) Smaller, equals the value type
Best for uniq, percentiles, complex aggregates sum, min, max, any, *Map rollups

Rule of thumb: reach for -SimpleState when the aggregate is sum/min/max/any-style — it's lighter and simpler to query. Use -State when you need uniq, quantiles, argMin/argMax, or any function not in the simple list.

-Resample: Bucketing by a Key

-Resample splits rows into intervals over a resampling key and aggregates each interval independently, returning an array. The parameters are (start, end, step):

SELECT groupArrayResample(30, 75, 30)(name, age) FROM users;
-- buckets ages into [30,60) and [60,75); returns an array of name-arrays per bucket

It is occasionally useful for histogram-style output in one expression, though explicit GROUP BY on a bucketing expression is usually clearer and more flexible.

-OrNull and -OrDefault: Handling Empty Input

By default many aggregates return a type default (often 0) when a group has no qualifying rows, which is ambiguous — you cannot tell "the sum is zero" from "there were no rows." These combinators disambiguate:

SELECT
    sumOrNull(amount)    AS sum_or_null,     -- NULL when no rows, type becomes Nullable
    avgOrDefault(score)  AS avg_or_default    -- type default (0) when no rows
FROM orders
WHERE customer_id = 42;

-OrNull makes the result Nullable and returns NULL for empty input; -OrDefault returns the type's default. They combine with other combinators (e.g. sumIfOrNull).

Combinator Ordering Rules

When you stack combinators, order matters. -Array must come before -If: write uniqArrayIf(arr, cond), where cond is a scalar (not an array). Reversing them changes the meaning or fails to parse. As a general pattern: the data-shaping combinators (-Array, -Map, -ForEach) sit closest to the base name, then filtering (-If), then state/output combinators (-State, -Merge, -OrNull). When in doubt, test the exact expression — ClickHouse will reject invalid combinations.

Best Practices

  1. Match the base function across -State and -Merge. sumState only pairs with sumMerge. The argument types must also match, or you'll hit different aggregate states errors.

  2. Always GROUP BY when reading a state table. AggregatingMergeTree only merges states during background merges, which are not immediate. Reading without -Merge + GROUP BY returns per-part partial states. This is the single most common source of "wrong totals."

  3. Prefer -SimpleState for sum/min/max/any rollups. Smaller states, simpler reads. Save -State for uniq, quantiles, and argMin/argMax.

  4. Use -If instead of multiple passes. Several countIf/sumIf expressions in one query scan the table once; multiple subqueries scan it repeatedly.

  5. Pin the state representation when it matters. Some state encodings (notably uniqExact vs uniqCombined) differ; converting between them is non-trivial — see converting uniqExact to uniqCombined states and the uniq function reference.

Common Issues

  • Reading a -State column returns garbage / binary. You forgot -Merge. AggregateFunction columns must be finalized; SimpleAggregateFunction columns can be read directly but still need GROUP BY to combine parts.
  • Totals are too low or fluctuate. You read the state table without aggregating across parts. Wrap reads in GROUP BY with -Merge (or use FINAL/-Merge consistently).
  • Cannot add different aggregate states on merge. The states being combined come from different base functions or argument types. See the dedicated KB entry.
  • Combinator parse errors. Usually an ordering problem (e.g. -If placed before -Array) or a combinator applied to a function that doesn't support it.

How Pulse Helps

Pulse is a managed monitoring and support service for ClickHouse, Elasticsearch, and other data infrastructure. Combinator-heavy pipelines — AggregatingMergeTree tables fed by materialized views — fail quietly: a state column read without -Merge, a mismatched -State/-Merge pair, or a materialized view that silently stops populating all produce plausible-but-wrong numbers rather than hard errors. Pulse watches materialized view lag, merge backlogs, and query patterns against your aggregation tables, and our ClickHouse engineers help design rollup schemas that pick the right state type and combinator for each metric. Learn more at pulse.support.

Frequently Asked Questions

Q: Why does selecting my AggregateFunction column return unreadable binary?

Because -State columns store an intermediate state, not a value. Apply the matching -Merge combinator (e.g. uniqMerge(col)) inside a GROUP BY to finalize it. SimpleAggregateFunction columns are the exception — they can be read directly.

Q: What's the difference between -State and -SimpleState?

-State produces a full AggregateFunction state that requires -Merge to read and supports nearly every aggregate. -SimpleState produces a smaller SimpleAggregateFunction whose stored type equals the result type, so no -Merge is needed — but it only works for functions like sum, min, max, any, and *Map.

Q: Do I still need GROUP BY if I use -Merge?

Yes. AggregatingMergeTree merges states lazily during background merges, so at query time the same key can still have multiple partial states across parts. GROUP BY with -Merge combines them correctly.

Q: Can I combine multiple combinators on one function?

Yes — sumStateIf, uniqArrayIf, sumIfOrNull are all valid. Order matters: data-shaping combinators like -Array come before -If. Test the exact expression, since not every combination is allowed.

Q: How do I get a final value from a state without GROUP BY?

Use finalizeAggregation(state) for a per-row finalization, or runningAccumulate for a row-by-row running result. Note that neither merges states across arbitrary rows the way GROUP BY ... -Merge does — runningAccumulate also resets between blocks, so prefer window functions for production cumulative sums.

Q: Should I use -If or a WHERE clause?

Use WHERE to filter the whole query; use -If when you need several differently-filtered metrics in one pass (e.g. countIf(a) and countIf(b) side by side). -If avoids re-scanning the table for each condition.

Subscribe to the Pulse Newsletter

Get early access to new Pulse features, insightful blogs & exclusive events , webinars, and workshops.

We use cookies to provide an optimized user experience and understand our traffic. To learn more, read our use of cookies; otherwise, please choose 'Accept Cookies' to continue using our website.