AggregatingMergeTree in ClickHouse: Pre-Aggregated Storage Explained

What is AggregatingMergeTree?

AggregatingMergeTree is a specialized ClickHouse MergeTree-family engine that replaces rows sharing the same sorting key with a single row holding aggregate function states. Instead of storing raw values, columns are typed as AggregateFunction(name, type) or SimpleAggregateFunction(name, type), and the engine combines states during background merges. The pattern is used to maintain pre-aggregated rollups - hourly counters, distinct-user sketches, quantile digests - that are cheap to read because most of the aggregation work has already happened at write time. It is most effective when the rollup reduces row counts by orders of magnitude.

How AggregatingMergeTree Works

Each aggregate column stores a partial state (not the final value). You write states with -State combinator functions (sumState, uniqState, quantileState) and read them by combining stored states with -Merge combinator functions (sumMerge, uniqMerge, quantileMerge). During background merges, ClickHouse calls the aggregate function's merge() method to combine same-key states, so the on-disk size shrinks over time.

-- Destination rollup table: hourly per-tenant counters and distinct users
CREATE TABLE events_hourly_agg
(
    tenant_id    UInt32,
    event_hour   DateTime,
    events_count AggregateFunction(sum, UInt64),
    unique_users AggregateFunction(uniq, UInt64)
)
ENGINE = AggregatingMergeTree()
PARTITION BY toYYYYMM(event_hour)
ORDER BY (tenant_id, event_hour);

-- Materialized view that feeds the rollup as rows arrive in events_raw
CREATE MATERIALIZED VIEW events_hourly_mv TO events_hourly_agg AS
SELECT
    tenant_id,
    toStartOfHour(event_time)         AS event_hour,
    sumState(1::UInt64)               AS events_count,
    uniqState(user_id)                AS unique_users
FROM events_raw
GROUP BY tenant_id, event_hour;

-- Reader: combine remaining partial states at query time
SELECT
    tenant_id,
    event_hour,
    sumMerge(events_count)  AS events,
    uniqMerge(unique_users) AS uniques
FROM events_hourly_agg
WHERE event_hour >= now() - INTERVAL 7 DAY
GROUP BY tenant_id, event_hour
ORDER BY event_hour;

The GROUP BY on the read side is required because merges are asynchronous - multiple parts may still hold partial states for the same key. Using FINAL forces a synchronous merge at read time but is significantly slower.

Engine What it stores Read-time work Best for
MergeTree Raw rows Full aggregation Wide-column raw event storage
SummingMergeTree Summed numeric columns per key Sum the per-part residues Simple sum() rollups only
AggregatingMergeTree Partial aggregate states per key -Merge combinator Arbitrary aggregate functions (uniq, quantile, argMax, etc.)
ReplacingMergeTree Latest row per key by version FINAL or argMax Upsert / CDC patterns
CollapsingMergeTree Sign-encoded rows SUM(sign * col) Incremental updates with insert/delete pairs

Choose AggregatingMergeTree when SummingMergeTree is not expressive enough (you need uniq, quantile, argMax, groupArray, etc.) and you can tolerate the operational cost of explicit -State / -Merge plumbing.

Common Pitfalls with AggregatingMergeTree

  1. Mixing raw values and aggregate states in the same column - the engine only merges AggregateFunction / SimpleAggregateFunction columns. Plain UInt64 columns are silently kept "as is" only with SimpleAggregateFunction(any, ...) semantics; otherwise they are left to the engine's default behavior.
  2. Forgetting GROUP BY on the read side. Without it you get one row per part per key, not a final aggregate.
  3. Using sum() instead of sumMerge() on an AggregateFunction(sum, ...) column - returns binary state bytes interpreted as a number, producing garbage values.
  4. Aggregating on columns not in ORDER BY. Only columns in the sorting key act as the merge key; everything else accumulates raw.
  5. Over-using FINAL in production queries. It materializes the merge at read time and disables several optimizations; prefer GROUP BY with -Merge combinators.

Monitoring AggregatingMergeTree Tables

Watch the part count and merge depth to confirm the engine is doing its job:

-- Active parts per partition; high counts suggest merges are falling behind
SELECT partition, count() AS parts, sum(rows) AS rows
FROM system.parts
WHERE table = 'events_hourly_agg' AND active = 1
GROUP BY partition ORDER BY parts DESC;

-- Ongoing merges
SELECT table, elapsed, progress, num_parts, source_part_names
FROM system.merges WHERE table = 'events_hourly_agg';

If parts pile up faster than merges can collapse them, you'll eventually hit the parts_to_throw_insert threshold (default 3000) and see a Too Many Parts error. Pulse tracks AggregatingMergeTree merge backlogs across replicas and surfaces the root cause when rollups stop converging - typically slow disk on one replica, an unbalanced partition key, or a runaway materialized view producing too many small inserts.

Frequently Asked Questions

Q: What is the difference between AggregatingMergeTree and SummingMergeTree?
A: SummingMergeTree only sums numeric columns per primary key during merges. AggregatingMergeTree stores arbitrary aggregate function states (uniq, quantile, argMax, groupArray, sum, etc.) and merges them with the function's own merge method. AggregatingMergeTree is strictly more general; SummingMergeTree is simpler when you only need sums.

Q: Do I need to use FINAL when querying AggregatingMergeTree?
A: No - the recommended pattern is to add an explicit GROUP BY with -Merge combinator functions on the read side, which combines the partial states ClickHouse has not yet merged in the background. FINAL works but materializes the merge at read time and is typically slower than the GROUP BY approach.

Q: Which aggregate functions work with AggregatingMergeTree?
A: Any aggregate function that has a -State and -Merge combinator works - this covers nearly every ClickHouse aggregate including sum, count, avg, uniq, uniqExact, quantile, argMax, argMin, groupArray, groupUniqArray, topK, and min/max. Use SimpleAggregateFunction(sum, UInt64) for sums when you want to read the column as a plain number without -Merge.

Q: How do I feed an AggregatingMergeTree table?
A: The canonical pattern is a materialized view: CREATE MATERIALIZED VIEW mv TO agg_table AS SELECT key, sumState(value) FROM raw GROUP BY key. The MV runs the -State aggregation per insert into raw and routes the partial states into the aggregating table.

Q: When should I avoid AggregatingMergeTree?
A: Avoid it when the aggregation does not reduce row count meaningfully (the engine's overhead outweighs the savings), when you need raw rows for ad-hoc analysis, or when only simple sums are needed (SummingMergeTree is operationally simpler). Also avoid it for upsert/CDC workloads, where ReplacingMergeTree fits better.

Q: Can I replicate an AggregatingMergeTree table?
A: Yes - use ReplicatedAggregatingMergeTree('/clickhouse/tables/{shard}/agg_table', '{replica}'). It combines the aggregation semantics with ZooKeeper / ClickHouse Keeper-coordinated replication described in ReplicatedMergeTree.

Subscribe to the Pulse Newsletter

Get early access to new Pulse features, insightful blogs & exclusive events , webinars, and workshops.

We use cookies to provide an optimized user experience and understand our traffic. To learn more, read our use of cookies; otherwise, please choose 'Accept Cookies' to continue using our website.