ClickHouse Cumulative Aggregations: Running Sums and Window Functions

Cumulative aggregations — running sums, running counts, moving averages, and cumulative statistics — answer questions like "what is the total revenue up to each day?" or "what is the 7-day rolling average?". In ClickHouse the canonical, correct way to express these is window functions with an OVER clause, though several older array and running* functions exist for specific cases.

This guide covers the window-function patterns for cumulative work, when the alternatives (arrayCumSum, groupArrayMovingSum, runningAccumulate) make sense, and the frame and ordering gotchas that produce wrong results.

Running Sum with Window Functions

The fundamental building block is an aggregate function with an OVER clause that has an ORDER BY. As rows are processed in order, the window grows to include every row from the start of the partition up to the current row:

SELECT
    sale_date,
    revenue,
    sum(revenue) OVER (
        ORDER BY sale_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS running_total
FROM sales
ORDER BY sale_date;

The frame ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW is what makes this cumulative: every row sees itself plus all preceding rows in the order defined.

Window functions are a stable, default-enabled feature in modern ClickHouse — you do not need to set any experimental flag in current (24.x / 25.x) releases. See the sum() function reference for the aggregate itself.

Resetting the Running Total per Group

Add PARTITION BY to restart the accumulation for each group. This is the most common requirement — cumulative spend per user, cumulative sales per region:

SELECT
    user_id,
    event_time,
    amount,
    sum(amount) OVER (
        PARTITION BY user_id
        ORDER BY event_time
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS user_running_spend
FROM purchases
ORDER BY user_id, event_time;

Each user_id accumulates independently and resets at the partition boundary.

ROWS vs RANGE: The Ties Gotcha

If you omit the frame clause but keep ORDER BY, ClickHouse applies the SQL-standard default frame of RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROWnot ROWS. The two behave identically only when the ORDER BY values are unique. When there are ties (multiple rows sharing the same ORDER BY value), they diverge:

Frame Behavior on tied ORDER BY values
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW Counts rows physically. Each tied row gets a distinct, incrementing running total based on its position.
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW (the default) Treats tied rows as peers. All rows sharing an ORDER BY value get the same running total — the total through the end of that value's group.

For a true row-by-row running total, always specify ROWS explicitly. Relying on the implicit default can silently produce "stepped" results when your sort key has duplicates (common with date-only ordering on intraday data).

-- Explicit ROWS: distinct running total per row, even on tied dates
sum(amount) OVER (ORDER BY sale_date ROWS UNBOUNDED PRECEDING)

ROWS UNBOUNDED PRECEDING is shorthand for ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.

Moving (Rolling) Aggregations

A bounded ROWS BETWEEN N PRECEDING AND CURRENT ROW frame gives a moving window of the current row plus the previous N rows — the basis for rolling sums and moving averages:

SELECT
    ts,
    value,
    avg(value) OVER (
        ORDER BY ts
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS rolling_7row_avg,
    sum(value) OVER (
        ORDER BY ts
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ) AS rolling_7row_sum
FROM metrics
ORDER BY ts;

Note this is a fixed number of rows, not a time interval. To average over a calendar window (e.g. exactly 7 days regardless of how many rows fall in them), pre-aggregate to one row per day first, then apply the row-based frame — or use a RANGE frame with an interval if your ClickHouse version supports it for your type.

Other Cumulative Functions: avg, count, max, min

Any aggregate works inside the cumulative frame, so you can compute several running statistics in one pass:

SELECT
    sale_date,
    revenue,
    sum(revenue)   OVER w AS running_total,
    count()        OVER w AS running_count,
    avg(revenue)   OVER w AS running_avg,
    max(revenue)   OVER w AS running_max,
    min(revenue)   OVER w AS running_min
FROM sales
WINDOW w AS (ORDER BY sale_date ROWS UNBOUNDED PRECEDING)
ORDER BY sale_date;

The WINDOW clause names a frame once and reuses it across columns, keeping the query readable. Ranking functions like rank() and dense_rank(), and offset functions like lagInFrame() and leadInFrame(), use the same OVER machinery and pair naturally with cumulative columns.

Cumulative Sum Over Pre-Aggregated Data

A frequent pattern is "cumulative daily totals": first sum within each day, then accumulate those daily sums. Nest the aggregate inside the window function:

SELECT
    day,
    daily_total,
    sum(daily_total) OVER (
        ORDER BY day ROWS UNBOUNDED PRECEDING
    ) AS cumulative_total
FROM
(
    SELECT
        toDate(event_time) AS day,
        sum(amount)        AS daily_total
    FROM events
    GROUP BY day
)
ORDER BY day;

Alternatives to Window Functions

Window functions are the recommended default. These older constructs remain useful in narrow cases:

Function Use when Caveat
arrayCumSum(arr) You already have data as an array (e.g. from groupArray) and want prefix sums. Also decodes delta-encoded arrays. Operates on an array, not rows. arrayCumSumNonNegative clamps the running sum to 0 when it would go negative.
groupArrayMovingSum(x) / groupArrayMovingAvg(x) You want a moving sum/avg returned as an array; optional (window_size) parameter. Aggregate over a scalar numeric column that returns an array of the same length; not a per-row scalar.
runningAccumulate(state) Very large scans where you accumulate an aggregate state (e.g. sumState()). Deprecated and disabled by default. Requires allow_deprecated_error_prone_window_functions = 1. Order depends on block processing order, not logical row order — results are not deterministic across runs.
runningDifference() Difference between successive rows (deltas, not cumulative). Deprecated and disabled by default. Requires allow_deprecated_error_prone_window_functions = 1. Same block-ordering caveat as runningAccumulate.

arrayCumSum example

SELECT arrayCumSum([1, 2, 3, 4]) AS cumulative;
-- [1, 3, 6, 10]

Why runningAccumulate is risky

runningAccumulate(sumState(x)) looks like a cheap running total, but ClickHouse's documentation is explicit that "the order of the result depends on the order of the blocks being processed." Two executions of the same query — or the same query after a merge — can return different accumulations. Window functions carry no such caveat: their ORDER BY defines a deterministic order. Prefer window functions unless you have measured a specific performance problem.

Best Practices

  1. Always write the frame explicitly. Use ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW (or ROWS UNBOUNDED PRECEDING) for running totals. Don't rely on the implicit RANGE default unless you specifically want peer semantics on ties.

  2. Order on a unique, monotonic key when possible. A unique ORDER BY (e.g. a timestamp plus a tiebreaker like an id) makes ROWS and RANGE agree and keeps results reproducible.

  3. Prefer window functions over runningAccumulate. They are deterministic and clearer. Reach for runningAccumulate / array functions only for measured edge cases.

  4. Partition to reset. Use PARTITION BY whenever the cumulative value should restart per entity; forgetting it silently bleeds totals across groups.

  5. Name reusable frames with WINDOW. When several columns share a frame, define it once to avoid copy-paste drift.

  6. Pre-aggregate before accumulating when you want cumulative totals over coarser buckets (per day, per hour) — sum within the bucket in a subquery, then run the window over the buckets.

Common Issues

  • "My running total has duplicated steps." You used the default frame (RANGE) with a sort key that has ties. Switch to an explicit ROWS frame.
  • "Results change between runs." You're using runningAccumulate or runningDifference, which depend on block order. Move to a window function with an explicit ORDER BY.
  • "The moving average covers the wrong time span." ROWS BETWEEN N PRECEDING counts rows, not time. Pre-aggregate to one row per time bucket, or use a RANGE interval frame.
  • "Totals leak across groups." Add PARTITION BY so the accumulation resets per group.
  • Memory pressure on huge partitions. A single unbounded partition forces ClickHouse to process it in order in memory. Partition the window or pre-aggregate to reduce row counts.

How Pulse Helps

Cumulative queries are easy to write but easy to get subtly wrong — a missing ROWS frame or a stray runningAccumulate can ship dashboards with numbers that look plausible but are inconsistent. Pulse provides ClickHouse expertise and ongoing operational support: reviewing query patterns for correctness and performance, catching non-deterministic constructs before they reach production, and tuning the underlying table design (sort keys, partitioning, projections) so that large running aggregations stay fast. If your team relies on time-series rollups and running totals, Pulse helps make sure they are both correct and efficient.

Frequently Asked Questions

Q: What is the difference between ROWS UNBOUNDED PRECEDING and the default frame?

ROWS UNBOUNDED PRECEDING counts physical rows, so every row gets a distinct running total. The default frame when you only specify ORDER BY is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which treats rows with equal ORDER BY values as peers and gives them the same total. They match only when the sort key is unique.

Q: How do I make a cumulative sum reset for each group?

Add PARTITION BY <group_column> to the OVER clause. The accumulation restarts at each partition boundary, so each group accumulates independently.

Q: Should I use runningAccumulate for running totals?

Generally no. Its result order depends on how data blocks are processed, so it can return different values across runs or after merges. A sum(...) OVER (ORDER BY ...) window function is deterministic and is the recommended approach.

Q: How do I compute a moving average over the last N rows?

Use a bounded frame: avg(value) OVER (ORDER BY ts ROWS BETWEEN N-1 PRECEDING AND CURRENT ROW). For a 7-row average use 6 PRECEDING. This counts rows, not a time interval.

Q: Can I compute a cumulative count of distinct values?

countDistinct is not a window function, so a direct OVER won't give a running distinct count. Common approaches are accumulating an uniqExactState/uniqState or restructuring with arrays; the Altinity Knowledge Base "Cumulative Anything" article documents patterns for this. Validate results carefully, as approximate uniq* states accumulate approximation error.

Q: What's the array-based alternative for a cumulative sum?

arrayCumSum(arr) returns prefix sums of an array, and groupArrayMovingSum returns a moving sum as an array. These are handy when your data is already arranged as arrays (for example via groupArray), but for row-oriented results window functions are clearer.

Subscribe to the Pulse Newsletter

Get early access to new Pulse features, insightful blogs & exclusive events , webinars, and workshops.

We use cookies to provide an optimized user experience and understand our traffic. To learn more, read our use of cookies; otherwise, please choose 'Accept Cookies' to continue using our website.