Array Functions vs Window Functions: Strategic Query Patterns in ClickHouse

Q: How do I replicate a running total with array functions?

Collect the values with groupArray , sort with arraySort to guarantee order, then apply arrayCumSum . Use arrayCumSumNonNegative if running sums should never drop below zero.

Q: How do I replicate `lag`/`lead` with arrays?

Sort the group into an array, then index by offset: arr[i - 1] is the previous value, arr[i + 1] the next. This avoids the frame-dependent behavior of lagInFrame / leadInFrame , which require an explicit ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING frame to behave like classic lag / lead .

Q: Does `groupArray` preserve row order?

No. groupArray collects values in an unspecified order. Always apply arraySort (typically on a tuple carrying the ordering key) before any order-sensitive array operation.

ClickHouse supports SQL window functions and a rich library of array functions, and the two overlap heavily. Running totals, row-to-row differences, ranking, and "find the longest sequence" problems can all be expressed either with an OVER clause or by collecting rows into an array with groupArray and processing that array in place. This guide explains the trade-offs so you can pick the right pattern for each query rather than reaching for one out of habit.

The short version: window functions keep one output row per input row and read naturally, while array functions collapse a group into a single array you manipulate with higher-order functions. The array approach can be more memory-predictable and is the only option on very old ClickHouse builds, but it changes the shape of your result.

The Two Approaches Side by Side

Consider a "running total of amount per user_id, ordered by ts" requirement. Both of the following compute the same numbers.

Window function:

SELECT
    user_id,
    ts,
    amount,
    sum(amount) OVER (
        PARTITION BY user_id
        ORDER BY ts
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS running_total
FROM payments
ORDER BY user_id, ts;

Array function:

SELECT
    user_id,
    arrayJoin(
        arrayZip(
            arrayMap(x -> x.1, sorted),
            arrayMap(x -> x.2, sorted),
            arrayCumSum(arrayMap(x -> x.2, sorted))
        )
    ) AS row
FROM (
    SELECT
        user_id,
        arraySort(x -> x.1, groupArray(tuple(ts, amount))) AS sorted
    FROM payments
    GROUP BY user_id
)
ORDER BY user_id, row.1;

The window query is shorter and keeps the per-row shape directly. The array query first builds one array per user_id with groupArray(tuple(ts, amount)), sorts it by timestamp with arraySort, computes the cumulative sum with arrayCumSum, and only expands back to rows with arrayJoin if per-row output is needed. If you actually want the result as an array (for example to store a per-user time series), you skip the final arrayJoin entirely — which is where the array approach pulls ahead.

When Each Approach Wins

Dimension	Window functions	Array functions
Output shape	One row per input row, identity preserved	One array per group; expand with `arrayJoin` only if needed
Readability	Standard SQL, familiar `OVER` syntax	Higher-order lambdas; steeper learning curve
Memory model	Buffers each partition to evaluate the frame	Group materialized as an array during aggregation
Result as a collection	Requires re-aggregation (`groupArray` of the window output)	Native — the group already is an array
Ordering control	`ORDER BY` inside `OVER`	Explicit `arraySort` / `arrayReverseSort` on the array
Version availability	Native and on by default in modern ClickHouse	Works on essentially every version
Cross-partition logic	Limited to the defined frame	Full control via `arraySlice`, `arrayFold`, indexing

Reach for window functions when you want one output row per input row, when the query reads more clearly as standard SQL, or when the operation maps cleanly to rank(), row_number(), lagInFrame(), or a framed aggregate. See the rank function guide for the ranking family.

Reach for array functions when the natural output is a per-group collection, when you need ordering or windowing logic that the standard frame syntax does not express, or when you want to avoid re-aggregating window output back into arrays.

Mapping Window Operations to Array Functions

Most window patterns have a direct array equivalent once the group is collected with groupArray and ordered with arraySort.

Window pattern	Array equivalent
`sum(x) OVER (ORDER BY ... ROWS UNBOUNDED PRECEDING)`	`arrayCumSum(arr)`
`x - lagInFrame(x) OVER (ORDER BY ...)` (row-to-row diff)	`arrayDifference(arr)`
`row_number() OVER (ORDER BY ...)`	`arrayEnumerate(arr)`
`lagInFrame(x, n)` / `leadInFrame(x, n)`	index with `arr[i - n]` / `arr[i + n]` after sorting
`first_value(x)` / `last_value(x)`	`arr[1]` / `arr[-1]`
Custom accumulation over a frame	`arrayFold`

A key detail: lag/lead in ClickHouse are exposed as lagInFrame/leadInFrame, and they respect the window frame. To get classic, frame-independent lag/lead behavior you must specify ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING. The array equivalent — sort, then index by offset — sidesteps that frame subtlety entirely.

Running differences

arrayDifference returns 0 for the first element and arr[i] - arr[i-1] thereafter:

SELECT
    sensor_id,
    arrayDifference(
        arrayMap(x -> x.2, arraySort(x -> x.1, groupArray(tuple(ts, reading))))
    ) AS deltas
FROM readings
GROUP BY sensor_id;

This is the array-based replacement for the legacy `runningDifference` function, and unlike runningDifference it is not affected by block boundaries — the entire group is sorted in one array before the difference is taken.

Sequence detection

Problems like "the longest run of consecutive active days per user" are awkward with window functions but natural with arrays. Collect a 0/1 indicator per row, split the array on the zeros, and measure the resulting sub-arrays:

SELECT
    author,
    arrayMax(
        arrayMap(s -> length(s),
            arraySplit(x -> (x = 0), groupArray(is_consecutive))
        )
    ) - 1 AS longest_streak
FROM commits
GROUP BY author;

arraySplit divides the array at each point where the lambda returns true, arrayMap measures each sub-array, and arrayMax takes the longest. There is no concise window-function expression for this.

Performance and Memory Considerations

Both approaches must hold a group in memory to compute results across its rows; neither is free. The practical differences:

Window functions buffer the rows of each partition to evaluate the frame. Wide partitions with many columns can use significant memory, and large GROUP BY-style window output may also spill depending on your settings.
Array functions materialize each group as an array value during aggregation. groupArray memory grows with the number and size of elements per group; on skewed data a few very large groups can dominate memory. See memory limit exceeded during aggregation for diagnosing that case.
Avoid unnecessary arrayJoin. If your final result is a per-group array, do not expand it back to rows just to look like a window query — that re-introduces the row multiplication you avoided. Conversely, if you need per-row output, the trailing arrayJoin adds cost that the window function pays for natively.

In published comparisons of equivalent queries (for example, finding the longest commit streak), the two approaches run in roughly the same time on the same dataset. Treat performance as workload-dependent and measure with EXPLAIN and system.query_log rather than assuming one is universally faster.

Best Practices

Default to window functions for per-row output. If you want one row in, one row out, and the logic is a standard ranking or framed aggregate, the OVER clause is clearer and well optimized in current ClickHouse.
Use array functions when the group is the unit of analysis. Per-user sequences, per-sensor time series, and "store the whole ordered list" requirements are array-native. Building them via window functions forces a re-aggregation step.
Always sort the array explicitly. groupArray does not guarantee element order. Apply arraySort(x -> x.key, arr) before any order-dependent operation like arrayCumSum or arrayDifference. See the groupArray function reference for ordering caveats.
Carry the sort key in a tuple. groupArray(tuple(ts, val)) keeps the ordering column attached so you can sort deterministically, then extract values with arrayMap(x -> x.2, sorted).
Watch group skew. A handful of huge groups can blow up groupArray memory. If group sizes are unbounded, prefer window functions or cap array sizes with groupArray(N)(x).
Expand back to rows only when required. Use arrayJoin (or ARRAY JOIN) at the very end, after all array processing, to minimize the number of rows flowing through the pipeline.

Common Issues

Unordered results. Forgetting arraySort before arrayCumSum/arrayDifference yields correct-looking but wrong numbers, because groupArray collects values in an unspecified order, not value order.
Frame surprises with lagInFrame. Because lagInFrame/leadInFrame respect the frame, omitting an explicit unbounded frame can return unexpected values. The array equivalent (sort then index) has no such ambiguity.
Re-aggregating window output. Wrapping a window function in groupArray to get a per-group array works but does the work twice. If the end goal is an array, build it with array functions directly.
Memory from tuples. groupArray(tuple(...)) stores every column you include for every row in the group. Include only the columns you actually need downstream.

How Pulse Helps

Pulse monitors ClickHouse query performance and memory usage, and flags queries where window functions buffer large partitions or where groupArray-based aggregation produces oversized arrays on skewed groups. When a per-group analytics query causes memory pressure, Pulse surfaces the offending query, the group cardinality and array sizes involved, and whether an array-function rewrite (or the reverse) would reduce resource usage. That makes the strategic choice between the two patterns data-driven rather than a guess.

Frequently Asked Questions

Q: Are array functions faster than window functions in ClickHouse?

Not inherently. For equivalent problems the two often run in comparable time. The right choice depends on output shape and memory profile, not a blanket speed advantage. Measure both with EXPLAIN and system.query_log.

Q: Why would I use array functions if ClickHouse has native window functions?

Three main reasons: the natural result is a per-group array (so window functions would require an extra re-aggregation), you need ordering or windowing logic the standard frame syntax cannot express, or you are running a ClickHouse version old enough that native window functions are unavailable.

Q: How do I replicate a running total with array functions?

Collect the values with groupArray, sort with arraySort to guarantee order, then apply arrayCumSum. Use arrayCumSumNonNegative if running sums should never drop below zero.

Q: How do I replicate lag/lead with arrays?

Sort the group into an array, then index by offset: arr[i - 1] is the previous value, arr[i + 1] the next. This avoids the frame-dependent behavior of lagInFrame/leadInFrame, which require an explicit ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING frame to behave like classic lag/lead.

Q: Does groupArray preserve row order?

No. groupArray collects values in an unspecified order. Always apply arraySort (typically on a tuple carrying the ordering key) before any order-sensitive array operation.

Q: When should I expand the array back to rows?

Only when your final output must be one row per element. Do it last, with arrayJoin or ARRAY JOIN, after all array processing — expanding early multiplies rows and discards the efficiency of the array approach.