ClickHouse supports SQL window functions and a rich library of array functions, and the two overlap heavily. Running totals, row-to-row differences, ranking, and "find the longest sequence" problems can all be expressed either with an OVER clause or by collecting rows into an array with groupArray and processing that array in place. This guide explains the trade-offs so you can pick the right pattern for each query rather than reaching for one out of habit.
The short version: window functions keep one output row per input row and read naturally, while array functions collapse a group into a single array you manipulate with higher-order functions. The array approach can be more memory-predictable and is the only option on very old ClickHouse builds, but it changes the shape of your result.
The Two Approaches Side by Side
Consider a "running total of amount per user_id, ordered by ts" requirement. Both of the following compute the same numbers.
Window function:
SELECT
user_id,
ts,
amount,
sum(amount) OVER (
PARTITION BY user_id
ORDER BY ts
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS running_total
FROM payments
ORDER BY user_id, ts;
Array function:
SELECT
user_id,
arrayJoin(
arrayZip(
arrayMap(x -> x.1, sorted),
arrayMap(x -> x.2, sorted),
arrayCumSum(arrayMap(x -> x.2, sorted))
)
) AS row
FROM (
SELECT
user_id,
arraySort(x -> x.1, groupArray(tuple(ts, amount))) AS sorted
FROM payments
GROUP BY user_id
)
ORDER BY user_id, row.1;
The window query is shorter and keeps the per-row shape directly. The array query first builds one array per user_id with groupArray(tuple(ts, amount)), sorts it by timestamp with arraySort, computes the cumulative sum with arrayCumSum, and only expands back to rows with arrayJoin if per-row output is needed. If you actually want the result as an array (for example to store a per-user time series), you skip the final arrayJoin entirely — which is where the array approach pulls ahead.
When Each Approach Wins
| Dimension | Window functions | Array functions |
|---|---|---|
| Output shape | One row per input row, identity preserved | One array per group; expand with arrayJoin only if needed |
| Readability | Standard SQL, familiar OVER syntax |
Higher-order lambdas; steeper learning curve |
| Memory model | Buffers each partition to evaluate the frame | Group materialized as an array during aggregation |
| Result as a collection | Requires re-aggregation (groupArray of the window output) |
Native — the group already is an array |
| Ordering control | ORDER BY inside OVER |
Explicit arraySort / arrayReverseSort on the array |
| Version availability | Native and on by default in modern ClickHouse | Works on essentially every version |
| Cross-partition logic | Limited to the defined frame | Full control via arraySlice, arrayFold, indexing |
Reach for window functions when you want one output row per input row, when the query reads more clearly as standard SQL, or when the operation maps cleanly to rank(), row_number(), lagInFrame(), or a framed aggregate. See the rank function guide for the ranking family.
Reach for array functions when the natural output is a per-group collection, when you need ordering or windowing logic that the standard frame syntax does not express, or when you want to avoid re-aggregating window output back into arrays.
Mapping Window Operations to Array Functions
Most window patterns have a direct array equivalent once the group is collected with groupArray and ordered with arraySort.
| Window pattern | Array equivalent |
|---|---|
sum(x) OVER (ORDER BY ... ROWS UNBOUNDED PRECEDING) |
arrayCumSum(arr) |
x - lagInFrame(x) OVER (ORDER BY ...) (row-to-row diff) |
arrayDifference(arr) |
row_number() OVER (ORDER BY ...) |
arrayEnumerate(arr) |
lagInFrame(x, n) / leadInFrame(x, n) |
index with arr[i - n] / arr[i + n] after sorting |
first_value(x) / last_value(x) |
arr[1] / arr[-1] |
| Custom accumulation over a frame | arrayFold |
A key detail: lag/lead in ClickHouse are exposed as lagInFrame/leadInFrame, and they respect the window frame. To get classic, frame-independent lag/lead behavior you must specify ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING. The array equivalent — sort, then index by offset — sidesteps that frame subtlety entirely.
Running differences
arrayDifference returns 0 for the first element and arr[i] - arr[i-1] thereafter:
SELECT
sensor_id,
arrayDifference(
arrayMap(x -> x.2, arraySort(x -> x.1, groupArray(tuple(ts, reading))))
) AS deltas
FROM readings
GROUP BY sensor_id;
This is the array-based replacement for the legacy `runningDifference` function, and unlike runningDifference it is not affected by block boundaries — the entire group is sorted in one array before the difference is taken.
Sequence detection
Problems like "the longest run of consecutive active days per user" are awkward with window functions but natural with arrays. Collect a 0/1 indicator per row, split the array on the zeros, and measure the resulting sub-arrays:
SELECT
author,
arrayMax(
arrayMap(s -> length(s),
arraySplit(x -> (x = 0), groupArray(is_consecutive))
)
) - 1 AS longest_streak
FROM commits
GROUP BY author;
arraySplit divides the array at each point where the lambda returns true, arrayMap measures each sub-array, and arrayMax takes the longest. There is no concise window-function expression for this.
Performance and Memory Considerations
Both approaches must hold a group in memory to compute results across its rows; neither is free. The practical differences:
- Window functions buffer the rows of each partition to evaluate the frame. Wide partitions with many columns can use significant memory, and large
GROUP BY-style window output may also spill depending on your settings. - Array functions materialize each group as an array value during aggregation.
groupArraymemory grows with the number and size of elements per group; on skewed data a few very large groups can dominate memory. See memory limit exceeded during aggregation for diagnosing that case. - Avoid unnecessary
arrayJoin. If your final result is a per-group array, do not expand it back to rows just to look like a window query — that re-introduces the row multiplication you avoided. Conversely, if you need per-row output, the trailingarrayJoinadds cost that the window function pays for natively.
In published comparisons of equivalent queries (for example, finding the longest commit streak), the two approaches run in roughly the same time on the same dataset. Treat performance as workload-dependent and measure with EXPLAIN and system.query_log rather than assuming one is universally faster.
Best Practices
Default to window functions for per-row output. If you want one row in, one row out, and the logic is a standard ranking or framed aggregate, the
OVERclause is clearer and well optimized in current ClickHouse.Use array functions when the group is the unit of analysis. Per-user sequences, per-sensor time series, and "store the whole ordered list" requirements are array-native. Building them via window functions forces a re-aggregation step.
Always sort the array explicitly.
groupArraydoes not guarantee element order. ApplyarraySort(x -> x.key, arr)before any order-dependent operation likearrayCumSumorarrayDifference. See the groupArray function reference for ordering caveats.Carry the sort key in a tuple.
groupArray(tuple(ts, val))keeps the ordering column attached so you can sort deterministically, then extract values witharrayMap(x -> x.2, sorted).Watch group skew. A handful of huge groups can blow up
groupArraymemory. If group sizes are unbounded, prefer window functions or cap array sizes withgroupArray(N)(x).Expand back to rows only when required. Use
arrayJoin(or ARRAY JOIN) at the very end, after all array processing, to minimize the number of rows flowing through the pipeline.
Common Issues
- Unordered results. Forgetting
arraySortbeforearrayCumSum/arrayDifferenceyields correct-looking but wrong numbers, becausegroupArraycollects values in an unspecified order, not value order. - Frame surprises with
lagInFrame. BecauselagInFrame/leadInFramerespect the frame, omitting an explicit unbounded frame can return unexpected values. The array equivalent (sort then index) has no such ambiguity. - Re-aggregating window output. Wrapping a window function in
groupArrayto get a per-group array works but does the work twice. If the end goal is an array, build it with array functions directly. - Memory from tuples.
groupArray(tuple(...))stores every column you include for every row in the group. Include only the columns you actually need downstream.
How Pulse Helps
Pulse monitors ClickHouse query performance and memory usage, and flags queries where window functions buffer large partitions or where groupArray-based aggregation produces oversized arrays on skewed groups. When a per-group analytics query causes memory pressure, Pulse surfaces the offending query, the group cardinality and array sizes involved, and whether an array-function rewrite (or the reverse) would reduce resource usage. That makes the strategic choice between the two patterns data-driven rather than a guess.
Frequently Asked Questions
Q: Are array functions faster than window functions in ClickHouse?
Not inherently. For equivalent problems the two often run in comparable time. The right choice depends on output shape and memory profile, not a blanket speed advantage. Measure both with EXPLAIN and system.query_log.
Q: Why would I use array functions if ClickHouse has native window functions?
Three main reasons: the natural result is a per-group array (so window functions would require an extra re-aggregation), you need ordering or windowing logic the standard frame syntax cannot express, or you are running a ClickHouse version old enough that native window functions are unavailable.
Q: How do I replicate a running total with array functions?
Collect the values with groupArray, sort with arraySort to guarantee order, then apply arrayCumSum. Use arrayCumSumNonNegative if running sums should never drop below zero.
Q: How do I replicate lag/lead with arrays?
Sort the group into an array, then index by offset: arr[i - 1] is the previous value, arr[i + 1] the next. This avoids the frame-dependent behavior of lagInFrame/leadInFrame, which require an explicit ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING frame to behave like classic lag/lead.
Q: Does groupArray preserve row order?
No. groupArray collects values in an unspecified order. Always apply arraySort (typically on a tuple carrying the ordering key) before any order-sensitive array operation.
Q: When should I expand the array back to rows?
Only when your final output must be one row per element. Do it last, with arrayJoin or ARRAY JOIN, after all array processing — expanding early multiplies rows and discards the efficiency of the array approach.