arrayFold is ClickHouse's general-purpose array reduction operator. It walks an array from left to right, applying a lambda that receives an accumulator plus the current element(s), and returns the final accumulator value. If you know Array.reduce from JavaScript or foldLeft from functional languages, arrayFold is the direct equivalent.
It is the most flexible member of the array-function family: where arrayReduce only applies named aggregate functions and arrayCumSum only sums, arrayFold lets you express any left-to-right computation that depends on previously seen elements — running products, recurrence relations, smoothing, or building up a new array. It was introduced in ClickHouse v23.10.
Syntax
arrayFold(lambda(acc, x1 [, x2, ... xN]), arr1 [, arr2, ... arrN], accumulator)
lambda(acc, x1 [, ...])— the folding function. Its first argument is always the accumulator; the remaining arguments are the current element from each input array.arr1 [, arr2, ...]— one or more arrays of equal length. Corresponding elements are fed into the lambda together.accumulator— the initial accumulator value. Cast it explicitly (e.g.0::Int64) so its type matches what the lambda returns.
The argument order is the part people get wrong: the lambda comes first, the arrays in the middle, and the initial accumulator value comes last — not first as in many other languages.
Basic Examples
Summing an array (the canonical "hello world" of fold):
SELECT arrayFold((acc, x) -> acc + x, [10, 20, 30], 0::UInt64) AS sum;
-- 60
A non-trivial reduction that doubles each element before adding:
SELECT arrayFold((acc, x) -> acc + x * 2, [1, 2, 3, 4], 3::Int64) AS res;
-- 23 (3 + 2 + 4 + 6 + 8)
Folding Over Multiple Arrays
When you pass several arrays, the lambda gains one element argument per array, and ClickHouse zips them element-by-element. This is how you express a dot product:
SELECT arrayFold(
(acc, x, y) -> acc + x * y,
[1, 2, 3, 4],
[10, 20, 30, 40],
0::Int64
) AS res;
-- 300
All input arrays must have the same length, or ClickHouse raises an error. See Sizes of arrays don't match for that specific failure.
Building a Running Result
Because the accumulator can be any type — including an Array — arrayFold is not limited to producing a single scalar. By accumulating into an array and reading back its last element, you can compute running calculations that genuinely depend on prior results.
A classic case is an exponentially weighted moving average (EWMA), where each smoothed value is alpha * x + (1 - alpha) * previous_smoothed:
WITH
[40, 45, 43, 31, 20] AS data,
0.3 AS alpha
SELECT arrayFold(
(acc, x) -> arrayPushBack(acc, alpha * x + (1 - alpha) * acc[-1]),
arrayPopFront(data),
[CAST(data[1], 'Float64')]
) AS ewma;
-- [40, 41.5, 41.95, 38.665, 33.0655]
Here the accumulator is an array seeded with the first data point. Each step appends a new smoothed value computed from the prior one (acc[-1]). This kind of stateful recurrence is exactly what plain arrayMap and arrayCumSum cannot do, since they have no access to previously computed output.
arrayFold vs. Related Functions
ClickHouse offers several ways to collapse an array. Pick the most specific tool that does the job — arrayFold is the most powerful but also the slowest to write and reason about.
| Function | What it does | Use when |
|---|---|---|
| arrayFold | General left-to-right fold with a custom lambda and accumulator | You need arbitrary state carried across elements (recurrences, smoothing, custom reductions) |
| arrayReduce | Applies a named aggregate function ('sum', 'max', 'uniq', …) passed as a string |
A standard aggregate already expresses what you want |
| arrayReduceInRanges | Like arrayReduce but over specified (offset, length) ranges, returning an array |
You need windowed aggregates within one array |
| arrayCumSum / arrayCumSumNonNegative | Running totals | You only need a cumulative sum |
| arrayMap | Transforms each element independently | No cross-element state is needed |
For example, these two produce the same total, but the first is clearer and faster:
SELECT arrayReduce('sum', [10, 20, 30]); -- preferred for plain sum
SELECT arrayFold((acc, x) -> acc + x, [10, 20, 30], 0::UInt64); -- only if you need custom logic
Reach for arrayFold when no built-in aggregate or cumulative function captures your logic.
Best Practices
Cast the initial accumulator explicitly. A bare literal like
0isUInt8and can overflow or fail type inference. Write0::Int64,0::Float64, orCAST(..., 'Float64')so the accumulator type matches the lambda's return type.Keep the lambda return type stable. The value the lambda returns becomes the next accumulator, so every iteration must return the same type as the initial accumulator. Mixing types (e.g. returning
Float64from anInt64seed) causes errors.Prefer specialized functions when they fit.
arrayReduce,arrayCumSum,arraySum, and window functions are simpler and usually faster. Only usearrayFoldwhen you genuinely need cross-element state.Mind per-row array size.
arrayFoldruns entirely within a single row over that row's array. Very large arrays per row mean significant per-row CPU and memory; this is array processing, not distributed aggregation.Test type behavior on small inputs first. Fold logic with array-typed accumulators (like the EWMA pattern) is easy to get subtly wrong; validate against a handful of known values before running at scale.
Common Issues
- Wrong argument order. Putting the accumulator first (as in some languages) fails — in ClickHouse the lambda is first and the initial accumulator is the last argument.
- Mismatched array lengths. Multiple input arrays must be the same length; otherwise you get a "Sizes of arrays don't match" error. See the dedicated guide.
- Accumulator type drift. If the lambda's output type doesn't match the seed type on every iteration, ClickHouse throws a type error. Cast the seed and keep returns consistent.
- Empty arrays. Folding an empty array returns the initial accumulator unchanged — useful, but make sure your seed is a valid "neutral" value for your computation.
How Pulse Helps
Functions like arrayFold push heavy, stateful computation into a single row, which can quietly become a per-query CPU and memory hotspot at scale. Pulse monitors ClickHouse query performance and resource usage, surfacing slow array-heavy queries, memory-limit events, and the queries driving them so you can decide whether a fold belongs in SQL or upstream in your pipeline. Pulse is run by ClickHouse, Elasticsearch, and OpenSearch experts who can advise on rewriting expensive array logic into more efficient aggregate or window-function form.
Frequently Asked Questions
Q: What ClickHouse version introduced arrayFold?
arrayFold was added in ClickHouse v23.10. On older versions you must approximate it with arrayReduce, arrayCumSum, or recursive CTE-style workarounds.
Q: How is arrayFold different from arrayReduce?
arrayReduce applies a named aggregate function (passed as a string, e.g. 'sum' or 'max') and gives you no control over intermediate state. arrayFold runs a custom lambda with an explicit accumulator, so it can express recurrences and reductions that no single aggregate covers. Use arrayReduce when a standard aggregate fits; use arrayFold when it doesn't.
Q: Can arrayFold return an array instead of a scalar?
Yes. The accumulator can be any type, including Array. Seeding the accumulator with an array and appending to it (e.g. with arrayPushBack) is the standard pattern for running calculations like EWMA or cumulative products.
Q: Can I fold over more than one array at once?
Yes. Pass multiple arrays after the lambda, and give the lambda one element argument per array: arrayFold((acc, x, y) -> ..., arr1, arr2, seed). The arrays must all be the same length. This is how you compute dot products or element-wise combined reductions.
Q: Why does my arrayFold query throw a type error?
The most common cause is the lambda returning a different type than the initial accumulator. Every iteration's result becomes the next accumulator, so the types must match. Cast the seed explicitly (0::Float64) and ensure the lambda always returns that same type.
Q: Is arrayFold or a window function better for running calculations?
It depends on the data layout. If your sequence is already collapsed into one array per row, arrayFold keeps the computation inside that row. If the data is stored as separate rows, window functions or cumulative aggregations are usually a better fit. Avoid building giant per-row arrays solely to fold over them.