ClickHouse arrayFold Function: Functional Array Reduction

arrayFold is ClickHouse's general-purpose array reduction operator. It walks an array from left to right, applying a lambda that receives an accumulator plus the current element(s), and returns the final accumulator value. If you know Array.reduce from JavaScript or foldLeft from functional languages, arrayFold is the direct equivalent.

It is the most flexible member of the array-function family: where arrayReduce only applies named aggregate functions and arrayCumSum only sums, arrayFold lets you express any left-to-right computation that depends on previously seen elements — running products, recurrence relations, smoothing, or building up a new array. It was introduced in ClickHouse v23.10.

Syntax

arrayFold(lambda(acc, x1 [, x2, ... xN]), arr1 [, arr2, ... arrN], accumulator)
  • lambda(acc, x1 [, ...]) — the folding function. Its first argument is always the accumulator; the remaining arguments are the current element from each input array.
  • arr1 [, arr2, ...] — one or more arrays of equal length. Corresponding elements are fed into the lambda together.
  • accumulator — the initial accumulator value. Cast it explicitly (e.g. 0::Int64) so its type matches what the lambda returns.

The argument order is the part people get wrong: the lambda comes first, the arrays in the middle, and the initial accumulator value comes last — not first as in many other languages.

Basic Examples

Summing an array (the canonical "hello world" of fold):

SELECT arrayFold((acc, x) -> acc + x, [10, 20, 30], 0::UInt64) AS sum;
-- 60

A non-trivial reduction that doubles each element before adding:

SELECT arrayFold((acc, x) -> acc + x * 2, [1, 2, 3, 4], 3::Int64) AS res;
-- 23   (3 + 2 + 4 + 6 + 8)

Folding Over Multiple Arrays

When you pass several arrays, the lambda gains one element argument per array, and ClickHouse zips them element-by-element. This is how you express a dot product:

SELECT arrayFold(
    (acc, x, y) -> acc + x * y,
    [1, 2, 3, 4],
    [10, 20, 30, 40],
    0::Int64
) AS res;
-- 300

All input arrays must have the same length, or ClickHouse raises an error. See Sizes of arrays don't match for that specific failure.

Building a Running Result

Because the accumulator can be any type — including an ArrayarrayFold is not limited to producing a single scalar. By accumulating into an array and reading back its last element, you can compute running calculations that genuinely depend on prior results.

A classic case is an exponentially weighted moving average (EWMA), where each smoothed value is alpha * x + (1 - alpha) * previous_smoothed:

WITH
    [40, 45, 43, 31, 20] AS data,
    0.3 AS alpha
SELECT arrayFold(
    (acc, x) -> arrayPushBack(acc, alpha * x + (1 - alpha) * acc[-1]),
    arrayPopFront(data),
    [CAST(data[1], 'Float64')]
) AS ewma;
-- [40, 41.5, 41.95, 38.665, 33.0655]

Here the accumulator is an array seeded with the first data point. Each step appends a new smoothed value computed from the prior one (acc[-1]). This kind of stateful recurrence is exactly what plain arrayMap and arrayCumSum cannot do, since they have no access to previously computed output.

ClickHouse offers several ways to collapse an array. Pick the most specific tool that does the job — arrayFold is the most powerful but also the slowest to write and reason about.

Function What it does Use when
arrayFold General left-to-right fold with a custom lambda and accumulator You need arbitrary state carried across elements (recurrences, smoothing, custom reductions)
arrayReduce Applies a named aggregate function ('sum', 'max', 'uniq', …) passed as a string A standard aggregate already expresses what you want
arrayReduceInRanges Like arrayReduce but over specified (offset, length) ranges, returning an array You need windowed aggregates within one array
arrayCumSum / arrayCumSumNonNegative Running totals You only need a cumulative sum
arrayMap Transforms each element independently No cross-element state is needed

For example, these two produce the same total, but the first is clearer and faster:

SELECT arrayReduce('sum', [10, 20, 30]);                       -- preferred for plain sum
SELECT arrayFold((acc, x) -> acc + x, [10, 20, 30], 0::UInt64); -- only if you need custom logic

Reach for arrayFold when no built-in aggregate or cumulative function captures your logic.

Best Practices

  1. Cast the initial accumulator explicitly. A bare literal like 0 is UInt8 and can overflow or fail type inference. Write 0::Int64, 0::Float64, or CAST(..., 'Float64') so the accumulator type matches the lambda's return type.

  2. Keep the lambda return type stable. The value the lambda returns becomes the next accumulator, so every iteration must return the same type as the initial accumulator. Mixing types (e.g. returning Float64 from an Int64 seed) causes errors.

  3. Prefer specialized functions when they fit. arrayReduce, arrayCumSum, arraySum, and window functions are simpler and usually faster. Only use arrayFold when you genuinely need cross-element state.

  4. Mind per-row array size. arrayFold runs entirely within a single row over that row's array. Very large arrays per row mean significant per-row CPU and memory; this is array processing, not distributed aggregation.

  5. Test type behavior on small inputs first. Fold logic with array-typed accumulators (like the EWMA pattern) is easy to get subtly wrong; validate against a handful of known values before running at scale.

Common Issues

  • Wrong argument order. Putting the accumulator first (as in some languages) fails — in ClickHouse the lambda is first and the initial accumulator is the last argument.
  • Mismatched array lengths. Multiple input arrays must be the same length; otherwise you get a "Sizes of arrays don't match" error. See the dedicated guide.
  • Accumulator type drift. If the lambda's output type doesn't match the seed type on every iteration, ClickHouse throws a type error. Cast the seed and keep returns consistent.
  • Empty arrays. Folding an empty array returns the initial accumulator unchanged — useful, but make sure your seed is a valid "neutral" value for your computation.

How Pulse Helps

Functions like arrayFold push heavy, stateful computation into a single row, which can quietly become a per-query CPU and memory hotspot at scale. Pulse monitors ClickHouse query performance and resource usage, surfacing slow array-heavy queries, memory-limit events, and the queries driving them so you can decide whether a fold belongs in SQL or upstream in your pipeline. Pulse is run by ClickHouse, Elasticsearch, and OpenSearch experts who can advise on rewriting expensive array logic into more efficient aggregate or window-function form.

Frequently Asked Questions

Q: What ClickHouse version introduced arrayFold?

arrayFold was added in ClickHouse v23.10. On older versions you must approximate it with arrayReduce, arrayCumSum, or recursive CTE-style workarounds.

Q: How is arrayFold different from arrayReduce?

arrayReduce applies a named aggregate function (passed as a string, e.g. 'sum' or 'max') and gives you no control over intermediate state. arrayFold runs a custom lambda with an explicit accumulator, so it can express recurrences and reductions that no single aggregate covers. Use arrayReduce when a standard aggregate fits; use arrayFold when it doesn't.

Q: Can arrayFold return an array instead of a scalar?

Yes. The accumulator can be any type, including Array. Seeding the accumulator with an array and appending to it (e.g. with arrayPushBack) is the standard pattern for running calculations like EWMA or cumulative products.

Q: Can I fold over more than one array at once?

Yes. Pass multiple arrays after the lambda, and give the lambda one element argument per array: arrayFold((acc, x, y) -> ..., arr1, arr2, seed). The arrays must all be the same length. This is how you compute dot products or element-wise combined reductions.

Q: Why does my arrayFold query throw a type error?

The most common cause is the lambda returning a different type than the initial accumulator. Every iteration's result becomes the next accumulator, so the types must match. Cast the seed explicitly (0::Float64) and ensure the lambda always returns that same type.

Q: Is arrayFold or a window function better for running calculations?

It depends on the data layout. If your sequence is already collapsed into one array per row, arrayFold keeps the computation inside that row. If the data is stored as separate rows, window functions or cumulative aggregations are usually a better fit. Avoid building giant per-row arrays solely to fold over them.

Subscribe to the Pulse Newsletter

Get early access to new Pulse features, insightful blogs & exclusive events , webinars, and workshops.

We use cookies to provide an optimized user experience and understand our traffic. To learn more, read our use of cookies; otherwise, please choose 'Accept Cookies' to continue using our website.