ClickHouse stores arrays as first-class column types. A column defined as Array(UInt32) holds a variable-length list of integers per row, stored contiguously within the columnar format. This design fits naturally for tags, event sequences, nested attributes, and other multi-valued fields — without requiring a separate join table. Querying across individual elements of those arrays, however, requires expanding them into rows. That is what ARRAY JOIN does.
ARRAY JOIN is a ClickHouse-specific SQL clause that unfolds an array column, producing one output row per array element. It behaves like an inner join between the row and its own array: each element is paired with the non-array columns of the originating row. Rows where the array is empty are excluded entirely from the result. Unlike SQL UNNEST in other databases, ARRAY JOIN is a clause in its own right and has several ClickHouse-specific behaviors worth understanding before using it in production queries.
Syntax
The clause appears after FROM and before WHERE:
SELECT <expr_list>
FROM <table>
[LEFT] ARRAY JOIN <array_expr> [AS alias] [, <array_expr2> [AS alias2] ...]
[WHERE <expr>]
A minimal example with an inline table:
SELECT s, arr
FROM (
SELECT 'Hello' AS s, [1, 2] AS arr
UNION ALL SELECT 'World', [3, 4, 5]
UNION ALL SELECT 'Goodbye', []
)
ARRAY JOIN arr;
This produces five rows: two for Hello (elements 1 and 2), three for World (elements 3, 4, 5), and none for Goodbye because its array is empty. The arr column in the output contains individual scalar values, not arrays.
An alias lets you reference the expanded element while preserving access to the original array:
SELECT s, a, arr
FROM arrays_test
ARRAY JOIN arr AS a;
Here a is the scalar element and arr remains the full Array(UInt8) column. This is useful when you need both the original array for context and the individual element for filtering or grouping.
How It Works
ARRAY JOIN processes data in a streaming fashion — it does not materialize a full intermediate table before passing rows downstream. ClickHouse reads input blocks and expands each row that contains a non-empty array into multiple output rows within the same block pipeline. This makes it more memory-efficient than approaches that would first collect all rows and then expand, though peak memory still scales with block size multiplied by array length (see the performance section below).
The expanded rows inherit all non-array column values from the original row. The array column itself is replaced by the scalar element value in the output.
LEFT ARRAY JOIN
LEFT ARRAY JOIN retains rows where the array is empty, filling the expanded column with the default value for the element type: 0 for integers, '' for strings, and the zero value for dates and other types.
SELECT s, arr
FROM (
SELECT 'Hello' AS s, [1, 2] AS arr
UNION ALL SELECT 'World', [3, 4, 5]
UNION ALL SELECT 'Goodbye', []
)
LEFT ARRAY JOIN arr;
-- 6 rows: 'Goodbye' appears once with arr = 0
This distinction matters in practice. When computing per-row metrics that should include rows without array data — counting events per user where some users have no tags, for example — LEFT ARRAY JOIN prevents silent row loss. Standard ARRAY JOIN is the right choice when empty arrays represent genuinely irrelevant rows.
Note: LEFT ARRAY JOIN fills empty-array rows with the type's default value, not NULL. The join_use_nulls setting does not affect ARRAY JOIN behavior.
Multiple Arrays in a Single Clause
A single ARRAY JOIN clause can expand multiple arrays simultaneously:
SELECT s, a, b
FROM arrays_test
ARRAY JOIN arr AS a, arr2 AS b;
-- arr and arr2 must have equal length per row
-- Row with arr=[1,2] and arr2=['x','y'] produces: (1,'x') and (2,'y')
ClickHouse expands multiple comma-separated arrays element-wise (like a zip operation), not as a Cartesian product. The first element of arr is paired with the first element of arr2, and so on. If the arrays have different lengths, the query fails by default.
To allow arrays of different lengths, set enable_unaligned_array_join = 1 in the SETTINGS clause. Shorter arrays are padded with type default values to match the longest array:
SELECT s, a, b
FROM arrays_test
ARRAY JOIN arr AS a, arr_short AS b
SETTINGS enable_unaligned_array_join = 1;
The default for enable_unaligned_array_join is 0 (disabled). Padding with default values can produce misleading results, so it should be used deliberately.
Common misconception: Using two separate arrayJoin() function calls in the same SELECT — rather than the ARRAY JOIN clause — does produce a Cartesian product. SELECT arrayJoin([1,2]) AS a, arrayJoin(['x','y']) AS b returns four rows, not two.
Practical Examples
Counting with arrayEnumerate
After ARRAY JOIN expansion, each source row produces multiple output rows. If you want to count source rows that match a condition rather than expanded element rows, use arrayEnumerateUniq:
SELECT
Goals.ID AS GoalID,
sum(Sign) AS Reaches,
sumIf(Sign, num = 1) AS Visits
FROM test.visits
ARRAY JOIN Goals, arrayEnumerateUniq(Goals.ID) AS num
WHERE CounterID = 160656
GROUP BY GoalID
ORDER BY Reaches DESC
LIMIT 10;
arrayEnumerateUniq(arr) returns, for each element, how many times that element's value has already occurred in the array — so the first occurrence of each distinct value gets 1, the second 2, and so on (for example, arrayEnumerateUniq([10, 20, 10]) returns [1, 1, 2]). Filtering on num = 1 therefore keeps the first occurrence of each distinct goal, so sumIf(Sign, num = 1) counts each session once even if it contributed the same goal multiple times.
arrayEnumerate(arr) is the function that returns the simple 1-based positional index of each element regardless of value — arrayEnumerate([10, 20, 10]) returns [1, 2, 3] — useful when you want position-aware processing after expansion.
Nullable Array Columns
ARRAY JOIN requires the joined expression to be of a non-nullable Array(T) type. Attempting to use a Nullable(Array(T)) column directly throws an error. Wrap with ifNull or assumeNotNull:
SELECT id, elem
FROM my_table
ARRAY JOIN ifNull(nullable_arr, []) AS elem;
Querying system.query_log
system.query_log stores the ProfileEvents column as Map(LowCardinality(String), UInt64). ARRAY JOIN is the standard way to unnest it for per-metric diagnostics:
SELECT PE.Names AS metric, PE.Values AS value
FROM system.query_log
ARRAY JOIN ProfileEvents AS PE
WHERE query_id = 'your-query-id'
AND type = 2 -- QueryFinish
AND event_date = today()
ORDER BY value DESC;
To compare profile metrics between two query versions:
WITH
query_id = 'old-query-id' AS first,
query_id = 'new-query-id' AS second
SELECT
PE.Names AS metric,
anyIf(PE.Values, first) AS v1,
anyIf(PE.Values, second) AS v2
FROM clusterAllReplicas(default, system.query_log)
ARRAY JOIN ProfileEvents AS PE
WHERE (first OR second)
AND event_date = today()
AND type = 2
GROUP BY metric
HAVING v1 != v2
ORDER BY (v2 - v1) / (v1 + v2) ASC;
Nested Data Structures
ClickHouse's Nested data type is syntactic sugar over parallel arrays with a shared prefix. A column declared as Nested(x UInt32, y String) is stored internally as two arrays (nest.x Array(UInt32) and nest.y Array(String)) constrained to equal length per row.
ARRAY JOIN handles nested structures by referencing the parent name, which expands all sub-arrays in lockstep:
SELECT s, `nest.x`, `nest.y`
FROM nested_test
ARRAY JOIN nest;
This is equivalent to listing each sub-column individually (ARRAY JOIN nest.x, nest.y) and produces the same parallel expansion. You can also alias the nested structure:
SELECT Goal.ID, Goal.EventTime
FROM test.visits
ARRAY JOIN Goals AS Goal
WHERE CounterID = 101500
LIMIT 10;
The Goal alias enables dot-notation access on the alias rather than the raw column name.
Performance Considerations
ARRAY JOIN is streamable and does not require materializing a full intermediate table. However, peak memory usage is proportional to max_block_size multiplied by average array length. With max_block_size = 65536 (the default) and arrays of 100 elements, a single block can expand to over 6 million rows in memory.
For memory-intensive queries, reducing max_block_size can significantly reduce peak RAM at the cost of some throughput:
SELECT count()
FROM large_table
ARRAY JOIN big_array AS elem
WHERE elem > 0
SETTINGS max_block_size = 8192;
Additional guidance:
- Filter on non-array columns with
PREWHEREorWHEREbefore the expansion to reduce the number of rows enteringARRAY JOIN. The query optimizer can push filters that reference non-array columns above the expansion. - If a
WHEREpredicate references the expanded element alias (e.g.,WHERE elem > 100), it is applied after expansion and cannot be pushed aboveARRAY JOIN. UseLEFT ARRAY JOINif you need to retain source rows where no element passes the filter. - When only aggregated results are needed, a
GROUP BYdirectly afterARRAY JOINlets ClickHouse aggregate incrementally rather than materializing the full expanded dataset. - Consider array functions (
arrayExists,arrayCount,arrayFilter,arrayMap) for logic that can be expressed entirely on the array without row multiplication. These operate in-place and avoid the row expansion overhead entirely.
Comparison to Alternatives
| Approach | Use when |
|---|---|
ARRAY JOIN clause |
Expanding one or more parallel arrays into rows; querying individual elements; joining Nested types |
arrayJoin() function |
Simple single-array expansion inline within an expression; equivalent for one array, but multiple calls produce Cartesian products |
arrayExists(elem -> cond, arr) |
Testing whether any element matches a condition without row expansion |
arrayCount(elem -> cond, arr) |
Counting elements matching a condition in-place |
arrayFilter(elem -> cond, arr) |
Filtering elements within the array, returning a new array |
has(arr, value) |
Checking array membership for a single value |
Prefer array functions over ARRAY JOIN when the result only needs aggregate information about the array contents. Use ARRAY JOIN when you need to join the expanded elements against other tables, GROUP BY element values, or return element-level rows to the client.
How Pulse Helps
Pulse monitors ClickHouse query performance and flags queries where ARRAY JOIN expansion produces unexpectedly large result sets or contributes to high memory usage. When an array expansion query causes resource spikes, Pulse identifies the query, the expansion ratio (rows read vs. rows produced), and recommends whether to adjust max_block_size, add a PREWHERE filter, or switch to an array function alternative.
Frequently Asked Questions
Q: Does ARRAY JOIN with two arrays produce a Cartesian product?
A: No. Multiple arrays listed in a single ARRAY JOIN clause are expanded element-wise (parallel / zip), not as a Cartesian product. The Cartesian product only occurs when you use multiple arrayJoin() function calls in the same SELECT.
Q: What happens with LEFT ARRAY JOIN and an empty array — does it return NULL?
A: No. LEFT ARRAY JOIN returns the type's default value for the empty-array row: 0 for integer types, '' for strings, and so on. It does not return NULL. The join_use_nulls setting does not affect ARRAY JOIN.
Q: Can I use ARRAY JOIN on a Nullable(Array(T)) column?
A: Not directly. ARRAY JOIN requires a non-nullable Array(T) type and will throw an error on Nullable arrays. Wrap the column with ifNull(col, []) to treat NULL as an empty array, or with assumeNotNull(col) if you are certain the column has no NULL values.
Q: How do I count each source row only once after ARRAY JOIN expansion?
A: Use arrayEnumerateUniq(arr). For each element it returns how many times that element's value has occurred so far within the row, so the first occurrence of each distinct value gets 1. Filtering to num = 1 with sumIf(Sign, num = 1) or countIf(num = 1) then counts each distinct value once per source row, rather than once per expanded element.
Q: Why does my ARRAY JOIN query fail when using arrays of different lengths?
A: By default, all arrays in a single ARRAY JOIN clause must have equal length. Add SETTINGS enable_unaligned_array_join = 1 to allow mismatched lengths; shorter arrays will be padded with type default values to match the longest array in that row.
Q: Is there a performance difference between the ARRAY JOIN clause and the arrayJoin() function?
A: For a single array, the behavior is equivalent. The ARRAY JOIN clause is generally preferred because it integrates cleanly with WHERE, GROUP BY, and aliases. Using multiple arrayJoin() calls in the same SELECT creates a Cartesian product of rows — a common source of accidental result set explosion.