ClickHouse CROSS JOIN

A CROSS JOIN in ClickHouse produces the full Cartesian product of two tables: every row from the left table is paired with every row from the right table. No join key, ON clause, or USING clause is specified. If the left table has N rows and the right table has M rows, the result contains exactly N * M rows.

This behavior makes CROSS JOIN useful for a narrow class of problems: generating all combinations of two small sets, building report scaffolds with dense date-dimension grids, or fanning out a fact table against a tiny parameter table. It is rarely appropriate for joining large tables, and the consequences of applying it incorrectly — quadratic row growth, unbounded memory consumption — can be severe. Understanding both the mechanics and the guardrails is essential for using it safely.

Syntax

ClickHouse supports two equivalent syntaxes for a cross join.

The explicit form uses the CROSS JOIN keywords:

SELECT m.name, g.genre
FROM movies AS m
CROSS JOIN genres AS g
LIMIT 10;

The implicit form uses comma-separated tables in the FROM clause:

SELECT m.name, g.genre
FROM movies AS m, genres AS g
LIMIT 10;

Both forms produce identical results. Neither accepts an ON or USING clause. For queries involving three or more tables, prefer the explicit chained form over comma syntax, which had a known bug with three-or-more tables in older ClickHouse versions (fixed in a later release via PR #42511):

SELECT a.n, b.n, c.n
FROM numbers(3) AS a
CROSS JOIN numbers(3) AS b
CROSS JOIN numbers(3) AS c;

How It Works

ClickHouse implements CROSS JOIN using the standard hash join algorithm — specifically through the internal joinBlockImplCross() method in HashJoin.cpp. It is not a separate algorithm; it is a join type handled within the hash join path.

At execution time, ClickHouse loads all blocks from the right-hand table into an in-memory hash table first, then iterates over blocks from the left table, emitting every left-right combination. Because the right table is fully materialized in memory before any output is produced, right-table size directly drives memory consumption. Adding a WHERE clause does not reduce the memory used to build the right-table hash table — it only reduces the number of rows in the final output.

CROSS JOIN is only supported by the hash join algorithm. The parallel_hash and grace_hash algorithms, which handle other join types, do not support CROSS JOIN. When join_algorithm is set to auto (the default), ClickHouse correctly selects hash for cross joins even though auto otherwise defaults to parallel hash join as of ClickHouse 24.11.

Automatic Rewrite to INNER JOIN

When ClickHouse encounters a CROSS JOIN (or comma-syntax join) with a WHERE clause containing a condition that references columns from both tables, it attempts to rewrite the query as an INNER JOIN. This avoids materializing the full Cartesian product and can improve performance by orders of magnitude.

The rewrite adds the ALL keyword to preserve Cartesian product semantics, since regular INNER JOIN can be configured differently. For example, this query:

SELECT m.name AS name, g.genre AS genre
FROM movies AS m
CROSS JOIN genres AS g
WHERE m.id = g.movie_id
ORDER BY m.year DESC, m.name ASC, g.genre ASC
LIMIT 10;

gets rewritten internally to an ALL INNER JOIN ON m.id = g.movie_id. Use EXPLAIN SYNTAX to see the rewritten form:

EXPLAIN SYNTAX
SELECT m.name AS name, g.genre AS genre
FROM movies AS m
CROSS JOIN genres AS g
WHERE m.id = g.movie_id;

The cross_to_inner_join_rewrite setting controls this behavior:

Value Behavior
0 Disable the rewrite entirely. Execute as a true Cartesian product.
1 Attempt the rewrite. If it fails, proceed as a full CROSS JOIN (slower but no error). Default.
2 Attempt the rewrite. If it fails, raise an error: "Please, try to simplify WHERE section."

Note that value 1 does not mean ClickHouse always rewrites CROSS JOIN to INNER JOIN. If there is no equi-join condition in the WHERE clause, no rewrite occurs regardless of this setting — the full Cartesian product executes.

To force the rewrite and surface accidental cross joins:

SELECT a.id, b.name
FROM table_a AS a
CROSS JOIN table_b AS b
WHERE a.id = b.foreign_id
SETTINGS cross_to_inner_join_rewrite = 2;

Practical Examples

Generating all date-category combinations

A common use case is building a dense scaffold for time-series reporting, where you need a row for every date-category pair even when no events occurred:

SELECT
    d.date,
    c.category,
    coalesce(e.event_count, 0) AS event_count
FROM
    (SELECT toDate('2025-01-01') + number AS date
     FROM numbers(31)) AS d
CROSS JOIN
    (SELECT DISTINCT category FROM events) AS c
LEFT JOIN
    (SELECT toDate(event_time) AS date, category, count() AS event_count
     FROM events
     WHERE event_time >= '2025-01-01' AND event_time < '2025-02-01'
     GROUP BY date, category) AS e
    ON d.date = e.date AND c.category = e.category
ORDER BY d.date, c.category;

The CROSS JOIN between 31 generated dates and a small set of categories is cheap. The subsequent LEFT JOIN fills in actual counts. Without the cross join, dates with zero events would be absent from the output.

Fanning out scenarios against a fact table

SELECT
    events.event_id,
    scenarios.scenario_name,
    events.value * scenarios.multiplier AS projected_value
FROM events
CROSS JOIN (
    SELECT 'pessimistic' AS scenario_name, 0.8 AS multiplier
    UNION ALL SELECT 'base', 1.0
    UNION ALL SELECT 'optimistic', 1.2
) AS scenarios;

Here the right side is a three-row inline table. The result has three times as many rows as events, one copy per scenario. This pattern is efficient because the right table is trivially small.

Self-join to generate all unique pairs

SELECT p1.product_id AS product_a, p2.product_id AS product_b
FROM products AS p1
CROSS JOIN products AS p2
WHERE p1.product_id < p2.product_id;

This produces all unique ordered pairs. The WHERE condition here filters the output but does not reduce the right-table memory build — the full self-join hash table is materialized first. Keep the source table small for this pattern.

Performance Considerations

Memory growth is quadratic. Two tables with 1,000 rows each produce 1,000,000 result rows. Two tables with 1,000,000 rows each produce 1,000,000,000,000 result rows. ClickHouse does not impose a hard limit on CROSS JOIN output size by default, so the limits must be set explicitly.

The relevant settings with their defaults:

Setting Default Description
max_rows_in_join 0 (unlimited) Maximum rows in the right-table hash table.
max_bytes_in_join 0 (unlimited) Maximum bytes in the right-table hash table.
join_overflow_mode throw Action when limits are exceeded: throw raises an exception; break returns a partial result.

Both max_rows_in_join and max_bytes_in_join default to zero (unlimited). You must set them explicitly to protect against runaway queries. A practical defensive configuration:

SELECT t1.id, t2.value
FROM table_a AS t1
CROSS JOIN table_b AS t2
SETTINGS max_bytes_in_join = 10737418240,  -- 10 GiB
         join_overflow_mode = 'throw';

Memory optimization in ClickHouse 24.5+

ClickHouse 24.5 introduced two improvements for CROSS JOIN:

LZ4 compression for right-table blocks. When the right table exceeds either of these thresholds (evaluated with OR), ClickHouse compresses the in-memory blocks using LZ4:

Setting Default Description
cross_join_min_rows_to_compress 10,000,000 Minimum rows before LZ4 compression activates.
cross_join_min_bytes_to_compress 1,073,741,824 (1 GiB) Minimum bytes before LZ4 compression activates.

Benchmark on 1 billion rows: compared to ClickHouse 24.4 (51.77 GiB peak memory), 24.5 without compression used 19.06 GiB (63% reduction), and 24.5 with compression used 5.36 GiB (90% reduction).

Disk spilling. Prior to 24.5, exceeding max_bytes_in_join would abort the query immediately. As of 24.5, ClickHouse can spill to disk instead, reducing peak memory to as low as 300 MiB in the same benchmark. Disk spilling activates when max_bytes_in_join or max_rows_in_join is set and exceeded, so you must set these limits to enable spilling.

To lower compression thresholds for a memory-constrained environment:

SELECT t1.id, t2.id
FROM large_table AS t1
CROSS JOIN medium_table AS t2
SETTINGS cross_join_min_rows_to_compress = 1000000,
         cross_join_min_bytes_to_compress = 104857600;  -- 100 MiB

Comparison to Alternatives

INNER JOIN with ON — prefer this when you have an equi-join condition. It is more explicit, communicates intent clearly, and avoids any risk of accidentally producing a full Cartesian product. If ClickHouse rewrites your CROSS JOIN WHERE into an INNER JOIN anyway, you may as well write it as an INNER JOIN directly.

ARRAY JOIN — when you need to fan out rows based on array-typed columns rather than a second table, ARRAY JOIN is more efficient than constructing a small table and cross joining it.

Subquery with IN — for filtering rather than combination generation, a WHERE x IN (SELECT ...) pattern avoids producing and then discarding rows.

numbers() table function — for generating sequences or grids of integers, numbers() combined with CROSS JOIN is idiomatic in ClickHouse and typically very fast because both sides are computed, not scanned from storage.

Diagnosing CROSS JOIN Issues

To find recent queries using CROSS JOIN and their memory consumption:

SELECT
    type,
    event_time,
    query_duration_ms,
    formatReadableSize(memory_usage) AS memory,
    substring(normalizeQuery(query), 1, 200) AS query_sample
FROM system.query_log
WHERE type = 'QueryFinish'
  AND query ilike '%CROSS JOIN%'
  AND event_time >= now() - INTERVAL 1 DAY
ORDER BY memory_usage DESC
LIMIT 20;

To monitor currently running CROSS JOIN queries:

SELECT
    query_id,
    user,
    elapsed,
    formatReadableSize(memory_usage) AS memory,
    substring(query, 1, 200) AS query_sample
FROM system.processes
WHERE query ilike '%CROSS JOIN%'
ORDER BY memory_usage DESC;

To find queries that were killed due to memory limit exceeded (error code 241):

SELECT event_time, query_id, user, exception_code, exception,
    substring(query, 1, 300) AS query_sample
FROM system.query_log
WHERE type = 'ExceptionWhileProcessing'
  AND exception_code = 241
  AND event_time >= now() - INTERVAL 1 DAY
ORDER BY event_time DESC
LIMIT 20;

How Pulse Helps

Pulse monitors your ClickHouse queries in real time, alerting you when a CROSS JOIN is consuming unusual memory or running longer than expected. It surfaces query patterns with missing join conditions before they reach production and provides actionable recommendations for converting unintentional Cartesian products into efficient INNER JOIN queries.

Frequently Asked Questions

Q: Does adding a WHERE clause to a CROSS JOIN reduce memory usage?
A: No. ClickHouse builds the full right-table hash table in memory first, then scans left-table blocks and applies the WHERE filter when emitting rows. The WHERE clause reduces output row count but does not reduce the memory needed to hold the right table. If ClickHouse can rewrite the CROSS JOIN to an INNER JOIN via cross_to_inner_join_rewrite, memory usage drops significantly — but only the rewrite eliminates the full right-table materialization.

Q: What is the difference between comma syntax and explicit CROSS JOIN?
A: Functionally they are equivalent in modern ClickHouse. Both produce a Cartesian product. The explicit CROSS JOIN keyword is clearer in intent, safer when combining with other explicit JOIN types in the same query, and the recommended form for three or more tables. Comma syntax was prone to bugs in older versions when mixed with explicit JOINs.

Q: Why does cross_to_inner_join_rewrite default to 1 and not 2?
A: Value 1 is the permissive default: if rewriting fails for any reason, the query proceeds as a true CROSS JOIN. Value 2 is stricter and will error if the rewrite cannot be performed. Production environments where accidental Cartesian products are a risk can benefit from setting this to 2; development environments or systems that intentionally use CROSS JOIN should leave it at 1 or 0.

Q: Can I use CROSS JOIN with more than two tables?
A: Yes. Chain explicit CROSS JOIN keywords for three or more tables. Comma syntax for three-or-more tables had a historical bug (fixed in a later release) and should be avoided for multi-table cross joins.

Q: Does CROSS JOIN support disk spilling in ClickHouse?
A: Yes, as of ClickHouse 24.5. Prior versions would abort the query when memory limits were exceeded. In 24.5+, setting max_bytes_in_join or max_rows_in_join to a non-zero value enables disk spilling when those limits are hit, allowing very large CROSS JOIN operations to complete with controlled memory usage at the cost of disk I/O.

Q: Does join reordering apply to CROSS JOIN?
A: Yes, starting from ClickHouse 25.9, the query_plan_optimize_join_order_limit setting enables global join reordering and includes CROSS JOIN alongside INNER, LEFT, SEMI, ANTI, and FULL joins. When enabled with a non-zero limit, ClickHouse uses statistics to find the optimal join order automatically.

Subscribe to the Pulse Newsletter

Get early access to new Pulse features, insightful blogs & exclusive events , webinars, and workshops.

We use cookies to provide an optimized user experience and understand our traffic. To learn more, read our use of cookies; otherwise, please choose 'Accept Cookies' to continue using our website.