ANTI JOIN is a join strictness modifier in ClickHouse that returns rows from one table that have no matching rows in the other. It is the direct inverse of SEMI JOIN: where SEMI JOIN keeps rows that do match, ANTI JOIN keeps rows that do not. The result contains only columns from the preserved side — no columns from the opposite (filtering) table appear in the output, and each qualifying row appears exactly once regardless of how many rows exist on the filtering side.
The two variants are LEFT ANTI JOIN and RIGHT ANTI JOIN. LEFT ANTI JOIN returns rows from the left table where no matching key exists in the right table. RIGHT ANTI JOIN returns rows from the right table that have no match on the left. LEFT ANTI JOIN covers the vast majority of real-world use cases: "give me everything in table A that is absent from table B."
Syntax
The ANTI strictness modifier sits between the direction keyword and the JOIN keyword:
SELECT <expr_list>
FROM <left_table>
[GLOBAL] LEFT ANTI JOIN <right_table>
(ON <expr_list>) | (USING <column_list>)
A concrete example with ON:
SELECT m.name
FROM movies AS m
LEFT ANTI JOIN genres AS g ON m.id = g.movie_id
ORDER BY year DESC, name ASC
LIMIT 10;
The equivalent with USING when column names match on both sides:
SELECT order_id
FROM orders
LEFT ANTI JOIN cancelled_orders USING (order_id)
SETTINGS join_algorithm = 'hash';
RIGHT ANTI JOIN mirrors the direction:
SELECT g.genre_name
FROM movies AS m
RIGHT ANTI JOIN genres AS g ON m.id = g.movie_id
ORDER BY g.genre_name;
Important: prefer ON over USING with ANTI JOIN. A known bug (GitHub issue #20455) causes unqualified column names in SELECT to resolve to the wrong table when USING is used. If you must use USING, qualify every column reference with the table alias: SELECT t1.col1, t1.col2.
How It Works
ANTI JOIN acts as a denylist check on join keys. ClickHouse builds a hash set from the right table's join key column(s), then streams the left table and emits only rows whose key is absent from that set. It never produces a Cartesian product and returns at most one output row per input row from the preserved side.
Because only the join key columns need to be read from the right table (not all columns), ANTI JOIN can be more memory-efficient than a full LEFT JOIN when the right table is wide.
NULL handling: rows where the join key is NULL on the left side are not returned by LEFT ANTI JOIN. NULL does not match anything — including other NULL values — which follows standard SQL three-valued logic. This is often surprising but is correct behavior.
Algorithm constraints: ANTI JOIN is supported by the hash, parallel_hash, and grace_hash algorithms (and by direct for LEFT ANTI against a Dictionary or key-value table). The full_sorting_merge and partial_merge algorithms do not support it. This has one critical implication: if join_algorithm is set to 'auto', ClickHouse will attempt hash first, but if the hash table exceeds memory limits it will fall back to partial_merge — which will throw an error for ANTI JOIN. To bound memory while keeping ANTI JOIN support, set join_algorithm = 'grace_hash' (it supports ANTI JOIN and spills to disk) or set join_algorithm = 'hash' explicitly and control memory via max_bytes_in_join. Do not rely on 'auto'.
Practical Examples
Find users who have never placed an order:
SELECT t1.id, t1.name, t1.created_at
FROM users AS t1
LEFT ANTI JOIN deleted_users AS t2 ON t1.id = t2.user_id;
Always enumerate specific columns from the preserved table rather than using SELECT *. A known bug (GitHub issue #95738) causes SELECT * to incorrectly include columns from the right table in the result set.
Distributed tables: broadcast the right side with GLOBAL:
SELECT user_id, event_type
FROM distributed_events
GLOBAL LEFT ANTI JOIN (
SELECT user_id FROM churned_users
) AS cu ON distributed_events.user_id = cu.user_id;
GLOBAL LEFT ANTI JOIN broadcasts the right-side subquery to all shards, executing the anti-join locally on each shard. This avoids a distributed shuffle and is the recommended pattern when the right table is small relative to the distributed left table.
ANTI JOIN with an inequality condition (requires ClickHouse 24.8+, hash algorithm only):
SELECT u.user_id, u.signup_date
FROM users AS u
LEFT ANTI JOIN premium_users AS p
ON u.user_id = p.user_id AND u.signup_date < p.upgrade_date
SETTINGS join_algorithm = 'hash';
ClickHouse 24.8 LTS added support for LEFT/RIGHT SEMI/ANTI/ANY JOIN with unequal conditions involving columns from both tables. This is not supported when join_use_nulls = 1.
Setting explicit memory limits:
SELECT l.id, l.value
FROM large_table AS l
LEFT ANTI JOIN reference_table AS r ON l.id = r.id
SETTINGS
join_algorithm = 'hash',
max_bytes_in_join = 10000000000,
join_overflow_mode = 'throw';
Performance Considerations
Memory consumption tracks directly with the size of the right-side hash set. Key practices:
- Put the smaller, filtering table on the right. The right side is what gets loaded into the hash table.
- Pre-filter the right side in a subquery or CTE. Select only the join key column and apply relevant
WHEREconditions before the join. This keeps the hash table as small as possible. - Do not use
join_algorithm = 'auto'withANTI JOIN. If the hash table exceeds memory limits, thepartial_mergefallback will fail. For large right tables, usejoin_algorithm = 'grace_hash'(it supportsANTI JOINand spills to disk); otherwise setjoin_algorithm = 'hash'explicitly and control the limit withmax_bytes_in_join. - Runtime filters:
enable_join_runtime_filters(default1) gained properANTI JOINsupport only in ClickHouse 25.12. Before that version, multi-keyANTI JOINqueries with this setting enabled could produce incorrect results. On versions before 25.12, consider settingenable_join_runtime_filters = 0forANTI JOINqueries as a precaution.
Diagnosing slow ANTI JOIN queries via system.query_log:
SELECT
query_id,
query_duration_ms,
memory_usage,
read_rows,
read_bytes,
ProfileEvents['ExternalProcessingUncompressedBytesTotal'] AS spilled_bytes_uncompressed,
ProfileEvents['ExternalProcessingCompressedBytesTotal'] AS spilled_bytes_compressed
FROM system.query_log
WHERE
type = 'QueryFinish'
AND query ILIKE '%ANTI JOIN%'
AND event_time >= now() - INTERVAL 1 HOUR
ORDER BY query_duration_ms DESC
LIMIT 20;
Non-zero values in spilled_bytes_uncompressed indicate disk spill during join processing — a signal to either increase memory limits or reduce the right-side hash table size.
Comparison to Alternatives
Three patterns express the same logical operation — "rows in A with no match in B" — but they differ in behavior and execution cost.
NOT IN
SELECT order_id FROM orders
WHERE order_id NOT IN (SELECT order_id FROM cancelled_orders);
The NULL trap: if the subquery returns any NULL value, NOT IN evaluates to unknown (0) for every non-NULL left-side value, potentially returning zero rows. This is standard SQL three-valued logic, but it catches many developers off guard. The safe workaround:
SELECT order_id FROM orders
WHERE order_id NOT IN (
SELECT order_id FROM cancelled_orders WHERE order_id IS NOT NULL
);
LEFT ANTI JOIN simply skips rows with NULL join keys on either side and returns all left rows with no non-NULL match. The two are only equivalent when the join key is guaranteed NOT NULL on both sides.
LEFT JOIN ... WHERE right.key IS NULL
SELECT u.user_id, u.name
FROM users AS u
LEFT JOIN orders AS o ON u.user_id = o.user_id
WHERE o.user_id IS NULL;
This produces correct results but has a structural inefficiency: LEFT JOIN materializes all matched and unmatched rows — potentially expanding the intermediate result set if there are multiple matches per key — before the WHERE clause filters down. ANTI JOIN short-circuits this: it stops checking the right table for a given left row as soon as it finds a single match and excludes that row immediately.
Also note that join_use_nulls (default 0) affects LEFT JOIN output: with the default, unmatched right-side columns are filled with type default values (0, empty string, 1970-01-01) rather than NULL, which means the WHERE o.user_id IS NULL filter may not behave as expected unless join_use_nulls = 1. ANTI JOIN is unaffected by join_use_nulls — it returns only columns from the preserved table.
| Pattern | NULL-safe | Returns right-side columns | Algorithm constraint | Intermediate expansion |
|---|---|---|---|---|
LEFT ANTI JOIN |
Yes (skips NULL keys) | No | hash / parallel_hash / grace_hash |
None |
NOT IN |
No (NULL in subquery = empty result) | No | N/A | N/A |
LEFT JOIN ... IS NULL |
Depends on join_use_nulls |
Yes (as NULLs/defaults) | All algorithms | Yes (before filter) |
Key Settings
| Setting | Default | Notes for ANTI JOIN |
|---|---|---|
join_algorithm |
default |
hash, parallel_hash, and grace_hash support ANTI JOIN; full_sorting_merge and partial_merge do not. Do not use auto — fallback to partial_merge will fail. |
max_bytes_in_join |
0 (unlimited) |
Maximum hash table size in bytes. Set this to prevent OOM. |
max_rows_in_join |
0 (unlimited) |
Maximum rows in the hash table. |
join_overflow_mode |
throw |
Action when limits are exceeded: throw or break. |
join_use_nulls |
0 |
No practical effect on ANTI JOIN — only preserved-side columns are returned. |
How Pulse Helps
Pulse continuously monitors your ClickHouse query workload and surfaces slow or memory-intensive ANTI JOIN queries before they become incidents — flagging algorithm mismatches, missing hash table size limits, and known version-specific bugs. Connect your cluster to get query-level diagnostics without manual log analysis.
Frequently Asked Questions
Q: Can I use grace_hash or parallel_hash with ANTI JOIN to handle large right-side tables?
A: Yes. Both grace_hash and parallel_hash support ANTI JOIN. grace_hash is the best choice for a large right side because it spills to disk instead of failing on memory limits. The algorithms that do not support ANTI JOIN are full_sorting_merge and partial_merge. Still pre-filter the right side to reduce hash table size, and set max_bytes_in_join with join_overflow_mode = 'throw' as a guard.
Q: Why does LEFT ANTI JOIN return fewer rows than expected when my join key can be NULL?
A: Rows where the join key is NULL on the left side are not returned by LEFT ANTI JOIN. NULL does not match anything — including other NULL values. If you expect NULL-keyed rows to appear in the output, handle them separately with a UNION ALL.
Q: Is LEFT ANTI JOIN always equivalent to NOT IN with a subquery?
A: Only when the join key is NOT NULL on both sides. If the NOT IN subquery returns any NULL value, the entire NOT IN expression evaluates to unknown for every row, potentially returning zero results. LEFT ANTI JOIN handles this more predictably by simply excluding NULL-keyed rows from matching.
Q: Why does SELECT * return unexpected columns from the right table in my ANTI JOIN query?
A: This is a known bug (GitHub issue #95738). SELECT * with ANTI JOIN or SEMI JOIN incorrectly includes columns from the filtering (right) side. Always enumerate specific columns from the preserved table explicitly: SELECT t1.id, t1.name, ... rather than SELECT *.
Q: Should I use ANTI JOIN or NOT EXISTS for exclusion queries?
A: ClickHouse does not have a NOT EXISTS operator in the traditional correlated-subquery sense. The idiomatic ClickHouse alternatives are LEFT ANTI JOIN and NOT IN. For queries where NULL safety matters or where the right-side dataset is large, LEFT ANTI JOIN is the more predictable and often more efficient choice.
Q: What is the difference between ANTI and ANY strictness in ClickHouse JOINs?
A: ANY is a strictness modifier that, for LEFT ANY JOIN, returns the first matching right-side row per left-side key — rather than all matches like ALL (the default). ANTI is the inverse: it returns left-side rows with no match at all. They serve opposite filtering goals.