The "DB::Exception: Unsupported JOIN keys" error in ClickHouse is raised when the expressions used in a JOIN ON clause are not compatible with the selected join algorithm or storage engine. ClickHouse imposes specific requirements on what kinds of key expressions are allowed depending on the join method in use, and violating those requirements triggers the UNSUPPORTED_JOIN_KEYS error.
Impact
The query fails during the analysis phase before any data is processed. Any downstream queries, views, or applications relying on the join will also fail. If this join is part of a materialized view definition, inserts into the source table will be blocked.
Common Causes
- Using non-equality conditions with hash join -- the default hash join algorithm requires equality predicates (e.g.,
a.id = b.id). Range conditions or inequality operators are not supported in this mode. - Using OR conditions in JOIN ON -- ClickHouse's hash join does not support
ORin the ON clause. Only conjunctions (AND) of equality conditions are allowed. - Expressions referencing both tables -- a single expression in the ON clause that mixes columns from both sides of the join (e.g.,
a.x + b.y = 10) is not supported. - Using functions in join keys without proper support -- certain functions applied to join keys may not be supported by all join algorithms.
- Join key type not supported by the engine -- some table engines like
Joinhave restrictions on what key types they accept.
Troubleshooting and Resolution Steps
Ensure your ON clause uses simple equality conditions. Each condition should reference one column from the left table and one from the right:
-- Correct SELECT * FROM a JOIN b ON a.id = b.id AND a.date = b.date; -- Wrong: inequality not supported with hash join SELECT * FROM a JOIN b ON a.id = b.id AND a.value > b.value;Replace OR conditions with UNION ALL or restructure the query. If you need OR logic in a join, consider splitting it into multiple joins:
-- Instead of: JOIN b ON a.id = b.id1 OR a.id = b.id2 SELECT * FROM a JOIN b ON a.id = b.id1 UNION ALL SELECT * FROM a JOIN b ON a.id = b.id2;Move non-equality conditions to the WHERE clause:
-- Move range condition out of ON SELECT * FROM a JOIN b ON a.id = b.id WHERE a.value > b.value;Try a different join algorithm. Some algorithms support broader key expressions. You can set the algorithm per query:
SET join_algorithm = 'full_sorting_merge'; SELECT * FROM a JOIN b ON a.id = b.id AND a.value >= b.value;Pre-compute complex key expressions. If you need to join on a function result, compute it as a subquery column first:
SELECT * FROM (SELECT *, toDate(timestamp) AS join_date FROM a) AS a JOIN (SELECT *, toDate(timestamp) AS join_date FROM b) AS b ON a.join_date = b.join_date;
Best Practices
- Stick to simple equality conditions in JOIN ON clauses whenever possible -- this is both more performant and more portable across join algorithms.
- Understand the limitations of each join algorithm (
hash,partial_merge,full_sorting_merge,direct,auto) and choose the one that fits your use case. - Move filtering conditions that do not define the join relationship into the WHERE clause.
- When using the
Jointable engine, ensure the key columns match the engine's expectations at creation time.
Frequently Asked Questions
Q: Can I use expressions like toDate(a.ts) = toDate(b.ts) in a JOIN ON clause?
A: Yes, simple function applications to individual columns are generally supported. The restriction is on expressions that reference columns from both sides of the join within a single operand.
Q: Which join algorithm supports the broadest range of key expressions?
A: The full_sorting_merge algorithm tends to support a wider variety of conditions, including range joins and inequality predicates, compared to the default hash algorithm.
Q: Does this error apply to CROSS JOIN?
A: No. CROSS JOIN does not have an ON clause, so key restrictions do not apply. However, CROSS JOIN produces a Cartesian product which can be extremely expensive for large tables.
Q: Can I use ASOF JOIN for range-based conditions?
A: Yes. ASOF JOIN is specifically designed for inexact matching on ordered keys (typically timestamps). It requires one equality key and one inequality key, and is a good alternative when you need "closest match" semantics.