The "DB::Exception: Wrong global subquery" error in ClickHouse is raised when a GLOBAL IN or GLOBAL JOIN clause contains a subquery that ClickHouse considers invalid or improperly structured. The error code is WRONG_GLOBAL_SUBQUERY. This typically means the subquery does not meet the requirements ClickHouse imposes on expressions used with the GLOBAL keyword in distributed query execution.
Impact
This error prevents the query from executing, resulting in:
- Failed analytical queries that rely on GLOBAL IN or GLOBAL JOIN patterns
- Blocked application workflows until the subquery is corrected
- Potential confusion for developers transitioning from standard SQL, where these restrictions do not apply
Common Causes
- Subquery returns an incorrect number of columns -- For
GLOBAL IN, the subquery must return exactly the number of columns that match the left side of the IN expression. Returning extra or too few columns triggers this error. - Using a table function or expression that cannot be materialized -- GLOBAL subqueries require ClickHouse to execute the subquery on the initiator and ship the result to shards. Certain expressions or table functions may not be compatible with this mechanism.
- Nested GLOBAL subqueries -- Attempting to nest GLOBAL IN/JOIN within another GLOBAL subquery can result in an invalid query plan.
- Subquery references a non-existent table or column -- If the subquery itself is broken (referencing missing objects), the error may surface as a WRONG_GLOBAL_SUBQUERY rather than a more specific error.
- Using scalar subqueries where a table subquery is expected -- GLOBAL IN expects a set of rows, not a scalar value.
Troubleshooting and Resolution Steps
Verify the subquery works independently. Run the subquery by itself to confirm it returns valid results:
-- Test the subquery in isolation SELECT user_id FROM users WHERE country = 'US';If this fails, fix the subquery first before wrapping it in GLOBAL IN/JOIN.
Ensure column count matches for GLOBAL IN. The number of columns on the left side must match the subquery output:
-- Correct: single column on both sides SELECT * FROM orders WHERE user_id GLOBAL IN (SELECT user_id FROM users WHERE active = 1); -- Correct: tuple with matching columns SELECT * FROM orders WHERE (user_id, product_id) GLOBAL IN (SELECT user_id, product_id FROM returns); -- Wrong: column count mismatch SELECT * FROM orders WHERE user_id GLOBAL IN (SELECT user_id, name FROM users);Simplify nested subqueries. If you have nested GLOBAL expressions, flatten them:
-- Instead of nesting GLOBAL subqueries, use a temporary table: CREATE TEMPORARY TABLE tmp AS SELECT user_id FROM distributed_users WHERE country = 'US'; SELECT * FROM distributed_orders WHERE user_id GLOBAL IN (SELECT user_id FROM tmp);Check for incompatible table functions. If the subquery uses a table function like
remote(),url(), orfile(), try materializing the result first:-- May not work directly in GLOBAL IN: -- WHERE id GLOBAL IN (SELECT id FROM url('http://...', CSV, 'id UInt64')) -- Instead: CREATE TEMPORARY TABLE tmp AS SELECT id FROM url('http://...', CSV, 'id UInt64'); SELECT * FROM distributed_table WHERE id GLOBAL IN (SELECT id FROM tmp);Examine the full error message -- ClickHouse often includes details about what specifically is wrong with the subquery. Check the server log for the complete exception text.
Ensure you are not using a scalar subquery where a table subquery is needed:
-- Wrong: scalar subquery SELECT * FROM orders WHERE id GLOBAL IN (SELECT max(id) FROM users); -- Correct: wrap in a proper subquery if needed SELECT * FROM orders WHERE id GLOBAL IN (SELECT id FROM (SELECT max(id) AS id FROM users));
Best Practices
- Keep GLOBAL IN subqueries simple -- a straightforward SELECT with the appropriate number of columns is the safest pattern.
- Test GLOBAL subqueries independently before incorporating them into larger queries.
- When subqueries are complex, materialize them into temporary tables first, then reference those tables in the GLOBAL clause.
- Avoid deeply nested GLOBAL subquery structures; they are difficult to debug and may hit edge cases in the query planner.
- Review the ClickHouse documentation for the specific version you are running, as GLOBAL subquery handling has improved over time.
Frequently Asked Questions
Q: Can I use GLOBAL IN with a VALUES list instead of a subquery?
A: GLOBAL is not needed with a literal VALUES list because there is no distributed table involved in the IN clause. A plain IN (1, 2, 3) or IN (SELECT * FROM system.numbers LIMIT 3) on a local table works without GLOBAL.
Q: Why does my subquery work with regular IN but fail with GLOBAL IN?
A: GLOBAL IN has additional requirements. The subquery result must be materializable on the initiator node and transferable to remote shards. Some expressions, table functions, or query structures that work in a local context may not be compatible with this materialization step.
Q: Is there a size limit on GLOBAL IN subquery results?
A: There is no hard-coded limit, but the result must fit in memory on the initiator node and be transferred to each shard over the network. Practically, subqueries returning millions of rows can work but will consume significant resources. For very large sets, consider using a JOIN with a distributed dictionary instead.
Q: Does this error occur with GLOBAL JOIN as well?
A: Yes. The same validation applies to GLOBAL JOIN subqueries. The right side of the join must be a valid subquery that can be materialized and broadcast to all shards.