NEW

Pulse 2025 Product Roundup: From Monitoring to AI-Native Control Plane

ClickHouse DB::Exception: Distributed IN/JOIN subquery denied

The "DB::Exception: Distributed IN/JOIN subquery denied" error in ClickHouse is triggered when a query uses an IN or JOIN clause with a subquery that references a distributed table, but does not use the GLOBAL keyword. The error code is DISTRIBUTED_IN_JOIN_SUBQUERY_DENIED. ClickHouse raises this to prevent incorrect results that can arise from executing a distributed subquery locally on each shard.

Impact

This error prevents the query from running at all, which means:

  • Queries combining distributed tables through IN or JOIN subqueries will fail until corrected
  • Application logic that dynamically constructs queries with subqueries may break when distributed tables are involved
  • Developers unfamiliar with ClickHouse's distributed query semantics may be blocked until they understand the GLOBAL keyword requirement

Common Causes

  1. Using IN with a distributed subquery without GLOBAL -- A query like SELECT ... FROM dist_table WHERE id IN (SELECT id FROM another_dist_table) executes the subquery on each shard individually, potentially producing incorrect results.
  2. Using JOIN with a distributed subquery without GLOBAL -- Similarly, SELECT ... FROM dist_table JOIN another_dist_table ON ... can produce wrong results because each shard only sees its local data in the subquery.
  3. The setting distributed_product_mode is set to 'deny' -- This is the default and safest setting, which blocks these queries outright rather than silently producing wrong results.
  4. ORM or query builder generating standard SQL -- Automated query generators may not be aware of ClickHouse's GLOBAL keyword requirement.

Troubleshooting and Resolution Steps

  1. Add the GLOBAL keyword to your subquery. This is the recommended fix. GLOBAL causes the subquery to execute once on the initiator node, and the result is then broadcast to all shards:

    -- Instead of this (will fail):
    SELECT * FROM distributed_orders
    WHERE user_id IN (SELECT user_id FROM distributed_users WHERE country = 'US');
    
    -- Use this:
    SELECT * FROM distributed_orders
    WHERE user_id GLOBAL IN (SELECT user_id FROM distributed_users WHERE country = 'US');
    
  2. For JOIN queries, apply the same pattern:

    -- Instead of this:
    SELECT o.*, u.name
    FROM distributed_orders o
    JOIN distributed_users u ON o.user_id = u.user_id;
    
    -- Use this:
    SELECT o.*, u.name
    FROM distributed_orders o
    GLOBAL JOIN distributed_users u ON o.user_id = u.user_id;
    
  3. Alternatively, change the distributed_product_mode setting if you want ClickHouse to handle this automatically:

    -- Automatically rewrite IN/JOIN to GLOBAL IN/GLOBAL JOIN:
    SET distributed_product_mode = 'global';
    
    -- Or allow local execution (use with caution -- may produce wrong results):
    SET distributed_product_mode = 'local';
    

    The global option is generally safe; the local option should only be used when you are certain the subquery data exists identically on every shard.

  4. Consider materializing the subquery result into a temporary table if the GLOBAL subquery result set is very large:

    CREATE TEMPORARY TABLE tmp_users AS
    SELECT user_id FROM distributed_users WHERE country = 'US';
    
    SELECT * FROM distributed_orders
    WHERE user_id IN (SELECT user_id FROM tmp_users);
    
  5. Review your application's query generation layer to ensure it adds GLOBAL where appropriate when targeting distributed tables.

Best Practices

  • Always use GLOBAL IN and GLOBAL JOIN when both the outer query and subquery reference distributed tables.
  • Be mindful that GLOBAL subqueries materialize their result on the initiator node and broadcast it to shards -- keep the subquery result set reasonably small to avoid excessive memory usage and network transfer.
  • Set distributed_product_mode = 'global' at the user or profile level if your application frequently constructs these queries and you want automatic handling.
  • For very large subquery results, consider using dictionaries or pre-joined tables instead of GLOBAL JOIN.
  • Test distributed queries against multi-shard clusters during development -- errors like this only manifest when data is actually distributed.

Frequently Asked Questions

Q: Why does ClickHouse deny IN/JOIN subqueries on distributed tables by default?
A: Without the GLOBAL keyword, ClickHouse sends the full query (including the subquery) to each shard. Each shard then executes the subquery against its own local data only. If the subquery references a distributed table, each shard sees a different subset of the data, leading to incorrect or incomplete results. The deny mode prevents this silently wrong behavior.

Q: What is the performance impact of GLOBAL IN/JOIN?
A: The subquery executes once on the initiator node, and the full result set is sent to every shard. If the result set is small (thousands to low millions of rows), this is efficient. If it is very large, it can consume significant memory on the initiator and network bandwidth to each shard.

Q: Can I use GLOBAL with non-distributed tables in the subquery?
A: You can, but it is unnecessary. If the subquery references a local (non-distributed) table, there is no ambiguity about which data it accesses, so GLOBAL has no effect. The issue only arises when the subquery targets a distributed table.

Q: What does distributed_product_mode = 'local' actually do?
A: It allows the subquery to execute locally on each shard without the GLOBAL keyword. This is only correct if the subquery table's data is fully replicated on every shard (e.g., a small lookup table). If the data is sharded, the results will be wrong.

Q: Is there a way to detect these issues before they hit production?
A: Yes. Run your query suite against a test cluster with at least two shards and verify the results. Queries that work correctly on a single node may produce wrong results or errors on a multi-shard setup.

Subscribe to the Pulse Newsletter

Get early access to new Pulse features, insightful blogs & exclusive events , webinars, and workshops.

We use cookies to provide an optimized user experience and understand our traffic. To learn more, read our use of cookies; otherwise, please choose 'Accept Cookies' to continue using our website.