NEW

Pulse 2025 Product Roundup: From Monitoring to AI-Native Control Plane

ClickHouse: Too Many Simultaneous Queries / Connections (Code 202)

Code: 202. DB::Exception: Too many simultaneous queries. Maximum: 100. This ClickHouse error is raised when the count of concurrently executing queries on the server reaches max_concurrent_queries (default 100) and a new query arrives. The cluster is not out of resources - this is a deliberate admission-control limit. The result is that new query attempts are rejected with code 202 (TOO_MANY_SIMULTANEOUS_QUERIES) until existing queries finish.

What This Error Means

ClickHouse enforces two separate concurrency limits and they are easy to confuse:

  • max_concurrent_queries (default 100) caps the number of queries actively executing at any moment. When exceeded, the server returns code 202.
  • max_connections (default 4096) caps the number of open TCP connections to the server. Excess connections are dropped at accept time with a separate error - the well-known "Too many simultaneous connections" phrasing maps to this limit.

Most "too many simultaneous" errors in modern ClickHouse versions are the query variant (code 202), even when users colloquially call them "connection" errors. The signal-distinguishing line is the Maximum: 100 (queries) versus Maximum: 4096 (connections) in the error text. ClickHouse can hold thousands of idle connections while only running 100 queries at once - clients sit in a queue, not in an error state, until a query slot frees up.

Common Causes

  1. Burst traffic from a service without connection pooling, where each request opens a fresh session and runs a query. Confirm with SELECT count() FROM system.processes during a burst - it will sit at 100.
  2. Long-running queries holding slots open. Confirm with SELECT query_id, elapsed, query FROM system.processes ORDER BY elapsed DESC - the long tail (multi-minute queries) is blocking shorter ones.
  3. A misbehaving client that opens connections faster than it closes them, eventually saturating max_connections and producing the literal "Too many simultaneous connections" wording. Confirm with netstat -an | grep :9000 | grep ESTABLISHED | wc -l.
  4. Sub-queries on a Distributed table fanning out one query per shard, multiplying the per-node query count. Each shard counts independently.
  5. A BI tool with auto-refresh dashboards firing many parallel queries from many users. Confirm by grouping system.query_log by initial_user over a 60-second window.

How to Fix Too Many Simultaneous Queries

  1. Check current concurrency:

    -- Active queries
    SELECT count() FROM system.processes;
    -- Connection-level metrics
    SELECT metric, value FROM system.metrics
    WHERE metric IN ('Query', 'TCPConnection', 'HTTPConnection', 'InterserverConnection');
    
  2. Identify long runners holding slots:

    SELECT query_id, user, elapsed, memory_usage, query
    FROM system.processes
    ORDER BY elapsed DESC LIMIT 10;
    
  3. Kill blocking queries if they are misbehaving:

    KILL QUERY WHERE query_id = '<id>';
    
  4. Raise max_concurrent_queries if the hardware can support more parallel work. Edit config.xml:

    <max_concurrent_queries>200</max_concurrent_queries>
    <max_concurrent_select_queries>150</max_concurrent_select_queries>
    <max_concurrent_insert_queries>100</max_concurrent_insert_queries>
    

    Apply with SYSTEM RELOAD CONFIG. Each running query consumes max_threads cores worth of CPU, so raising this without raising core count just causes context switching.

  5. Set per-user concurrency limits via a settings profile so one tenant cannot saturate the slot pool:

    CREATE SETTINGS PROFILE tenant_a
    SETTINGS max_concurrent_queries_for_user = 20
    TO ROLE tenant_a_role;
    
  6. Implement client-side connection pooling and retries. Tools like clickhouse-driver (Python), ClickHouse-go, and JDBC drivers all support pooling; clients should retry code 202 with exponential backoff.

  7. Reduce long-running queries: enable max_execution_time per user, run heavy analytics on a dedicated replica, or precompute via AggregatingMergeTree.

Root-Cause Analysis

The signal pattern over time reveals whether this is a hardware capacity issue or a client pathology:

-- Concurrency over the last hour, per minute
SELECT toStartOfMinute(event_time) AS m,
       max(CurrentMetric_Query) AS max_concurrent_queries,
       max(CurrentMetric_TCPConnection) AS max_connections
FROM system.metric_log
WHERE event_date = today()
GROUP BY m ORDER BY m DESC LIMIT 60;

-- Users with the highest concurrent query counts
SELECT initial_user, max(toStartOfMinute(event_time)) AS m,
       count() AS queries_per_minute
FROM system.query_log
WHERE event_date >= today() - 1 AND type = 'QueryStart'
GROUP BY initial_user, toStartOfMinute(event_time)
ORDER BY queries_per_minute DESC LIMIT 20;

Preventive Measures

  • Set max_concurrent_queries_for_user per tenant via a settings profile. The default global cap is too coarse for multi-tenant clusters.
  • Set max_execution_time (30-60 seconds for interactive workloads) so runaway queries do not pin slots indefinitely.
  • Enforce client-side connection pools with capped pool sizes. A pool of 10 connections from a service replica with 100-pod horizontal scaling can saturate a 100-slot ClickHouse cluster.
  • Place a load balancer that round-robins across multiple replicas of each shard, distributing query slots evenly.
  • For internal dashboards, use materialized views and AggregatingMergeTree to reduce per-query work, freeing slots for ad-hoc analysis.

Resolve TOO_MANY_SIMULTANEOUS_QUERIES (Code 202) Automatically with Pulse

Pulse is an AI DBA for ClickHouse (and Kafka and Elasticsearch). When Code: 202. DB::Exception: Too many simultaneous queries. Maximum: 100 fires and clients start getting rejected in your environment, Pulse:

  • Continuously tracks CurrentMetric_Query and CurrentMetric_TCPConnection from system.metric_log against max_concurrent_queries (default 100) and max_connections (default 4096) - the two distinct limits the error name conflates
  • Correlates the slot saturation with long-tail queries in system.processes, per-user query rates from system.query_log, distributed sub-query fanout multiplying per-shard slot usage, and dashboard auto-refresh storms grouped by initial_user
  • Identifies which of the five causes above applies - missing client pooling, multi-minute queries pinning slots, runaway distributed fanout, dashboard bursts, or genuine under-provisioning
  • Recommends the precise fix - apply max_concurrent_queries_for_user in a settings profile, enforce max_execution_time = 30, KILL QUERY the long-tail blockers, raise max_concurrent_queries to 2 * cpu_cores, or split max_concurrent_select_queries from max_concurrent_insert_queries
  • Applies low-risk fixes automatically with your approval (capping a runaway tenant via a settings profile during a burst, rolling back when traffic normalises) or generates a one-click config PR

Pulse turns the manual system.processes triage above into an agentic SRE workflow. Start a free trial.

Frequently Asked Questions

Q: What tool can automatically detect ClickHouse slot saturation before Code 202 starts rejecting queries?
A: Pulse is an AI DBA for ClickHouse that watches CurrentMetric_Query against max_concurrent_queries, projects sustained utilization above 70% (the leading indicator for Code 202 outages), and distinguishes between the four root causes - client connection storms, long-tail slow queries, distributed fanout amplification, and dashboard-driven bursts - then recommends or applies a tighter settings profile or max_execution_time cap.

Q: What is the difference between max_concurrent_queries and max_connections in ClickHouse?
A: max_concurrent_queries (default 100) limits queries actively executing at the same time; max_connections (default 4096) limits open TCP connections. Idle connections do not consume query slots, so a cluster can hold thousands of pooled connections while only executing 100 queries concurrently. Code 202 corresponds to the query limit; the literal "Too many simultaneous connections" message corresponds to max_connections.

Q: What is a good value for max_concurrent_queries?
A: A reasonable starting point is 2 * cpu_cores, since each query uses around max_threads worth of cores. Higher values cause CPU thrashing without proportional throughput. Adjust based on observed CPU saturation and the 95th percentile of max_threads per query.

Q: How do I see how many queries are running right now on ClickHouse?
A: Run SELECT count() FROM system.processes for active queries, or SELECT value FROM system.metrics WHERE metric = 'Query' for the same number via the metrics interface. SELECT * FROM system.processes ORDER BY elapsed DESC shows the actual queries.

Q: Does connection pooling fix this error?
A: Pooling reduces churn but does not directly increase the number of concurrent queries the server allows. If your pooled clients submit 200 queries in parallel against a max_concurrent_queries = 100 server, the 101st query still gets code 202. Combine pooling with client-side retry and either raise the limit or smooth the request rate.

Q: How do I limit one user from monopolizing all query slots?
A: Apply max_concurrent_queries_for_user through a settings profile targeted at the user or role. This caps that user's share of the global slot pool without lowering capacity for others.

Q: Are inserts and selects counted against the same limit?
A: By default, both count toward max_concurrent_queries. You can split them with max_concurrent_select_queries and max_concurrent_insert_queries, which independently limit each kind. Distributed sub-queries against this server also count.

Subscribe to the Pulse Newsletter

Get early access to new Pulse features, insightful blogs & exclusive events , webinars, and workshops.

We use cookies to provide an optimized user experience and understand our traffic. To learn more, read our use of cookies; otherwise, please choose 'Accept Cookies' to continue using our website.