Code: 209. DB::NetException: Timeout exceeded while reading from socket (... 300000 ms) or Timeout exceeded while receiving data from client. This ClickHouse error fires when a TCP read on the native or interserver connection blocks longer than the configured receive_timeout (default 300 seconds). It is distinct from max_execution_time, which kills the query itself with a different error - code 209 is purely a network-level wait. The query may still complete on the server even after the client times out.
What This Error Means
ClickHouse imposes timeouts at three different layers: socket-level (receive_timeout, send_timeout, default 300s each), per-query execution (max_execution_time, default 0 = unlimited), and HTTP-level (http_receive_timeout, http_send_timeout, default 30s in recent releases; older releases shipped longer defaults, so check the value on your version). Code 209 (SOCKET_TIMEOUT) maps to the socket layer - either the native TCP read or the inter-server hop between replicas/shards exceeded its wait window.
The most common situations are: a long-running query whose result block did not arrive within receive_timeout (the server is still working but the client gave up); a distributed sub-query where one shard is slow and the coordinator's read from that shard times out; or an INSERT from a slow producer whose silence on the socket exceeds the server-side receive_timeout. In all three the underlying problem is "data did not flow within the deadline" - not "the query failed."
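The distinction between an idle-gap deadline and a total-duration cap can be sketched in a few lines of Python. This is a simplified model, not ClickHouse's implementation: the function name first_timeout and the idea of feeding it a list of arrival times are assumptions made for illustration, and it assumes each deadline is noticed exactly when it expires.

```python
def first_timeout(block_arrival_times, receive_timeout=300.0, max_execution_time=0.0):
    """Return (seconds, label) for whichever deadline fires first, or None.

    block_arrival_times: seconds since query start at which data reaches
    the client socket. Hypothetical input, used only for this model.
    """
    last_activity = 0.0
    for arrival in sorted(block_arrival_times):
        candidates = []
        # Socket layer: the clock restarts every time data arrives.
        socket_deadline = last_activity + receive_timeout
        if arrival > socket_deadline:
            candidates.append((socket_deadline, "receive_timeout (Code 209)"))
        # Execution layer: one clock for the whole query; 0 means unlimited.
        if max_execution_time > 0 and arrival > max_execution_time:
            candidates.append((max_execution_time, "max_execution_time (Code 159)"))
        if candidates:
            return min(candidates)
        last_activity = arrival
    return None


# One result block after 400s of silence: the socket deadline fires at 300s.
print(first_timeout([400.0]))
# A 580s query that keeps sending data never trips the 300s socket timeout.
print(first_timeout([100.0, 200.0, 290.0, 580.0]))
# With a longer client timeout, the server-side cap fires first.
print(first_timeout([700.0], receive_timeout=1800.0, max_execution_time=600.0))
```

The second call is the case worth remembering: total runtime well above receive_timeout is fine as long as no single gap between arrivals exceeds it.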
Common Causes
- A slow SELECT whose result blocks take longer to compute than receive_timeout. Confirm with system.query_log after the timeout - the query may still be QueryStart without a matching QueryFinish.
- A distributed query where one shard is overloaded. Confirm with system.processes on each shard - one will show an elapsed time matching the timeout.
- Network packet loss or path MTU issues between client and server, dropping reads silently. Confirm with tcpdump, mtr, or by checking Last_IO_Error-style metrics in your proxy.
- An HTTP client hitting the 30-second http_receive_timeout on / rather than the 300-second native TCP receive_timeout. Confirm by checking which port the client connected to (8123 vs 9000) and the message wording.
- A streaming INSERT from a producer that pauses longer than receive_timeout between blocks. Confirm by tracing the producer's flush cadence.
- Inter-replica replication fetch timing out while pulling parts. Confirm with last_exception in system.replication_queue.
How to Fix Read Timeout
- Identify the timeout layer. Look at the message - from socket is native TCP; while receiving data from client is server-side receive_timeout; HTTP error pages with 504 are HTTP timeouts.
- Check whether the query is still running:
    SELECT query_id, elapsed, user, query FROM system.processes WHERE elapsed > 30 ORDER BY elapsed DESC;
- Raise the client timeout if the query genuinely needs longer than 300 seconds. For clickhouse-client:
    clickhouse-client --receive_timeout 1800 --send_timeout 1800 --query="..."
  For JDBC: socket_timeout=1800000 (ms). For clickhouse-driver: connect_timeout=10, send_receive_timeout=1800.
- Cap the query with max_execution_time so it fails cleanly server-side rather than orphaning a long query when the client gives up:
    SET max_execution_time = 600; -- 10 minutes
- Optimize the slow query. Use EXPLAIN PIPELINE to find blocking operators; add a partition filter, reduce join size, or push aggregation into AggregatingMergeTree. The read_rows and read_bytes columns in system.query_log show whether the query scanned more than necessary.
- Investigate distributed-query slowness. Use system.clusters to confirm shard health and run the same query against each shard's local table to find the laggard.
- For HTTP clients: switch to the native TCP protocol (port 9000) or increase the HTTP timeouts. These are query-level settings, so they are normally set inside a settings profile (for example in users.xml) rather than at the top level of config.xml:
    <http_receive_timeout>1800</http_receive_timeout>
    <http_send_timeout>1800</http_send_timeout>
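The first step, identifying the layer from the message wording, can be automated in application error handling. The following is a minimal sketch that relies only on the phrases quoted in this article; the function name classify_timeout_layer and its return labels are made up for illustration, and real messages may differ between ClickHouse versions and client libraries.

```python
def classify_timeout_layer(message, http_status=None):
    """Map an error message (and optional HTTP status) to a timeout layer.

    The labels returned here are illustrative, not ClickHouse terminology.
    """
    text = message.lower()
    if http_status == 504:
        return "http"
    if "while receiving data from client" in text:
        return "server-side receive_timeout"
    if "reading from socket" in text:
        return "native-tcp receive_timeout"
    if "code: 159" in text or "timeout_exceeded" in text:
        return "max_execution_time"
    return "unknown"


msg = "Code: 209. DB::NetException: Timeout exceeded while reading from socket"
print(classify_timeout_layer(msg))
```

Logging the returned label next to the query_id makes it easier to decide whether to raise a client timeout or to look at the query itself.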
Root-Cause Analysis
To find which queries are timing out and why, correlate client errors with the server's query log:
-- Queries that started but never logged a finish or an exception in the last day - likely client timeouts (or still running)
-- Written as an aggregation because many ClickHouse versions do not support correlated subqueries
SELECT query_id, any(user) AS query_user, min(event_time) AS started, any(query) AS query_text
FROM system.query_log
WHERE event_date >= today() - 1
GROUP BY query_id
HAVING countIf(type = 'QueryStart') > 0
AND countIf(type IN ('QueryFinish', 'ExceptionWhileProcessing')) = 0
ORDER BY started DESC LIMIT 50;
-- Slowest finished queries (potential next-timeout candidates)
SELECT query_duration_ms, read_rows, memory_usage, query
FROM system.query_log
WHERE event_date = today() AND type = 'QueryFinish'
ORDER BY query_duration_ms DESC LIMIT 20;
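If the query log is exported rather than queried directly, the same orphan check can be done client-side. The sketch below assumes the rows arrive as (query_id, type) pairs; the function name find_orphaned_queries and the input shape are assumptions for illustration only.

```python
from collections import defaultdict

# Event types that mean the server recorded an outcome for the query.
TERMINAL_TYPES = {"QueryFinish", "ExceptionWhileProcessing"}


def find_orphaned_queries(rows):
    """Return sorted query_ids that have a QueryStart but no terminal event.

    rows: iterable of (query_id, event_type) tuples, e.g. exported from
    system.query_log. The input shape is an assumption for this sketch.
    """
    types_by_id = defaultdict(set)
    for query_id, event_type in rows:
        types_by_id[query_id].add(event_type)
    return sorted(
        query_id
        for query_id, types in types_by_id.items()
        if "QueryStart" in types and not (types & TERMINAL_TYPES)
    )


rows = [
    ("a", "QueryStart"), ("a", "QueryFinish"),
    ("b", "QueryStart"),
    ("c", "QueryStart"), ("c", "ExceptionWhileProcessing"),
]
print(find_orphaned_queries(rows))  # ['b']
```

A query that is still running looks the same as an orphan here, so cross-check the result against system.processes before drawing conclusions.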
Preventive Measures
- Always set max_execution_time on user-facing query paths. Without it, a slow query stays alive on the server long after the client has timed out, occupying a query slot and risking a too many simultaneous queries error.
- Configure client timeouts longer than max_execution_time so the server is the one that decides whether to kill a query.
- Monitor system.metric_log for CurrentMetric_TCPConnection and CurrentMetric_HTTPConnection to catch connection storms before they cause queueing-induced timeouts.
- Watch system.events for NetworkReceiveElapsedMicroseconds and NetworkSendElapsedMicroseconds - sustained growth signals network or upstream issues.
- For distributed clusters, enforce distributed_connections_pool_size and connect_timeout_with_failover_ms to fail fast on a dead shard rather than waiting the full 300 seconds.
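The first two measures can be captured in one helper that derives every client timeout from the server-side cap, so the client deadline is always the longer one. This is a sketch under stated assumptions: plan_timeouts and the margin_s parameter are hypothetical names, and the 60-second margin is an arbitrary illustrative value. The connect_timeout, send_receive_timeout, and socket_timeout keys are the client options named earlier in this article.

```python
def plan_timeouts(max_execution_time_s, margin_s=60):
    """Derive client timeouts that always exceed the server-side cap.

    margin_s is a hypothetical safety margin so that the server, not the
    client, is the side that ends an overlong query.
    """
    if max_execution_time_s <= 0:
        raise ValueError("set a positive max_execution_time on user-facing paths")
    client_timeout_s = max_execution_time_s + margin_s
    return {
        # Sent to the server with the query.
        "server_settings": {"max_execution_time": max_execution_time_s},
        # clickhouse-driver takes these values in seconds.
        "clickhouse_driver": {
            "connect_timeout": 10,
            "send_receive_timeout": client_timeout_s,
        },
        # JDBC socket_timeout is expressed in milliseconds.
        "jdbc": {"socket_timeout": client_timeout_s * 1000},
    }


plan = plan_timeouts(600)
print(plan["clickhouse_driver"]["send_receive_timeout"])  # 660
print(plan["jdbc"]["socket_timeout"])  # 660000
```

With clickhouse-driver, the two dictionaries would typically be passed when constructing the client, for example as settings=plan["server_settings"] together with the connection keyword arguments; treat that wiring as an assumption to verify against the driver version in use.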
Resolve Code 209 SOCKET_TIMEOUT Automatically with Pulse
Pulse is an AI DBA for ClickHouse (and Kafka and Elasticsearch). When Code: 209. DB::NetException: Timeout exceeded while reading from socket fires in your environment, the underlying cause can be socket-level (receive_timeout, default 300s), execution-level (max_execution_time), or HTTP-level (http_receive_timeout) - Pulse:
- Continuously tracks per-query query_duration_ms from system.query_log, orphaned QueryStart rows without a matching QueryFinish, and the three timeout layers (receive_timeout, send_timeout, http_receive_timeout, max_execution_time)
- Correlates client-side 209 errors with the corresponding query_id server-side, per-replica CurrentMetric_Query saturation, NetworkReceiveElapsedMicroseconds / NetworkSendElapsedMicroseconds trends, and distributed sub-query elapsed time across shards
- Identifies which of the six causes above applies - slow SELECT, overloaded shard in a distributed query, network packet loss, HTTP-vs-native protocol mismatch on port 8123 vs 9000, streaming INSERT pause, or inter-replica replication fetch lag
- Recommends the precise fix - raise receive_timeout / send_timeout on the client, set max_execution_time = 600 server-side, add a partition filter or route via AggregatingMergeTree, or tune distributed_connections_pool_size and connect_timeout_with_failover_ms
- Applies low-risk fixes automatically with your approval (rerouting traffic away from a saturated replica while diagnostics run) or generates a one-click config PR
Pulse turns the manual orphan-query and per-shard triage above into an agentic SRE workflow. Start a free trial.
Frequently Asked Questions
Q: What is the fastest way to diagnose Code 209 read timeouts in production ClickHouse?
A: First identify the timeout layer from the wording - from socket is native TCP, while receiving data from client is server-side receive_timeout, and HTTP 504 is the HTTP layer. Then check system.processes for the orphaned query_id still running server-side. For continuous coverage, Pulse is an AI DBA for ClickHouse that correlates client 209 errors with orphan queries in system.query_log, per-replica saturation, and network elapsed-time metrics, and recommends whether to raise client timeouts, set max_execution_time, or rewrite the query.
Q: What does "DB::Exception: Read timeout" mean in ClickHouse?
A: It means a TCP read between the client and ClickHouse (or between two ClickHouse servers) exceeded receive_timeout (default 300 seconds). The error code is 209 (SOCKET_TIMEOUT). It does not necessarily mean the query failed - the server may still be running it.
Q: How do I increase the read timeout in ClickHouse?
A: For clickhouse-client, pass --receive_timeout 1800 and --send_timeout 1800 (seconds). For JDBC, set socket_timeout in milliseconds. For HTTP clients, raise http_receive_timeout/http_send_timeout; these are query-level settings, so they normally belong in a settings profile (for example in users.xml) rather than at the top level of config.xml. Always pair longer client timeouts with max_execution_time server-side so queries do not run forever.
Q: What is the difference between receive_timeout and max_execution_time?
A: receive_timeout is a socket-level wait - "no bytes arrived for N seconds." max_execution_time is a server-side query duration cap - "this query has been running too long." A query can hit max_execution_time and fail with Code: 159 TIMEOUT_EXCEEDED, or it can be running fine but the client's receive_timeout fires because the result block is just slow to compute.
Q: Why does my query work in clickhouse-client but time out from my application?
A: Almost always the application has a shorter default timeout. JDBC drivers default to 30s, HTTP clients often to 60s, while clickhouse-client defaults to 300s. Check and raise the application's socket_timeout (or equivalent), and confirm the application is hitting the same port (8123 HTTP vs 9000 native).
Q: Does the read timeout kill the query on the server?
A: No. A receive_timeout is a client-side or socket-side error - the server keeps running the query. To make sure the server cancels the query when the client gives up, set cancel_http_readonly_queries_on_client_close = 1 for HTTP, and rely on max_execution_time for native TCP.
Q: Can I get partial results from ClickHouse when a timeout fires?
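A: Not from the socket timeout itself: when the client's read times out, it gets an error and any partial result is discarded. To get partial results on a deadline, let the server enforce the deadline instead: set max_execution_time together with timeout_overflow_mode = 'break', which stops the query at the limit and returns what has been computed so far rather than throwing. SET partial_result_on_first_cancel = 1, where your version supports it, applies to manual cancellation rather than to timeouts. Using LIMIT to bound the result also keeps the response small enough to arrive in time.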
Related Reading
- max_execution_time Setting - server-side query duration cap
- Memory Limit Exceeded - co-occurs when queries are slow because they spill or fail
- Too Many Simultaneous Connections - related concurrency-side issue
- Cannot Read from Socket - hard socket failure, not a timeout
- AggregatingMergeTree - precompute heavy aggregations to keep queries fast
- ClickHouse Client - configuring timeouts on the native CLI
- ClickHouse Documentation Hub - index of all ClickHouse KB pages