java.net.SocketTimeoutException: Read timed out (or <n> milliseconds timeout on connection) is raised by an Elasticsearch client when an established HTTP/transport connection does not receive bytes from the server within the configured socket timeout. The client request fails; the server-side operation may still complete (Elasticsearch does not know the client gave up unless the task is cancellable). This is a client-side timeout, not a cluster failure.
What This Error Means
A SocketTimeoutException (java.net.SocketTimeoutException) is distinct from a ConnectException ("connection refused", server unreachable) or a ConnectTimeoutException ("connect timed out", initial TCP handshake too slow). Here the connection is already established, but no bytes arrived from the server for longer than the client's read timeout. The default socket timeout in the Java REST client is 30000 ms (30 seconds); other clients vary.
The fix depends on whether the request actually needs that long: optimize the query, raise the timeout, or migrate to async search.
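The three failure modes can be reproduced outside Elasticsearch. The following is a minimal Python sketch using only the standard library; it is an analogue of the Java exceptions, not the Elasticsearch client's own code, and the helper names `start_silent_server` and `classify_failure` are made up for illustration.

```python
import socket
import threading


def start_silent_server():
    """Listen on an ephemeral localhost port; accept one connection, never reply."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)
    held = []  # keeps the accepted connection open so the client sees silence

    def accept_and_stall():
        conn, _ = srv.accept()
        held.append(conn)

    threading.Thread(target=accept_and_stall, daemon=True).start()
    return srv, held


def classify_failure(host, port, connect_timeout, read_timeout):
    """Return a label for how a request to host:port ends."""
    try:
        sock = socket.create_connection((host, port), timeout=connect_timeout)
    except ConnectionRefusedError:
        return "connection refused"   # analogue of ConnectException
    except socket.timeout:
        return "connect timed out"    # analogue of ConnectTimeoutException
    try:
        sock.settimeout(read_timeout)
        sock.sendall(b"GET / HTTP/1.1\r\nHost: localhost\r\n\r\n")
        data = sock.recv(1024)
        return "response received" if data else "connection closed"
    except socket.timeout:
        return "read timed out"       # analogue of SocketTimeoutException
    finally:
        sock.close()


srv, held = start_silent_server()
print(classify_failure("127.0.0.1", srv.getsockname()[1], 1.0, 0.3))
```

The server accepts the connection but stays silent, so the connect step succeeds and only the read step times out, which is the situation this article describes.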
Common Causes
- Slow query taking longer than the client's socket timeout. How to confirm: enable slow logs (`index.search.slowlog.threshold.query.warn`) and match timestamps.
- Coordinator node CPU-bound under high concurrency. How to confirm: `GET _nodes/stats/jvm,thread_pool` shows nonzero `search.rejected` or sustained heap pressure.
- Network blip or congested path between client and cluster. How to confirm: client logs show timeouts coinciding with packet-loss spikes; ping/traceroute or VPC flow logs from the same window.
- Bulk indexing batch too large for the cluster to ack within the timeout. How to confirm: per-request size in client logs is in tens of MB; reduce batch size.
- Default 30s timeout used unchanged for long-running operations (reindex, snapshot, expensive aggregations). How to confirm: client uses Java REST client defaults; operation is one of the long-running APIs.
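The thread-pool check from the list above can be scripted. This is a small sketch, assuming the JSON body of `GET _nodes/stats/jvm,thread_pool` has already been fetched and parsed into a dict; the function name and the sample values are made up, and the sample is trimmed to the fields the function reads.

```python
def nodes_with_search_rejections(nodes_stats):
    """Return {node_name: rejected_count} for nodes whose search pool rejected work."""
    flagged = {}
    for node_id, node in nodes_stats.get("nodes", {}).items():
        search_pool = node.get("thread_pool", {}).get("search", {})
        rejected = search_pool.get("rejected", 0)
        if rejected > 0:
            # Fall back to the node id if the name is missing.
            flagged[node.get("name", node_id)] = rejected
    return flagged


# Made-up sample response, not real cluster output.
sample = {
    "nodes": {
        "aBc123": {
            "name": "es-data-1",
            "thread_pool": {"search": {"active": 13, "queue": 950, "rejected": 42}},
        },
        "dEf456": {
            "name": "es-data-2",
            "thread_pool": {"search": {"active": 2, "queue": 0, "rejected": 0}},
        },
    }
}

print(nodes_with_search_rejections(sample))  # {'es-data-1': 42}
```

The `rejected` counter is cumulative since node start, so comparing two samples taken a few minutes apart shows whether rejections are still happening.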
How to Fix SocketTimeoutException
1. Identify the slow operation. Use the slow log and `_tasks?detailed=true` to find which queries are pushing past the timeout.
2. Increase the client socket timeout to a value that comfortably accommodates the operation. For the Java REST client:

       RestClientBuilder builder = RestClient.builder(hosts)
           .setRequestConfigCallback(rc -> rc
               .setConnectTimeout(5000)
               .setSocketTimeout(60000));

   For the Python client (`elasticsearch-py` 8.x):

       client = Elasticsearch("https://es:9200", request_timeout=60)

3. Use async search for long queries so timeouts are not a concern:

       POST <index>/_async_search?wait_for_completion_timeout=2s&keep_alive=1h

4. For bulk indexing, reduce batch size to 5-15 MB or 1000-5000 documents per request, and use `?refresh=false`.
5. Profile the query to find expensive parts:

       POST <index>/_search
       { "profile": true, "query": {...} }

6. Reduce parallel concurrency if coordinator node CPU is saturated. Drop client thread pool size; the cluster will recover.
7. Check network path with a `curl --connect-timeout 5 --max-time 60` from the client host to the cluster; correlate with VPC flow logs or proxy access logs for packet drops.
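The bulk batch-size guidance can be enforced on the client before a request is sent. Below is a minimal sketch, assuming documents are plain dicts and every document goes to one index; `chunk_bulk_actions` is a hypothetical helper, and the 10 MB and 5000-document defaults are just values picked from inside the ranges given above.

```python
import json


def chunk_bulk_actions(docs, index, max_bytes=10 * 1024 * 1024, max_docs=5000):
    """Yield NDJSON bulk bodies, each capped by byte size and document count."""
    lines, size, count = [], 0, 0
    for doc in docs:
        # One bulk item is an action line followed by a source line.
        pair = json.dumps({"index": {"_index": index}}) + "\n" + json.dumps(doc) + "\n"
        pair_size = len(pair.encode("utf-8"))
        # Flush before adding if this item would push the batch over either cap.
        if count and (size + pair_size > max_bytes or count >= max_docs):
            yield "".join(lines)
            lines, size, count = [], 0, 0
        lines.append(pair)
        size += pair_size
        count += 1
    if lines:
        yield "".join(lines)


docs = [{"n": i} for i in range(12)]
batches = list(chunk_bulk_actions(docs, "logs", max_docs=5))
print([body.count("\n") // 2 for body in batches])  # [5, 5, 2]
```

A single document larger than `max_bytes` still goes out as its own batch instead of being dropped. Each yielded string is a request body in the `_bulk` newline-delimited format.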
Resolve SocketTimeoutException Automatically with Pulse
Pulse is an AI DBA for Elasticsearch and OpenSearch. When java.net.SocketTimeoutException: Read timed out fires from a client, Pulse:
- Correlates the failing request with `index.search.slowlog.threshold.query.warn` entries, `_tasks?detailed=true` running tasks, `_nodes/stats/jvm,thread_pool` (looking for nonzero `search.rejected` or sustained `heap_used_percent`), and client-side `curl --connect-timeout` probes plus VPC/proxy flow logs from the same window
- Identifies which of the five causes applies: a slow query exceeding the client's 30000 ms default `socketTimeout`, coordinator CPU saturation, network blip on the path between client and cluster, oversized bulk batch (tens of MB), or default timeout used unchanged on a long-running API like reindex
- Generates the exact remediation: the `RestClientBuilder.setRequestConfigCallback` snippet with `setSocketTimeout(60000)`, the `_async_search?wait_for_completion_timeout=2s&keep_alive=1h` migration for queries over 30s, the 5-15 MB or 1000-5000-doc batch-size guidance, or the coordinator scale-out plan when `search.rejected` is the actual cause
- Applies dynamic cluster setting changes with operator approval; leaves client timeout adjustments and async-search migrations as one-click PRs
Pulse distinguishes slow-query causes from network-path causes - the difference between "raise the timeout" and "fix the VPC peering" - so the remediation lands in the right team's queue.
Frequently Asked Questions
Q: What is the default socket timeout for the Elasticsearch Java REST client?
A: The default socketTimeout in the Java REST client is 30000 ms (30 seconds). The default connectTimeout is 1000 ms. Both can be set via RestClientBuilder.setRequestConfigCallback.
Q: How is SocketTimeoutException different from ConnectTimeoutException?
A: ConnectException/ConnectTimeoutException is raised when the initial TCP handshake or TLS handshake cannot complete. SocketTimeoutException is raised after the connection is established when bytes do not arrive within the read timeout. The former points at reachability; the latter at slow processing or network congestion.
Q: Does Elasticsearch keep running the query after the client times out?
A: Cancellable APIs (search, reindex, etc., since 7.4) detect the client disconnect and cancel the task. Non-cancellable operations continue to completion on the server even after the client gives up.
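To check whether work is still running after a client timeout, the output of `GET _tasks?detailed=true` can be filtered by running time. This is a sketch only, assuming the response has already been parsed into a dict; `long_running_tasks` and the sample values are made up for illustration.

```python
def long_running_tasks(tasks_response, min_seconds):
    """List tasks running at least min_seconds, longest first."""
    found = []
    for node in tasks_response.get("nodes", {}).values():
        for task_id, task in node.get("tasks", {}).items():
            seconds = task.get("running_time_in_nanos", 0) / 1e9
            if seconds >= min_seconds:
                found.append({
                    "task_id": task_id,
                    "action": task.get("action"),
                    "seconds": seconds,
                    "cancellable": task.get("cancellable", False),
                })
    return sorted(found, key=lambda t: t["seconds"], reverse=True)


# Made-up sample response, trimmed to the fields read above.
sample_tasks = {
    "nodes": {
        "aBc123": {
            "tasks": {
                "aBc123:1001": {
                    "action": "indices:data/read/search",
                    "running_time_in_nanos": 45_000_000_000,
                    "cancellable": True,
                },
                "aBc123:1002": {
                    "action": "indices:data/write/bulk",
                    "running_time_in_nanos": 200_000_000,
                    "cancellable": False,
                },
            }
        }
    }
}

for task in long_running_tasks(sample_tasks, min_seconds=30):
    print(task["task_id"], task["action"], task["seconds"], task["cancellable"])
```

A task reported as cancellable can then be stopped with `POST _tasks/<task_id>/_cancel` if the client no longer needs the result.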
Q: Should I increase server-side timeouts to avoid SocketTimeoutException?
A: The timeout is on the client side; raising it client-side is the direct fix. Server-side, you can set search.default_search_timeout to enforce a maximum, but this caps - it does not extend - operation duration.
Q: What is the right socket timeout for production?
A: There is no universal value. Set per-operation: 5-10 seconds for typical user-facing searches, 60-120 seconds for batch reindex or expensive aggregations, no timeout for snapshot operations (use async patterns). Always be deliberate.
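One way to keep per-operation timeouts deliberate is a single lookup table in client code. The sketch below is illustrative: the operation names, the `timeout_for` helper, and the exact values (taken from inside the ranges above) are assumptions, not recommended settings.

```python
# Illustrative per-operation timeouts in seconds; tune these for your workload.
OPERATION_TIMEOUTS_SECONDS = {
    "user_search": 10,    # typical user-facing search
    "aggregation": 120,   # expensive aggregations
    "reindex": 120,       # batch reindex
}

# Fallback that mirrors the Java REST client's 30 s default socket timeout.
DEFAULT_TIMEOUT_SECONDS = 30


def timeout_for(operation):
    """Return the timeout in seconds for a named operation type."""
    return OPERATION_TIMEOUTS_SECONDS.get(operation, DEFAULT_TIMEOUT_SECONDS)


print(timeout_for("user_search"))  # 10
print(timeout_for("unknown_op"))   # 30

# With elasticsearch-py 8.x the value could be applied per request, for example:
#   client.options(request_timeout=timeout_for("user_search")).search(...)
# (not executed here; it needs the client library and a reachable cluster)
```

Snapshot operations are left out of the table on purpose, since the guidance above is to run them asynchronously instead of waiting on a socket.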
Q: Why does the same query sometimes time out and sometimes succeed?
A: Latency in Elasticsearch depends on cache state, in-progress segment merges, and concurrent load. A cold-cache query can be 10x slower than a warm one. Warm the caches with representative queries, or accept the variance and size timeouts accordingly.
Q: What's the fastest way to diagnose SocketTimeoutException in production?
A: Pulse, the AI DBA for Elasticsearch and OpenSearch, correlates the client timeout with the matching slow-log entry, the running task, thread-pool rejection state, and network-path signals, then distinguishes a slow query from a network blip. It proposes either the timeout adjustment, the async-search migration, or the scale-out plan and routes the change to the right repo.
Related Reading
- Elasticsearch Timeout exception: broader timeout discussion.
- Elasticsearch TaskCancelledException: when server cancels tasks.
- Elasticsearch No alive nodes found in cluster: related connectivity errors.
- Elasticsearch cluster health check: cluster-side latency diagnostics.
- Elasticsearch monitoring: client-perspective latency monitoring.