ElasticsearchTimeoutException (Java client) and related timeout exceptions are raised when an Elasticsearch operation does not complete within the time the client or server allows. Specific exception classes - SocketTimeoutException, ConnectTimeoutException, ReceiveTimeoutTransportException, TaskCancelledException (from a cancelled task) - distinguish the layer where the timeout fired. The cluster keeps running; only the timed-out request fails.
What This Error Means
Timeouts in Elasticsearch happen at multiple layers and the fix depends on which one fired:
| Layer | Symptom | Typical fix |
|---|---|---|
| Client socket | SocketTimeoutException: Read timed out | Raise client socket timeout, optimize query, use async search |
| Client connect | ConnectTimeoutException | Network / firewall / DNS / TLS |
| Search per-query | partial results with timed_out: true | Raise timeout parameter; query may return what it had |
| Inter-node transport | ReceiveTimeoutTransportException | Network between nodes, GC pauses, or slow shard |
| Task cancellation | TaskCancelledException | Coordinator killed task (client disconnect / explicit cancel) |
Read the exception class first - that tells you which layer to fix.
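Reading the exception class first can be automated on the client side. The following is a minimal sketch, assuming the timeout arrives wrapped inside other exceptions: it walks the cause chain and maps simple class names to the layers in the table. The class name TimeoutLayerClassifier and the matching by simple name (which avoids a compile-time dependency on any client library) are illustrative choices, not part of any Elasticsearch API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class TimeoutLayerClassifier {

    // Simple class names from the table, mapped to the layer they indicate.
    private static final Map<String, String> LAYERS = new LinkedHashMap<>();
    static {
        LAYERS.put("SocketTimeoutException", "client socket");
        LAYERS.put("ConnectTimeoutException", "client connect");
        LAYERS.put("ReceiveTimeoutTransportException", "inter-node transport");
        LAYERS.put("TaskCancelledException", "task cancellation");
    }

    /** Walks the cause chain outermost-first and returns the first recognised layer. */
    static String classify(Throwable error) {
        int depth = 0;
        // The depth limit guards against a cyclic cause chain.
        for (Throwable t = error; t != null && depth < 20; t = t.getCause(), depth++) {
            String layer = LAYERS.get(t.getClass().getSimpleName());
            if (layer != null) {
                return layer;
            }
        }
        return "unknown";
    }

    public static void main(String[] args) {
        Throwable wrapped = new RuntimeException("search failed",
                new java.net.SocketTimeoutException("Read timed out"));
        System.out.println(classify(wrapped)); // prints "client socket"
    }
}
```

Server-side classes usually reach the client only as a type string in a log line or error response; the same lookup could be applied to that string.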
Common Causes
- Slow query exceeding client socket timeout (most common). How to confirm: enable slow log; match the timestamp to the client failure.
- Per-query timeout parameter set lower than query latency. How to confirm: the search response has "timed_out": true and partial results.
- Inter-node transport delays from GC pauses or network issues. How to confirm: GET _nodes/stats/jvm shows high GC collection times under jvm.gc; transport ping logs show delays.
- Coordinator search thread pool saturated. How to confirm: GET _nodes/stats/thread_pool shows a nonzero rejected count or a sustained queue for the search pool.
- Large bulk requests timing out at the ingest side. How to confirm: per-request size in client logs is many MB. Reduce the batch size.
- Connection-level timeout (TCP handshake or TLS handshake too slow). How to confirm: error class is ConnectTimeoutException; client log shows handshake-stage failure.
How to Fix Timeout Exception
1. Identify the exception class. The simple name (SocketTimeoutException, ReceiveTimeoutTransportException, etc.) tells you which layer fired. Look for it in the client stack trace or the Elasticsearch logs:

       tail -f /var/log/elasticsearch/*.log

2. For client socket timeouts, raise the timeout deliberately and consider async search:

       RestClient.builder(host).setRequestConfigCallback(
           rc -> rc.setSocketTimeout(60000)); // milliseconds

3. For per-query timeout, decide whether you want partial results or a full retry:

       GET /my-index/_search?timeout=30s

   With timed_out: true and partial data, the cluster returned what it had so far.

4. For long-running queries, use async search:

       POST <index>/_async_search?wait_for_completion_timeout=2s&keep_alive=1h

5. Optimize slow queries. Set "profile": true in the request body to see where time is spent:

       { "profile": true, "query": {...} }

   Common gains: replace wildcard queries with term queries on keyword fields, replace script fields with runtime fields, and disable track_total_hits when exact hit counts are not needed.

6. For inter-node ReceiveTimeoutTransportException, check GC pauses and network. Tune heap, fix slow shards, or move shards off overloaded nodes via cluster.routing.allocation.* filters.

7. Scale or reshard if queue rejections are persistent. Add nodes, increase replicas (to spread search load), or roll over oversized indices.
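The async search and client timeout steps can be combined in one request. The sketch below is only an illustration and deliberately swaps in the JDK's built-in java.net.http client (Java 11+) for the Elasticsearch RestClient so that it runs without extra dependencies. The host http://localhost:9200, the index my-index, and the 5s and 10s timeout values are assumptions, not recommendations.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.time.Duration;

final class AsyncSearchRequestSketch {

    /** Builds (but does not send) an async search submission for the given index. */
    static HttpRequest buildAsyncSearch(String baseUrl, String index, String queryJson) {
        URI uri = URI.create(baseUrl + "/" + index
                + "/_async_search?wait_for_completion_timeout=2s&keep_alive=1h");
        return HttpRequest.newBuilder(uri)
                // Client-side wait for the response; kept well above the 2s server-side wait.
                .timeout(Duration.ofSeconds(10))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(queryJson))
                .build();
    }

    /** Client with an explicit connect timeout, so a stalled connection attempt fails fast. */
    static HttpClient buildClient() {
        return HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(5))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = buildAsyncSearch(
                "http://localhost:9200", "my-index", "{\"query\":{\"match_all\":{}}}");
        System.out.println(request.method() + " " + request.uri());
        // To send it: buildClient().send(request, HttpResponse.BodyHandlers.ofString())
    }
}
```

Because the server answers within roughly wait_for_completion_timeout whether or not the search has finished, the client timeout only needs to cover that wait plus network overhead, not the full query runtime.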
Resolve Timeout Exceptions Automatically with Pulse
Pulse is an AI DBA for Elasticsearch and OpenSearch. When ElasticsearchTimeoutException or a related timeout exception (SocketTimeoutException, ConnectTimeoutException, ReceiveTimeoutTransportException, TaskCancelledException) fires, Pulse:
- Classifies the timeout by exception class and layer (client socket, client connect, search per-query with timed_out: true, inter-node transport, task cancellation), then correlates client latency with the slow log, the search thread pool rejected count from _nodes/stats/thread_pool, GC collection times from _nodes/stats/jvm, and transport ping logs at the same timestamp
- Identifies which of the six causes applies: slow query exceeding client socketTimeout, per-query ?timeout= set below latency, inter-node transport delay from GC or network, coordinator search thread pool saturation, oversized bulk batches, or TCP/TLS connect timeout
- Generates the exact remediation: the RestClient setSocketTimeout(60000) adjustment, the ?timeout=30s change with explicit "partial results vs full retry" guidance, the _async_search?wait_for_completion_timeout=2s&keep_alive=1h migration, the "profile": true plan for query optimization, the heap or G1GC tuning, or the cluster.routing.allocation.* move for an overloaded shard
- Applies dynamic cluster settings with operator approval; leaves client timeout updates, async-search migrations, and query rewrites as one-click PRs targeting the consuming service
Pulse runs predictive alerts on rising p95/p99 latency before timeouts spike, so the question "should we add nodes or rewrite the query" has an answer before users notice the slowdown.
Start a free trial to connect your cluster.
Frequently Asked Questions
Q: What is the difference between SocketTimeoutException and ElasticsearchTimeoutException?
A: SocketTimeoutException is a JDK class raised when bytes do not arrive within the client read timeout. ElasticsearchTimeoutException is a higher-level Elasticsearch client wrapper that may wrap any of several timeout classes. The fix depends on which underlying class is wrapped - inspect the cause chain.
Q: Does "timed_out": true in a search response mean my query failed?
A: Not entirely. The server's per-query timeout is a soft limit - shards that hit it return partial results and the response is marked timed_out: true. You get whatever was already collected. To force a complete result, raise the timeout or remove it.
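A client can branch on that flag before deciding whether to accept partial results or retry. This is a rough sketch only: it uses a regular expression to stay dependency-free, whereas real code would read the top-level timed_out field with a JSON parser (a stored document that itself contains "timed_out": true would fool this check). The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

final class PartialResultCheck {

    // Rough check only; a JSON parser reading the top-level field is the robust approach.
    private static final Pattern TIMED_OUT = Pattern.compile("\"timed_out\"\\s*:\\s*true");

    /** Returns true when the raw search response reports a per-query timeout. */
    static boolean isPartial(String searchResponseJson) {
        return TIMED_OUT.matcher(searchResponseJson).find();
    }

    public static void main(String[] args) {
        String body = "{\"took\":30000,\"timed_out\":true,\"hits\":{\"hits\":[]}}";
        if (isPartial(body)) {
            System.out.println("partial results: accept them or retry with a higher timeout");
        } else {
            System.out.println("complete results");
        }
    }
}
```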
Q: How do I find which queries are causing timeout exceptions?
A: Enable the search slow log (index.search.slowlog.threshold.query.warn: 5s) and match timestamps to client failures. The _tasks?detailed=true API shows currently-running tasks if you can catch one in flight.
Q: Can timeout exceptions cause data loss?
A: Reads do not affect data. Writes that timed out on the client may have succeeded server-side - the cluster does not roll back partial writes. For bulk indexing, retry idempotent writes with the same _id (and op_type: create or external versioning if needed) to avoid duplicates.
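As an illustration of retry-safe bulk writes, the sketch below builds an NDJSON _bulk body that uses create actions with caller-supplied _id values. The class name, the index my-index, and the sample IDs are hypothetical, and the code assumes index names and IDs need no JSON escaping.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class IdempotentBulkBody {

    /**
     * Builds an NDJSON _bulk body of "create" actions with caller-supplied _id values.
     * Sources are passed as already-serialised JSON; IDs are assumed to need no escaping.
     */
    static String build(String index, Map<String, String> sourceById) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> doc : sourceById.entrySet()) {
            // Action line: create fails for an _id that already exists instead of overwriting it.
            body.append("{\"create\":{\"_index\":\"").append(index)
                .append("\",\"_id\":\"").append(doc.getKey()).append("\"}}\n");
            // Source line, followed by the newline the bulk format requires.
            body.append(doc.getValue()).append('\n');
        }
        return body.toString();
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("order-1001", "{\"total\":42}");
        docs.put("order-1002", "{\"total\":17}");
        System.out.print(build("my-index", docs));
    }
}
```

If the same body is resent after a client-side timeout, items that were already written come back as per-item 409 version conflicts, which the caller can treat as "already indexed" rather than as failures.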
Q: Why does the same query sometimes time out and sometimes succeed?
A: Latency varies with cache warmth, concurrent load, segment merges, and GC pauses. A cold-cache query can be 10x slower. Set timeouts with this variance in mind, or pre-warm caches with periodic background queries.
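One way to set a timeout with this variance in mind is to derive it from observed latencies instead of picking a round number. The sketch below takes the nearest-rank 99th percentile of a non-empty sample and multiplies it by a headroom factor; the sample values and the 1.5 factor are assumptions for illustration.

```java
import java.util.Arrays;

final class TimeoutFromLatency {

    /** Nearest-rank percentile of a non-empty sample, with p in (0, 100]. */
    static long percentile(long[] latenciesMs, double p) {
        long[] sorted = latenciesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    /** Suggested client timeout: p99 latency times a headroom factor chosen by the caller. */
    static long suggestedTimeoutMs(long[] latenciesMs, double headroom) {
        return Math.round(percentile(latenciesMs, 99.0) * headroom);
    }

    public static void main(String[] args) {
        long[] samples = {120, 135, 150, 160, 180, 210, 240, 300, 900, 1500};
        System.out.println(suggestedTimeoutMs(samples, 1.5)); // prints 2250
    }
}
```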
Q: Should I raise search.default_search_timeout to fix timeouts?
A: No. That setting defaults to -1 (no timeout) and supplies the default per-search timeout for requests that do not set their own; a timeout given on the request overrides it. A longer timeout does not make slow queries finish faster - only optimization or async search does. Use it to bound runaway queries, not to fix slowness.
Q: What's the fastest way to diagnose timeout exceptions in production?
A: Pulse, the AI DBA for Elasticsearch and OpenSearch, classifies the timeout by exception class and layer, correlates with slow logs, thread-pool pressure, and GC pauses, then names whether the fix is client-side (raise timeout, async-search migration) or server-side (scale, query rewrite, heap tuning). It applies dynamic settings with approval and routes code changes to the right service.
Related Reading
- Elasticsearch SocketTimeoutException: client-side read timeouts.
- Elasticsearch TaskCancelledException: server-side task cancellation.
- Elasticsearch cluster health check: cluster-side diagnostics.
- Elasticsearch slow log configuration: identifying slow queries.
- Elasticsearch monitoring: timeout pattern detection.