NoNodeAvailableException: No alive nodes found in your cluster, or the equivalent error in your client library (the exact exception name and message vary by client and version), is raised when an Elasticsearch client cannot reach any of the hosts in its configured node list. The client gives up after exhausting retries against every known endpoint. The cluster itself may be healthy - the failure is between the client and the cluster, not inside it.
What This Error Means
Elasticsearch clients maintain a list of candidate nodes (either statically configured or discovered via sniffing) and round-robin requests across them. When every node returns a connection error, times out, or fails authentication, the client marks them all dead and raises this exception. The root cause is one of: nodes not running, network unreachability, wrong port/protocol, TLS mismatch, authentication failure, or a stale sniffed list pointing at decommissioned hosts.
The cluster may be fully operational - check from another host before assuming the cluster is down.
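The bookkeeping described above can be modeled in a few lines of Java. This is a hypothetical, simplified sketch (the class `NodePool` and its methods are illustrative, not any client's real API): it rotates through the endpoints, skips nodes marked dead after a failure, and throws once every node is dead. Real clients add resurrection timers, pings, and sniffing on top of this.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of a client-side node pool.
// Not the implementation of any real Elasticsearch client.
final class NodePool {
    private final List<String> nodes;                 // configured or sniffed endpoints
    private final Set<String> dead = new HashSet<>(); // endpoints that recently failed
    private int cursor = 0;                           // round-robin position

    NodePool(List<String> nodes) {
        this.nodes = new ArrayList<>(nodes);
    }

    // Returns the next endpoint not marked dead, rotating through the list at most once.
    synchronized String next() {
        for (int i = 0; i < nodes.size(); i++) {
            String candidate = nodes.get(cursor);
            cursor = (cursor + 1) % nodes.size();
            if (!dead.contains(candidate)) {
                return candidate;
            }
        }
        // Every endpoint has failed: the condition this error reports.
        throw new IllegalStateException("No alive nodes found in your cluster");
    }

    // Called after a connection error, timeout, or similar failure on this endpoint.
    synchronized void markDead(String node) {
        dead.add(node);
    }

    // Called when a request to this endpoint succeeds again.
    synchronized void markAlive(String node) {
        dead.remove(node);
    }
}
```

Because the pool only tracks what this client has observed, every node can be marked dead while the cluster itself stays green, which is exactly the situation this error describes.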
Common Causes
- Elasticsearch process not running on the target hosts. How to confirm: run `systemctl status elasticsearch` on each node, or `curl http://<host>:9200/` directly from the client host.
- Wrong scheme (HTTP vs HTTPS) after enabling security. How to confirm: try both `http://<host>:9200/` and `https://<host>:9200/` from the client; one will return JSON, the other a TLS handshake error or an empty response.
- TLS certificate not trusted by the client. How to confirm: run `openssl s_client -connect <host>:9200 -showcerts` and verify the certificate chain.
- Firewall, security group, or network policy blocking port 9200 (REST) or 9300 (transport). How to confirm: run `nc -vz <host> 9200` from the client host.
- Sniffing returned 9300 transport addresses, but the client connects over REST. How to confirm: the client log lists the unreachable addresses; if they use port 9300, sniffing is misconfigured.
- Authentication failure (wrong username, password, or API key) returning 401 on every node. How to confirm: try a manual `curl -u <user>:<pass> https://<host>:9200/`.
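The checks above can also be scripted from the client host. The following is a minimal sketch using only the JDK's `java.net.http.HttpClient` (Java 11+); `NodeProbe`, its `Outcome` values, and the 5- and 10-second timeouts are illustrative assumptions. It sends one authenticated `GET /` and maps the result onto the causes listed above. It uses the JVM's default trust store, so an untrusted cluster certificate shows up as `TLS_FAILURE` rather than being skipped the way `curl -k` would.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.SocketTimeoutException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.net.http.HttpTimeoutException;
import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.util.Base64;
import javax.net.ssl.SSLException;

// Hypothetical diagnostic helper: one authenticated GET / mapped onto the causes above.
final class NodeProbe {
    enum Outcome { OK, AUTH_FAILED, HTTP_ERROR, TLS_FAILURE, TIMEOUT, CONNECTION_REFUSED, OTHER_IO }

    // Walks the cause chain, since the HTTP client may wrap the underlying socket error.
    static Outcome classifyError(Throwable error) {
        for (Throwable c = error; c != null; c = c.getCause()) {
            if (c instanceof SSLException) return Outcome.TLS_FAILURE; // untrusted cert, HTTPS to an HTTP port
            if (c instanceof HttpTimeoutException || c instanceof SocketTimeoutException) return Outcome.TIMEOUT;
            if (c instanceof ConnectException) return Outcome.CONNECTION_REFUSED; // typically nothing listening
        }
        return Outcome.OTHER_IO; // e.g. connection closed without any response
    }

    static Outcome classifyStatus(int status) {
        if (status == 401) return Outcome.AUTH_FAILED;
        return (status >= 200 && status < 300) ? Outcome.OK : Outcome.HTTP_ERROR;
    }

    static Outcome probe(String baseUrl, String user, String password) throws InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(5))
                .build();
        String basic = Base64.getEncoder()
                .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
        HttpRequest request = HttpRequest.newBuilder(URI.create(baseUrl + "/"))
                .timeout(Duration.ofSeconds(10))
                .header("Authorization", "Basic " + basic)
                .GET()
                .build();
        try {
            HttpResponse<Void> response = client.send(request, HttpResponse.BodyHandlers.discarding());
            return classifyStatus(response.statusCode());
        } catch (IOException e) {
            return classifyError(e);
        }
    }
}
```

For example, `NodeProbe.probe("https://es-1:9200", "elastic", password)` returning `CONNECTION_REFUSED` points at a stopped process or a closed port. If the `http://` URL returns `OTHER_IO` while the `https://` URL works, a scheme mismatch is the likely cause.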
How to Fix No Alive Nodes Found
From the client host, confirm that at least one node responds:

    curl -k -u elastic:<password> https://<host>:9200/

A 200 response with the cluster banner means the node and credentials are fine. Because `-k` skips certificate verification, a success here does not rule out a TLS trust problem in the client.

Check the Elasticsearch process on each node:

    sudo systemctl status elasticsearch
    sudo journalctl -u elasticsearch -n 200

Verify cluster health from one of the nodes:

    curl -k -u elastic:<password> https://localhost:9200/_cluster/health

Reconcile the client configuration: scheme, port, credentials, and CA certificate path must match what the cluster runs. For the Java low-level REST client (`sslContext` and `credentialsProvider` must be built from your CA certificate and credentials):

    RestClient restClient = RestClient.builder(new HttpHost("es-1", 9200, "https"))
            .setHttpClientConfigCallback(b -> b
                    .setSSLContext(sslContext)
                    .setDefaultCredentialsProvider(credentialsProvider))
            .build();

Disable sniffing temporarily to rule out stale topology data. In the Java client, remove the Sniffer until basic connectivity works.

Open the firewall or security group for port 9200 from the client subnet:

    sudo ufw allow from <client_subnet> to any port 9200 proto tcp

Restart Elasticsearch only after the configuration checks pass:

    sudo systemctl restart elasticsearch
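The `RestClient.builder` example expects an `sslContext` that trusts the cluster's HTTP certificate. One way to build it with JDK classes only is sketched below; `CaTrust` is a hypothetical helper name, and the CA file location is an assumption (Elasticsearch 8.x security auto-configuration typically writes the HTTP CA to `certs/http_ca.crt` under the config directory, but your deployment may differ). The `credentialsProvider` comes from Apache HttpClient and is not shown.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import java.security.cert.Certificate;
import java.security.cert.CertificateFactory;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManagerFactory;

// Hypothetical helper: an SSLContext that trusts exactly one CA certificate (PEM or DER).
final class CaTrust {
    static SSLContext fromStream(InputStream caCert) throws GeneralSecurityException, IOException {
        // Parse the CA certificate; malformed input raises CertificateException.
        Certificate ca = CertificateFactory.getInstance("X.509").generateCertificate(caCert);

        // Put it into an empty, in-memory trust store.
        KeyStore trustStore = KeyStore.getInstance(KeyStore.getDefaultType());
        trustStore.load(null, null);
        trustStore.setCertificateEntry("elasticsearch-http-ca", ca);

        // Derive trust managers from that store and initialize a TLS context with them.
        TrustManagerFactory tmf = TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        tmf.init(trustStore);
        SSLContext context = SSLContext.getInstance("TLS");
        context.init(null, tmf.getTrustManagers(), null);
        return context;
    }

    static SSLContext fromFile(Path caPath) throws GeneralSecurityException, IOException {
        try (InputStream in = Files.newInputStream(caPath)) {
            return fromStream(in);
        }
    }
}
```

On a package install this might look like `CaTrust.fromFile(Path.of("/etc/elasticsearch/certs/http_ca.crt"))` (an example path). Trusting only this CA, rather than disabling verification, gives the client the same guarantee as `curl --cacert` instead of `curl -k`.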
Resolve No Alive Nodes Found Automatically with Pulse
Pulse is an AI DBA for Elasticsearch and OpenSearch. When NoNodeAvailableException: No alive nodes found in your cluster fires from a client, Pulse:
- Probes both REST (9200) and transport (9300) endpoints continuously from outside the cluster, distinguishing connection refused, TLS handshake failure, 401 Unauthorized, and timeout, and correlates each with `systemctl status elasticsearch`, `journalctl -u elasticsearch`, and `_cluster/health` from a known-good vantage point
- Identifies which of the six causes applies: node process down, HTTP-vs-HTTPS scheme mismatch after enabling security, untrusted TLS chain, blocked firewall or security group on 9200, sniffing returning unreachable 9300 transport addresses, or auth failure across every endpoint
- Generates the exact remediation: the correct `RestClient.builder` configuration (`HttpHost` scheme and port, CA trust), the `ufw`/security-group rule to open 9200 from the client subnet, the `Sniffer` disable, or the certificate trust update
- Applies network policy and dynamic cluster settings changes with operator approval; leaves client library configuration as a one-click PR
Pulse separates "the cluster is down" from "this client cannot reach the cluster" by running probes from outside the cluster network, which catches partial outages where some clients still work and others do not.
Start a free trial to connect your cluster.
Frequently Asked Questions
Q: Does "No alive nodes found" mean my cluster is down?
A: Not necessarily. The error means the client cannot reach any node. The cluster may be fully operational - test with curl from a known-good host before troubleshooting cluster state.
Q: How can I tell if it is a network or authentication issue?
A: A network error in the client log typically reads "connection refused" or "connect timed out"; an auth failure reads "401 Unauthorized" or "security_exception". Test with curl -v to see the HTTP status code directly.
Q: Why does sniffing make this error worse?
A: Sniffing replaces the configured node list with the addresses returned by _nodes/_all/http. If those addresses are not reachable from the client network (private IPs, decommissioned hosts, internal DNS names), every sniffed node fails, the client marks them all dead, and this error appears even when the originally configured endpoint, such as a publicly reachable load balancer, works.
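Before re-enabling the Sniffer, you can screen the addresses it would hand to the client. The sketch below is hypothetical (`SniffedAddressCheck` is not part of any client library) and assumes IPv4 entries of the form `ip:port` or `host:port`, as they appear in the client log. It flags transport-port entries and RFC 1918 or loopback IPv4 literals; hostnames are not flagged, because whether they resolve depends on the client's DNS.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-flight screen for a sniffed node list (IPv4 "ip:port" or "host:port" entries).
final class SniffedAddressCheck {
    static List<String> suspicious(List<String> addresses) {
        List<String> flagged = new ArrayList<>();
        for (String address : addresses) {
            int colon = address.lastIndexOf(':');
            String host = colon >= 0 ? address.substring(0, colon) : address;
            String port = colon >= 0 ? address.substring(colon + 1) : "";
            if (port.equals("9300")) {
                flagged.add(address + " (transport port; REST clients need the HTTP port)");
            } else if (isPrivateOrLoopbackIPv4(host)) {
                flagged.add(address + " (private/loopback IP; check it is routable from the client)");
            }
        }
        return flagged;
    }

    // Parses dotted-quad literals only, so no DNS lookup is performed.
    static boolean isPrivateOrLoopbackIPv4(String host) {
        String[] parts = host.split("\\.");
        if (parts.length != 4) return false;
        int[] octet = new int[4];
        for (int i = 0; i < 4; i++) {
            if (!parts[i].matches("\\d{1,3}")) return false; // hostnames fall through here
            octet[i] = Integer.parseInt(parts[i]);
            if (octet[i] > 255) return false;
        }
        return octet[0] == 10                                            // 10.0.0.0/8
                || (octet[0] == 172 && octet[1] >= 16 && octet[1] <= 31) // 172.16.0.0/12
                || (octet[0] == 192 && octet[1] == 168)                  // 192.168.0.0/16
                || octet[0] == 127;                                      // loopback
    }
}
```

An empty result does not prove reachability; it only removes the two most common mismatches before you run a real connectivity check from the client network.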
Q: Can I lose data when this error occurs?
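A: The client error itself does not affect cluster data. Write operations that never reached a node simply fail and should be retried. Operations that returned a success status (200 or 201) were persisted; for bulk requests, also check the per-item results, because a bulk response can return 200 while individual items failed. A request that timed out after it was sent may or may not have been applied, so check the client logs for the per-request status before retrying.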
Q: How long should I wait before retrying after this error?
A: Use exponential backoff starting at 1-2 seconds, capped at ~30 seconds, with jitter. Aggressive retries during a network blip make recovery harder for the cluster.
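A minimal sketch of that policy in Java, assuming a 2-second base (so the first wait lands between 1 and 2 seconds) and a 30-second cap; `Backoff` and its constants are illustrative, not part of any Elasticsearch client. It uses "equal jitter": half of each wait is fixed and half is random, so clients that failed together do not retry in lockstep.

```java
import java.io.IOException;
import java.time.Duration;
import java.util.Random;
import java.util.concurrent.Callable;

// Hypothetical retry helper: capped exponential backoff with "equal jitter".
final class Backoff {
    static final long BASE_MS = 2_000; // first wait falls between 1 and 2 seconds
    static final long CAP_MS = 30_000; // no wait exceeds 30 seconds

    // Wait before retry number `attempt` (0-based): half fixed, half random.
    static Duration delay(int attempt, Random random) {
        if (attempt < 0) throw new IllegalArgumentException("attempt must be >= 0");
        // Shift only while it cannot overflow; larger attempts are capped anyway.
        long exponential = attempt < 30 ? Math.min(CAP_MS, BASE_MS << attempt) : CAP_MS;
        long half = exponential / 2;
        long jitter = (long) (random.nextDouble() * (half + 1)); // 0..half inclusive
        return Duration.ofMillis(half + jitter);
    }

    // Runs `call`, retrying only on IOException (connection-level failures) up to maxAttempts times.
    static <T> T retry(Callable<T> call, int maxAttempts, Random random) throws Exception {
        for (int attempt = 0; ; attempt++) {
            try {
                return call.call();
            } catch (IOException e) {
                if (attempt + 1 >= maxAttempts) {
                    throw e; // out of attempts: surface the last failure
                }
                Thread.sleep(delay(attempt, random).toMillis());
            }
        }
    }
}
```

Only `IOException`s are retried here, so programming errors and thread interruptions propagate immediately instead of being masked by the retry loop.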
Q: Will mismatched cluster names cause this error?
A: The REST client does not enforce cluster name matching, so a mismatched name will not cause "no alive nodes". The legacy transport client (deprecated in 7.0, removed in 8.0) did enforce it, but it is no longer supported; if you are still using it, migrate to the REST client.
Q: What's the fastest way to diagnose "No alive nodes found" in production?
A: Pulse, the AI DBA for Elasticsearch and OpenSearch, probes the cluster from outside the network on both REST and transport, distinguishes connection refused, TLS handshake failure, and 401 in the same view, and names whether the issue is node state, firewall, scheme mismatch, sniffing, or credentials. That isolates client-network problems from actual cluster outages without a manual curl-from-three-hosts walk.
Related Reading
- Elasticsearch Connection refused error: the OS-level connectivity error.
- Elasticsearch SocketTimeoutException: for established-connection timeouts.
- Elasticsearch cluster health check: server-side health diagnostics.
- Elasticsearch UnknownServiceException: for protocol/service mismatch errors.
- Elasticsearch monitoring: proactive reachability monitoring.