Elasticsearch NoShardAvailableActionException: No shard available - Common Causes & Fixes

NoShardAvailableActionException: No shard available for [<index>][<shard>] is raised when an operation targets a shard that has no usable copy in the cluster: the primary for writes, or the primary and every replica for reads. Every copy is unassigned, still initializing, or located on an unreachable node. The request fails with a 503, and the affected index remains partially or fully unavailable until allocation succeeds.

What This Error Means

Elasticsearch routes every search and index request to a specific shard copy. If no copy of the targeted shard is STARTED, the request cannot be served. This is almost always a symptom, not a root cause - the underlying issue is unassigned shards, and Elasticsearch ships the _cluster/allocation/explain API specifically to tell you why.

Common underlying conditions: a node hosting a primary shard left the cluster, the disk watermark was breached, replicas have not yet recovered, or an allocation filter excludes every eligible node.
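To make the "no usable copy" condition concrete, here is a minimal Python sketch. It assumes rows shaped like the JSON output of GET /_cat/shards?format=json&h=index,shard,prirep,state and lists the shards that have no active copy. The function name and sample rows are illustrative, not part of Elasticsearch. A RELOCATING copy is counted as active here because the source copy keeps serving requests during relocation.

```python
from collections import defaultdict

# States in which a shard copy can serve requests. A RELOCATING source copy
# is still active while its data moves to the target node.
ACTIVE_STATES = {"STARTED", "RELOCATING"}


def shards_without_active_copy(cat_shards_rows):
    """Return sorted (index, shard) pairs that have no active copy.

    Each row is assumed to be a dict with the keys "index", "shard",
    "prirep" and "state", as in _cat/shards?format=json output.
    """
    states = defaultdict(set)
    for row in cat_shards_rows:
        states[(row["index"], int(row["shard"]))].add(row["state"])
    return sorted(key for key, seen in states.items() if not seen & ACTIVE_STATES)


# Hypothetical sample: shard 0 still has a started primary, shard 1 has nothing.
sample_rows = [
    {"index": "my-index", "shard": "0", "prirep": "p", "state": "STARTED"},
    {"index": "my-index", "shard": "0", "prirep": "r", "state": "UNASSIGNED"},
    {"index": "my-index", "shard": "1", "prirep": "p", "state": "UNASSIGNED"},
    {"index": "my-index", "shard": "1", "prirep": "r", "state": "UNASSIGNED"},
]
print(shards_without_active_copy(sample_rows))
```

In this sample only shard 1 would be reported: shard 0 is degraded (its replica is unassigned) but can still serve requests from its primary.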

Common Causes

  1. Node failure or network partition removed the shard's only copy from the cluster. How to confirm: GET _cluster/health shows reduced number_of_nodes; GET _cat/nodes?v shows missing entries.
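  2. Disk watermark exceeded; allocation blocked. How to confirm: GET _cat/allocation?v - by default, a node above 85% disk usage (low watermark) is not allocated new shards, apart from primaries of newly created indices, and above 90% (high watermark) Elasticsearch also tries to relocate shards away from it.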
  3. Index replicas set to a count that exceeds available data nodes. How to confirm: GET <index>/_settings for number_of_replicas; with one data node and number_of_replicas: 1, replicas never assign.
  4. Allocation filters (include/exclude/require) match no node. How to confirm: GET _cluster/settings and the index settings for routing.allocation.*.
  5. Maximum shards per node reached (default cluster.max_shards_per_node is 1000). How to confirm: cluster log shows Validation Failed: this action would add [N] shards, but this cluster currently has [X]/[1000] maximum shards open.
  6. Shard corruption preventing recovery. How to confirm: GET _cluster/allocation/explain reports corrupted_index_uuid or checksum_failure.
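The disk check for cause 2 can be scripted. The sketch below assumes rows shaped like GET /_cat/allocation?format=json output, where "disk.percent" is a string and is null on the UNASSIGNED row. It compares each node against the default percentage watermarks (85% low, 90% high). The function name is hypothetical, and the thresholds need adjusting if your cluster overrides them or uses absolute byte values.

```python
def nodes_over_watermark(cat_allocation_rows, low=85.0, high=90.0):
    """Group node names by the highest watermark their disk usage exceeds.

    Rows are assumed to carry "node" and "disk.percent" keys, as in
    _cat/allocation?format=json. Rows without a disk percentage (for
    example the UNASSIGNED pseudo-row) are skipped.
    """
    result = {"low": [], "high": []}
    for row in cat_allocation_rows:
        percent = row.get("disk.percent")
        if percent is None:
            continue
        used = float(percent)
        if used > high:
            result["high"].append(row["node"])
        elif used > low:
            result["low"].append(row["node"])
    return result


# Hypothetical sample rows.
rows = [
    {"node": "data-1", "disk.percent": "91"},
    {"node": "data-2", "disk.percent": "87"},
    {"node": "data-3", "disk.percent": "40"},
    {"node": "UNASSIGNED", "disk.percent": None},
]
print(nodes_over_watermark(rows))
```

If every data node lands in one of the two lists, disk pressure is a strong candidate for the blocker, though _cluster/allocation/explain remains the authoritative answer.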

How to Fix NoShardAvailableActionException

  1. Get a definitive explanation for one unassigned shard:

    GET /_cluster/allocation/explain
    {
      "index": "my-index",
      "shard": 0,
      "primary": true
    }
    

    The response's allocate_explanation and node_allocation_decisions name the exact blocker.

  2. Check cluster health and unassigned-shard counts:

    GET /_cluster/health?level=indices
    GET /_cat/shards?v&h=index,shard,prirep,state,unassigned.reason
    
  3. If disk watermark is the cause: free disk, expand storage, or temporarily relax watermarks:

    PUT /_cluster/settings
    {
      "transient": {
        "cluster.routing.allocation.disk.watermark.low": "92%",
        "cluster.routing.allocation.disk.watermark.high": "95%"
      }
    }
    

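    These are a stopgap - resize storage rather than leave them raised, and reset both settings to null once disk space is recovered. Note that 95% is also the default flood-stage watermark, at which Elasticsearch makes indices with a shard on that node read-only.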

  4. If a node is missing, bring it back online. Check Elasticsearch logs on the absent node; restart with systemctl start elasticsearch. Elasticsearch will recover from replica or translog on its own.

  5. If allocation filters are wrong: clear or correct them:

    PUT /my-index/_settings
    {
      "index.routing.allocation.include._tier_preference": null
    }
    
  6. Retry failed allocation explicitly (e.g., after fixing disk space):

    POST /_cluster/reroute?retry_failed=true
    
  7. If the shard's only copy is corrupted, restore from snapshot or accept data loss by force-allocating an empty primary (last resort - data is lost):

    POST /_cluster/reroute
    {
      "commands": [
        { "allocate_empty_primary": { "index": "my-index", "shard": 0, "node": "data-1", "accept_data_loss": true } }
      ]
    }
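The allocation explain response from step 1 can be long on clusters with many nodes. As a rough sketch, the Python function below reduces a parsed response to allocate_explanation plus the deciders that answered NO on each node. The function name and the abbreviated sample response are hypothetical. The field names follow the allocation explain output for an unassigned shard, and real responses contain more fields than shown.

```python
def summarize_allocation_explain(explain):
    """Reduce an allocation explain response to its blocking deciders.

    `explain` is assumed to be the parsed JSON body returned by
    GET /_cluster/allocation/explain for an unassigned shard.
    """
    blockers = {}
    for node in explain.get("node_allocation_decisions", []):
        reasons = [
            f'{decider["decider"]}: {decider["explanation"]}'
            for decider in node.get("deciders", [])
            if decider.get("decision") == "NO"
        ]
        if reasons:
            blockers[node.get("node_name", node.get("node_id"))] = reasons
    return {
        "summary": explain.get("allocate_explanation"),
        "blockers": blockers,
    }


# Abbreviated, hypothetical response for illustration only.
sample_explain = {
    "index": "my-index",
    "shard": 0,
    "primary": True,
    "allocate_explanation": "cannot allocate because allocation is not permitted to any of the nodes",
    "node_allocation_decisions": [
        {
            "node_name": "data-1",
            "node_decision": "no",
            "deciders": [
                {"decider": "disk_threshold", "decision": "NO",
                 "explanation": "node is above the low watermark"},
                {"decider": "same_shard", "decision": "YES",
                 "explanation": "not relevant here"},
            ],
        }
    ],
}
print(summarize_allocation_explain(sample_explain))
```

Grouping by node makes it easier to see whether a single decider (disk, filter, same_shard) blocks every node, which usually points at which of the steps above applies.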
    

Resolve NoShardAvailableActionException Automatically with Pulse

Pulse is an AI DBA for Elasticsearch and OpenSearch. When NoShardAvailableActionException: No shard available for [<index>][<shard>] fires, Pulse:

  • Calls _cluster/allocation/explain for every affected shard, parses allocate_explanation and node_allocation_decisions, and correlates with _cat/allocation?v disk usage, _cat/nodes?v membership, _cat/shards?v&h=index,shard,prirep,state,unassigned.reason, and the master node's leaving/joining history
  • Identifies which of the six causes applies: missing node from a partition, high watermark breach at 90% disk, replica count exceeding data nodes, an index.routing.allocation.* filter matching no node, cluster.max_shards_per_node: 1000 saturation, or corruption flagged as corrupted_index_uuid
  • Generates the exact remediation payload: the POST /_cluster/reroute?retry_failed=true call, the PUT /_cluster/settings watermark adjustment, the PUT /<index>/_settings filter clear, or - as a last resort with explicit operator confirmation - the allocate_empty_primary reroute with accept_data_loss: true
  • Applies allocation setting changes and ?retry_failed=true reroutes automatically with operator approval; never force-allocates an empty primary without explicit confirmation because that path discards shard data

Pulse tracks unassigned shard count and pending_tasks continuously, alerting before a replica that cannot place becomes a primary that cannot serve.

Start a free trial to connect your cluster.

Frequently Asked Questions

Q: Why are my replicas unassigned even though the cluster has multiple nodes?
A: The most common reasons are routing.allocation.* filters that prevent the replica from being placed on any non-primary node, all eligible nodes being above the high disk watermark, or cluster.routing.allocation.same_shard.host: true blocking allocation onto the same physical host. _cluster/allocation/explain will tell you which.

Q: Is it safe to force shard allocation?
A: Forcing an empty primary with allocate_empty_primary discards all data for that shard - use only when the data is already lost. ?retry_failed=true is safe because it re-runs normal allocation logic.

Q: What is the difference between NoShardAvailableActionException and UnavailableShardsException?
A: NoShardAvailableActionException means no copy of the targeted shard is in the STARTED state. UnavailableShardsException is thrown when a write times out waiting for the primary to become active or for the number of active copies required by wait_for_active_shards. They often appear together but indicate different cluster states.

Q: How long does Elasticsearch wait before retrying shard allocation?
A: Elasticsearch retries a failed allocation up to index.allocation.max_retries times in a row (default 5), then gives up and leaves the shard unassigned. After that, fix the underlying problem and call POST /_cluster/reroute?retry_failed=true to retry.
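As a rough illustration, the Python sketch below checks a parsed allocation explain response to see whether the retry budget is exhausted. It assumes the unassigned_info object carries a reason and a failed_allocation_attempts counter, which is an assumption to verify against your version's output. The function name is hypothetical, and max_retries should match your index.allocation.max_retries setting.

```python
def needs_manual_retry(explain, max_retries=5):
    """Return True when failed allocation attempts have reached max_retries.

    Assumes `explain` is a parsed allocation explain response whose
    "unassigned_info" may include "reason" and "failed_allocation_attempts".
    """
    info = explain.get("unassigned_info", {})
    return (
        info.get("reason") == "ALLOCATION_FAILED"
        and info.get("failed_allocation_attempts", 0) >= max_retries
    )


# Hypothetical fragments of explain responses.
exhausted = {"unassigned_info": {"reason": "ALLOCATION_FAILED", "failed_allocation_attempts": 5}}
node_left = {"unassigned_info": {"reason": "NODE_LEFT"}}
print(needs_manual_retry(exhausted), needs_manual_retry(node_left))
```

A True result suggests the shard will stay unassigned until the cause is fixed and ?retry_failed=true is called.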

Q: Can NoShardAvailableActionException occur on a green cluster?
A: Yes, briefly - during a shard relocation or initialization window, the target copy is not yet STARTED. Most clients retry transparently. Persistent occurrence on a green cluster usually points at a race between client routing decisions and master state updates.
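If your client does not retry on its own, a small wrapper can absorb these transient failures. The Python sketch below is a generic retry with exponential backoff. ServiceUnavailable is a hypothetical stand-in for whatever exception your client raises on an HTTP 503, and the attempt count and delays are arbitrary starting points.

```python
import time


class ServiceUnavailable(Exception):
    """Hypothetical stand-in for a client error carrying HTTP status 503."""


def call_with_retry(operation, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call `operation`, retrying on ServiceUnavailable with doubling delays.

    The last failure is re-raised so persistent unavailability stays visible.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except ServiceUnavailable:
            if attempt == attempts - 1:
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * (2 ** attempt))
```

Keep the attempt count low: if the error persists beyond a few seconds, the shard is likely unassigned rather than in transition, and retrying only hides the problem.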

Q: Will restarting Elasticsearch fix unassigned shards?
A: Sometimes, by triggering a fresh allocation pass. But restarting without diagnosis can mask the cause (disk pressure, filter misconfiguration) and risks data loss if it removes the only remaining copy. Always run _cluster/allocation/explain first.

Q: What's the fastest way to diagnose NoShardAvailableActionException in production?
A: Pulse, the AI DBA for Elasticsearch and OpenSearch, calls _cluster/allocation/explain for every affected shard, parses the per-node allocation decisions, and names the blocker (disk watermark, missing node, filter mismatch, corruption) in one view. It applies the safe remediation - watermark adjustment, retry reroute, filter clear - with approval and refuses to force-allocate an empty primary without explicit operator confirmation.
