Elasticsearch cluster status is yellow (unassigned replicas) - Common Causes & Fixes

Brief Explanation

A yellow cluster health status in Elasticsearch indicates that all primary shards are allocated but one or more replica shards remain unassigned. The cluster is fully functional, but it is not operating at full redundancy.

Impact

While a yellow status doesn't prevent the cluster from functioning, it does reduce fault tolerance and can impact performance:

  • Reduced redundancy: Unassigned replicas mean less data redundancy, increasing the risk of data loss if a node fails.
  • Slower search performance: Fewer replicas can lead to increased query load on primary shards.
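  • Potential indexing slowdown during recovery: When unassigned replicas are eventually allocated, they are rebuilt by copying data from the primaries, which consumes disk and network I/O and can slow indexing while recovery runs.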

Common Causes

  1. Insufficient nodes in the cluster to host all replicas
  2. Node failures or network issues
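  3. Misconfigured shard allocation settings (for example, cluster.routing.allocation.enable set to primaries or none)
  4. Insufficient disk space on one or more nodes (by default, no new shards are allocated to a node whose disk usage exceeds the 85% low watermark)
  5. Incompatible shard and node configurations (for example, allocation filters or awareness attributes that no available node satisfies)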
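
Cause 1 comes down to simple arithmetic: Elasticsearch does not place two copies of the same shard on one node, so an index with N replicas needs at least N + 1 data nodes. The following is a minimal sketch of that check. The function name is hypothetical, and it deliberately ignores disk, filtering, and awareness constraints.

```python
def unassignable_replicas_per_shard(number_of_replicas: int, data_nodes: int) -> int:
    """Return how many replica copies of each shard cannot be placed.

    Assumes the only constraint is that every copy of a shard (the primary
    plus its replicas) must live on a different data node.
    """
    if number_of_replicas < 0 or data_nodes < 1:
        raise ValueError("need number_of_replicas >= 0 and data_nodes >= 1")
    copies = 1 + number_of_replicas  # one primary plus its replicas
    # At most one copy per node fits, so anything beyond data_nodes stays unassigned.
    return max(0, copies - data_nodes)
```

For example, a single-node cluster with the default of one replica leaves one replica per shard unassigned, which is the most common reason a fresh development cluster reports yellow.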

Troubleshooting and Resolution Steps

  1. Check cluster health:

    GET /_cluster/health
    
  2. Identify unassigned shards:

    GET /_cat/shards?v&h=index,shard,prirep,state,unassigned.reason
    
  3. Analyze unassigned reasons (when called without a request body, this explains an arbitrary unassigned shard):

    GET /_cluster/allocation/explain
    
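  4. Ensure sufficient nodes are available and healthy. A replica is never allocated to the node that holds its primary, so an index with N replicas needs at least N + 1 data nodes.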

  5. Verify disk space on all nodes:

    GET /_cat/nodes?v&h=ip,node.role,name,disk.total,disk.used,disk.avail,disk.used_percent
    
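  6. Check and adjust shard allocation settings if necessary. GET /_cluster/settings shows the current values; the request below re-enables allocation for all shard types: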

    PUT /_cluster/settings
    {
      "transient": {
        "cluster.routing.allocation.enable": "all"
      }
    }
    
  7. If disk space is an issue, consider increasing storage or removing unnecessary indices.

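  8. For persistent issues, consider temporarily reducing the number of replicas to a value the cluster can actually host. The example below sets it to 1, which helps only if the index currently has 2 or more replicas; on a single-node cluster, use 0: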

    PUT /index_name/_settings
    {
      "number_of_replicas": 1
    }
    
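  9. If allocation was abandoned after repeated failures, retry it with POST /_cluster/reroute?retry_failed=true. Restart Elasticsearch nodes only as a last resort if other steps don't resolve the issue.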
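
Steps 2 and 5 can be summarized with a short script when a cluster has many shards. The following is a rough sketch, assuming the two _cat requests above were issued with format=json added and the responses were already parsed into lists of dictionaries. The function names are hypothetical, and the 85% threshold is the default low disk watermark, which your cluster may override.

```python
from collections import Counter
from typing import Any, Dict, List, Optional

# Default value of cluster.routing.allocation.disk.watermark.low (assumed unchanged).
LOW_WATERMARK_PERCENT = 85.0


def unassigned_replicas_by_reason(shard_rows: List[Dict[str, Any]]) -> Dict[str, int]:
    """Count unassigned replica shards per unassigned.reason value."""
    counts: Counter = Counter()
    for row in shard_rows:
        if row.get("prirep") == "r" and row.get("state") == "UNASSIGNED":
            # The reason column can be empty, so fall back to a placeholder.
            counts[row.get("unassigned.reason") or "UNKNOWN"] += 1
    return dict(counts)


def nodes_over_watermark(
    node_rows: List[Dict[str, Any]],
    threshold: float = LOW_WATERMARK_PERCENT,
) -> List[str]:
    """Return names of nodes whose disk usage exceeds the threshold."""
    flagged = []
    for row in node_rows:
        used: Optional[str] = row.get("disk.used_percent")
        # _cat APIs return numbers as strings; a missing value is skipped.
        if used is not None and float(used) > threshold:
            flagged.append(row.get("name", "?"))
    return flagged
```

A result dominated by NODE_LEFT points at node failures or network issues, while INDEX_CREATED on a small cluster usually points at too few nodes for the configured replica count.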

Best Practices

  • Regularly monitor cluster health and set up alerts for yellow or red statuses.
  • Implement proper capacity planning to ensure adequate resources for your cluster.
  • Use shard allocation filtering to control where shards are allocated.
  • Implement a robust backup strategy to mitigate risks associated with reduced redundancy.
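
The first practice, alerting on yellow or red, can be sketched as a small classifier over the parsed response of GET /_cluster/health. This is an illustrative sketch only: the status and unassigned_shards fields come from that response, while the function name and the severity labels are hypothetical and would be mapped to whatever your alerting tool expects.

```python
from typing import Any, Dict, Tuple


def health_alert(health: Dict[str, Any]) -> Tuple[str, str]:
    """Map a parsed _cluster/health response to a (severity, message) pair."""
    status = health.get("status")
    unassigned = health.get("unassigned_shards", 0)
    if status == "green":
        return ("ok", "all shards assigned")
    if status == "yellow":
        # Yellow means every primary is assigned, so the unassigned shards are replicas.
        return ("warning", f"{unassigned} unassigned replica shard(s)")
    if status == "red":
        return ("critical", "at least one primary shard is unassigned")
    return ("unknown", f"unexpected status: {status!r}")
```

Treating yellow as a warning rather than a page-worthy incident matches the guidance in this article: the cluster keeps serving requests, but the missing redundancy should be restored promptly.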

Frequently Asked Questions

Q: Can I still use my Elasticsearch cluster when it's in a yellow state?
A: Yes, a yellow state means all primary shards are allocated and the cluster is operational. However, you'll have reduced redundancy and potentially slower performance.

Q: How long can I safely operate with a yellow cluster status?
A: While you can operate indefinitely in a yellow state, it's recommended to resolve the issue promptly to ensure full redundancy and optimal performance.

Q: Will increasing the number of nodes always resolve a yellow status?
A: Often, but not always. While adding nodes can help if the issue is resource-related, other factors like misconfiguration or network issues may require different solutions.

Q: Can a yellow status turn into a red status?
A: Yes, if conditions worsen and primary shards become unassigned, the cluster status can degrade from yellow to red.

Q: How does a yellow status affect my cluster's performance?
A: A yellow status can slow down search operations because queries are spread across fewer shard copies, and indexing may slow temporarily while unassigned replicas are being recovered.
