Elasticsearch cluster status is yellow (unassigned replicas)

Brief Explanation

A yellow cluster health status in Elasticsearch indicates that all primary shards are allocated but one or more replica shards remain unassigned. The cluster is functional, and all data is available for reads and writes, but it is not operating at full redundancy.

Impact

While a yellow status doesn't prevent the cluster from functioning, it does reduce fault tolerance and can impact performance:

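Reduced redundancy: Every unassigned replica is one fewer copy of its shard. If a node holding the only remaining copy fails, that shard becomes unavailable (the cluster turns red) and its data may be lost.
Slower search performance: Searches can be served by any copy of a shard, so with fewer assigned replicas more of the query load falls on the primary shards.
Potential for delayed indexing: Write requests that set `wait_for_active_shards` above its default of 1 wait for the missing copies and can time out. Once replicas are allocated, their recovery also competes with indexing for disk and network I/O.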

Common Causes

Insufficient nodes in the cluster to host all replicas
Node failures or network issues
Misconfigured shard allocation settings
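Insufficient disk space on one or more nodes (disk usage above the low watermark, 85% by default, stops new shards from being allocated to a node)
Allocation rules that no available node can satisfy, such as allocation filtering or awareness attributes that match no node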

Troubleshooting and Resolution Steps

Check cluster health:
```
GET /_cluster/health
```
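
The response fields that matter most for a yellow cluster are `status`, `number_of_data_nodes`, and `unassigned_shards`. As a minimal sketch, assuming the response has already been fetched and parsed into a Python dict, a one-line summary could look like this. The helper name `summarize_health` and the sample values are illustrative, not output from a real cluster.

```python
def summarize_health(health: dict) -> str:
    """Summarize a parsed GET /_cluster/health response in one line."""
    return (
        f"status={health['status']} "
        f"data_nodes={health['number_of_data_nodes']} "
        f"unassigned_shards={health['unassigned_shards']}"
    )


# Illustrative values only, not taken from a real cluster.
sample_health = {
    "status": "yellow",
    "number_of_data_nodes": 1,
    "active_primary_shards": 5,
    "unassigned_shards": 5,
}

print(summarize_health(sample_health))
# prints: status=yellow data_nodes=1 unassigned_shards=5
```

A single data node with as many unassigned shards as primaries, as in this sample, usually points to replicas that have no second node to live on.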

Identify unassigned shards:

```
GET /_cat/shards?v&h=index,shard,prirep,state,unassigned.reason
```
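
On a cluster with many shards the listing is long, so it can help to filter it. The sketch below assumes the same request was made with `format=json` added, which returns one object per shard keyed by the requested column names. The helper names and the index name `logs-1` are hypothetical.

```python
from collections import Counter


def unassigned_replicas(rows: list[dict]) -> list[dict]:
    """Keep only replica shards (prirep == 'r') whose state is UNASSIGNED."""
    return [r for r in rows if r["prirep"] == "r" and r["state"] == "UNASSIGNED"]


def count_by_reason(rows: list[dict]) -> Counter:
    """Count unassigned replicas per value of the unassigned.reason column."""
    return Counter(r.get("unassigned.reason") for r in unassigned_replicas(rows))


# Illustrative rows; the index name is hypothetical.
sample_rows = [
    {"index": "logs-1", "shard": "0", "prirep": "p", "state": "STARTED",
     "unassigned.reason": None},
    {"index": "logs-1", "shard": "0", "prirep": "r", "state": "UNASSIGNED",
     "unassigned.reason": "INDEX_CREATED"},
    {"index": "logs-1", "shard": "1", "prirep": "r", "state": "UNASSIGNED",
     "unassigned.reason": "NODE_LEFT"},
]

print(count_by_reason(sample_rows))
```

Grouping by reason separates replicas that were never assigned (`INDEX_CREATED`) from replicas that lost their node (`NODE_LEFT`), which tend to need different fixes.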

Analyze unassigned reasons:
```
GET /_cluster/allocation/explain
```
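Ensure sufficient nodes are available and healthy. Elasticsearch never allocates a replica to the node that holds its primary, so an index with N replicas needs at least N + 1 data nodes.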
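
When called without a request body, the allocation explain API reports on one unassigned shard of its own choosing. Its response lists, per node, the allocation deciders that were consulted. The sketch below pulls out only the deciders that answered `NO`, assuming the response has been parsed into a dict. The helper name, node names, and explanation strings are illustrative and shortened.

```python
def blocking_deciders(explain: dict) -> list[tuple[str, str, str]]:
    """Return (node_name, decider, explanation) for every decider that said NO."""
    blockers = []
    for node in explain.get("node_allocation_decisions", []):
        for decider in node.get("deciders", []):
            if decider["decision"] == "NO":
                blockers.append(
                    (node["node_name"], decider["decider"], decider["explanation"])
                )
    return blockers


# Illustrative response fragment; explanation texts are shortened placeholders.
sample_explain = {
    "index": "logs-1",
    "shard": 0,
    "primary": False,
    "node_allocation_decisions": [
        {
            "node_name": "node-1",
            "deciders": [
                {"decider": "same_shard", "decision": "NO",
                 "explanation": "a copy of this shard is already on this node"},
            ],
        },
        {
            "node_name": "node-2",
            "deciders": [
                {"decider": "disk_threshold", "decision": "NO",
                 "explanation": "disk usage is above the low watermark"},
                {"decider": "enable", "decision": "YES",
                 "explanation": "all allocations are allowed"},
            ],
        },
    ],
}

for node_name, decider, explanation in blocking_deciders(sample_explain):
    print(f"{node_name}: {decider}: {explanation}")
```

A `same_shard` blocker on every node means the cluster has too few data nodes for the configured replica count, while `disk_threshold`, `enable`, or `filter` blockers point to the disk and settings checks in the following steps.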

Verify disk space on all nodes:

```
GET /_cat/nodes?v&h=ip,node.role,name,disk.total,disk.used,disk.avail,disk.used_percent
```
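
Disk usage matters because of the disk watermarks: by default, Elasticsearch stops allocating new shards to a node above 85% usage (low), tries to move shards off it above 90% (high), and blocks writes to its indices above 95% (flood stage). The sketch below classifies nodes against those defaults, assuming the request above was made with `format=json`. The thresholds are an assumption, since a cluster may override them with the `cluster.routing.allocation.disk.watermark.*` settings, and the helper and node names are hypothetical.

```python
# Default watermarks in percent; your cluster may use different values.
LOW, HIGH, FLOOD_STAGE = 85.0, 90.0, 95.0


def disk_state(used_percent: float) -> str:
    """Classify a disk usage percentage against the watermark thresholds."""
    if used_percent > FLOOD_STAGE:
        return "flood_stage"
    if used_percent > HIGH:
        return "high"
    if used_percent > LOW:
        return "low"
    return "ok"


def nodes_over_watermark(rows: list[dict]) -> dict[str, str]:
    """Map node name to watermark state for nodes above the low watermark."""
    result = {}
    for row in rows:
        raw = row.get("disk.used_percent")
        if raw is None:
            continue  # no disk figure reported for this node
        state = disk_state(float(raw))
        if state != "ok":
            result[row["name"]] = state
    return result


# Illustrative rows; the cat APIs return values as strings.
sample_nodes = [
    {"name": "node-1", "disk.used_percent": "42.10"},
    {"name": "node-2", "disk.used_percent": "87.50"},
    {"name": "node-3", "disk.used_percent": "96.00"},
]

print(nodes_over_watermark(sample_nodes))
# prints: {'node-2': 'low', 'node-3': 'flood_stage'}
```

Any node reported here cannot accept the unassigned replicas until space is freed or the thresholds are changed.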

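Check that shard allocation is enabled. The setting `cluster.routing.allocation.enable` defaults to `all`; any other value (`primaries`, `new_primaries`, or `none`) prevents replicas from being allocated. Re-enable allocation if necessary, keeping in mind that a transient setting overrides a persistent one but is lost on a full cluster restart: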

```
PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.enable": "all"
  }
}
```
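
Before changing the setting, it helps to see what is currently in effect. The sketch below resolves the value from a parsed `GET /_cluster/settings?flat_settings=true` response, checking transient settings first because they take precedence over persistent ones. It deliberately ignores values set in `elasticsearch.yml`, and the helper names are hypothetical.

```python
SETTING = "cluster.routing.allocation.enable"


def effective_allocation_enable(settings: dict) -> str:
    """Resolve the setting from transient, then persistent, then the default."""
    for scope in ("transient", "persistent"):
        value = settings.get(scope, {}).get(SETTING)
        if value is not None:
            return value
    return "all"  # default when nothing is set through the API


def replicas_can_allocate(settings: dict) -> bool:
    """Replicas are only allocated when the effective value is 'all'."""
    return effective_allocation_enable(settings) == "all"


# Illustrative response: allocation was restricted and never re-enabled.
sample_settings = {
    "persistent": {SETTING: "primaries"},
    "transient": {},
}

print(effective_allocation_enable(sample_settings))  # prints: primaries
print(replicas_can_allocate(sample_settings))        # prints: False
```

A leftover `primaries` value, as in this sample, is a common reason for replicas staying unassigned after maintenance.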

If disk space is an issue, consider increasing storage or removing unnecessary indices.
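For persistent issues, consider temporarily reducing the number of replicas to a value the cluster can host. The example below sets one replica, which requires at least two data nodes; a single-node cluster can only turn green with 0 replicas:
```
PUT /index_name/_settings
{
  "number_of_replicas": 1
}
```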
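
Because a replica never shares a node with its primary, the highest replica count a cluster can fully assign is the number of data nodes minus one. The sketch below computes that bound and builds a matching settings body. It ignores allocation filtering, awareness, and per-node shard limits, which can lower the bound further, and the helper names are hypothetical.

```python
def max_hostable_replicas(data_nodes: int) -> int:
    """Upper bound on replicas per shard that can all be assigned."""
    return max(data_nodes - 1, 0)


def replica_settings_body(current_replicas: int, data_nodes: int) -> dict:
    """Build an index settings body that never raises the replica count."""
    target = min(current_replicas, max_hostable_replicas(data_nodes))
    return {"index": {"number_of_replicas": target}}


print(replica_settings_body(current_replicas=2, data_nodes=2))
# prints: {'index': {'number_of_replicas': 1}}
print(replica_settings_body(current_replicas=1, data_nodes=1))
# prints: {'index': {'number_of_replicas': 0}}
```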
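If allocation failed repeatedly and Elasticsearch has stopped retrying, `POST /_cluster/reroute?retry_failed=true` asks it to try again. Restart Elasticsearch nodes only as a last resort, one node at a time: restarting a node that holds the only copy of a shard turns the cluster red until that node rejoins.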

Best Practices

Regularly monitor cluster health and set up alerts for yellow or red statuses.
Implement proper capacity planning to ensure adequate resources for your cluster.
Use shard allocation filtering to control where shards are allocated.
Implement a robust backup strategy to mitigate risks associated with reduced redundancy.
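
A cluster is briefly yellow during normal events such as node restarts and index creation, so alerting on every yellow reading tends to be noisy. One possible approach, sketched below, is to alert on red immediately and on yellow only after a grace period. The 15-minute value and the function name are assumptions for illustration, not recommendations from Elasticsearch.

```python
from datetime import datetime, timedelta
from typing import Optional


def should_alert(
    status: str,
    yellow_since: Optional[datetime],
    now: datetime,
    grace: timedelta = timedelta(minutes=15),  # hypothetical grace period
) -> bool:
    """Alert at once on red; alert on yellow only after the grace period."""
    if status == "red":
        return True
    if status == "yellow" and yellow_since is not None:
        return now - yellow_since >= grace
    return False


now = datetime(2024, 1, 1, 12, 0)
print(should_alert("yellow", now - timedelta(minutes=5), now))   # prints: False
print(should_alert("yellow", now - timedelta(minutes=30), now))  # prints: True
print(should_alert("red", None, now))                            # prints: True
```

The caller is expected to record when the status first turned yellow, for example from periodic `GET /_cluster/health` polls.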

Frequently Asked Questions

Q: Can I still use my Elasticsearch cluster when it's in a yellow state?
A: Yes, a yellow state means all primary shards are allocated and the cluster is operational. However, you'll have reduced redundancy and potentially slower performance.

Q: How long can I safely operate with a yellow cluster status?
A: While you can operate indefinitely in a yellow state, it's recommended to resolve the issue promptly to ensure full redundancy and optimal performance.

Q: Will increasing the number of nodes always resolve a yellow status?
A: Often, but not always. While adding nodes can help if the issue is resource-related, other factors like misconfiguration or network issues may require different solutions.

Q: Can a yellow status turn into a red status?
A: Yes, if conditions worsen and primary shards become unassigned, the cluster status can degrade from yellow to red.

Q: How does a yellow status affect my cluster's performance?
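A: Searches may slow down because fewer shard copies share the query load, which puts more of it on the primary shards. Indexing can also be affected while replicas are recovering, or when write requests are configured to wait for more active shard copies than are currently assigned.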