Elasticsearch NotMasterException: Not master

Brief Explanation

The "NotMasterException: Not master" error in Elasticsearch occurs when a node that is not the current master node attempts to perform an operation that only the master node is allowed to execute. This error is part of Elasticsearch's cluster coordination mechanism to ensure data consistency and proper cluster management.

Impact

This error can have a significant impact on cluster operations:

- Cluster state updates may fail
- Index creation or deletion operations might be unsuccessful
- Shard allocation and relocation could be disrupted
- Overall cluster stability and performance may be affected while no stable master is elected

Common Causes

- Network issues causing node communication problems
- Cluster reconfiguration or node restarts
- Split-brain scenarios where multiple nodes believe they are the master
- Misconfigured discovery settings
- Overloaded master node unable to respond in time

Troubleshooting and Resolution Steps

Check cluster health:
```
GET /_cluster/health
```
Verify the current master node:
```
GET /_cat/master?v
```
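Inspect the cluster state and any cluster state updates still queued on the master (the unfiltered `GET /_cluster/state` response can be very large, so request only the metrics you need):
```
GET /_cluster/state/master_node,nodes
GET /_cluster/pending_tasks
```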
Review logs on all nodes for any connectivity issues or election problems.
Ensure all nodes have consistent network settings and can reach each other on the transport port (9300 by default).
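Validate cluster settings:
- Review the `cluster.initial_master_nodes` setting. In 7.0+ it is used only when bootstrapping a brand-new cluster, should list the same master-eligible node names on every node, and should be removed once the cluster has formed.
- Check the `discovery.seed_hosts` setting for proper node discovery.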
If the issue persists, restart the problematic node(s) one at a time.
In case of a split-brain scenario, manually identify the correct master node and restart other nodes to rejoin the cluster.
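
The master check above can be repeated against every node to see whether they all agree, which is a quick way to spot a suspected split-brain. The following is a minimal sketch rather than an official tool: it assumes the HTTP API is reachable without authentication or TLS, and the node URLs in the usage note are hypothetical placeholders.

```python
import json
from urllib.request import urlopen


def fetch_master_view(base_url, timeout=5.0):
    """Ask one node which master it sees via GET /_cat/master?format=json.

    Returns the master's node name, or None if the node reports no master.
    Assumes no authentication or TLS; adapt for secured clusters.
    """
    with urlopen(f"{base_url}/_cat/master?format=json", timeout=timeout) as resp:
        rows = json.load(resp)
    return rows[0]["node"] if rows else None


def summarize_master_views(views):
    """Group nodes by the master they report.

    `views` maps a node URL to the master name it reported (or None).
    Returns (agreed, groups); `agreed` is True only when every node
    reports the same, non-empty master.
    """
    groups = {}
    for node_url, master in views.items():
        groups.setdefault(master, []).append(node_url)
    agreed = len(groups) == 1 and None not in groups
    return agreed, groups


def check_cluster(node_urls):
    """Query each node and print whether they agree on the master."""
    views = {}
    for url in node_urls:
        try:
            views[url] = fetch_master_view(url)
        except OSError as exc:  # URLError, HTTPError and timeouts are OSError subclasses
            print(f"{url}: no answer ({exc})")
            views[url] = None
    agreed, groups = summarize_master_views(views)
    print("all nodes agree on the master" if agreed else f"disagreement: {groups}")
    return agreed
```

For example, `check_cluster(["http://es-node-1:9200", "http://es-node-2:9200"])` (hypothetical host names) prints either a confirmation or the groups of nodes that report different masters. A node that reports no master, or cannot be reached, is treated as disagreeing so that it gets a closer look in the logs.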

Best Practices

- Implement a proper master node election strategy with dedicated master-eligible nodes.
- For versions before 7.0, set `discovery.zen.minimum_master_nodes` to a quorum of the master-eligible nodes to prevent split-brain scenarios. From 7.0 onward the cluster manages its own voting quorum, and `cluster.initial_master_nodes` is needed only to bootstrap a new cluster.
- Regularly monitor cluster health and node statuses.
- Implement proper network segmentation and firewall rules to ensure stable node communication.
- Use rolling restarts when updating cluster configuration to minimize disruption.
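
As one way to act on the monitoring advice, the sketch below turns a parsed `GET /_cluster/health` response into a list of warnings. It is an illustrative example, not a complete monitoring solution: the function names and the `expected_nodes` parameter are assumptions of this sketch, and the fetch helper assumes an unauthenticated HTTP endpoint.

```python
import json
from urllib.request import urlopen


def fetch_health(base_url, timeout=5.0):
    """Fetch and parse GET /_cluster/health (assumes no authentication or TLS)."""
    with urlopen(f"{base_url}/_cluster/health", timeout=timeout) as resp:
        return json.load(resp)


def health_warnings(health, expected_nodes):
    """Return human-readable warnings for a parsed cluster health response.

    `expected_nodes` is the number of nodes you expect in the cluster;
    it is supplied by the caller, not reported by Elasticsearch.
    """
    warnings = []
    status = health.get("status")
    if status != "green":
        warnings.append(f"cluster status is {status}")
    nodes = health.get("number_of_nodes", 0)
    if nodes < expected_nodes:
        warnings.append(f"only {nodes} of {expected_nodes} expected nodes have joined")
    pending = health.get("number_of_pending_tasks", 0)
    if pending > 0:
        warnings.append(f"{pending} cluster state update(s) pending on the master")
    return warnings
```

A scheduled job could call `health_warnings(fetch_health(url), expected_nodes=3)` and alert when the list is non-empty. A node count below the expected value or a growing number of pending tasks often shows up around the same time as master election problems.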

Frequently Asked Questions

Q: Can this error occur in a single-node Elasticsearch cluster?
A: It's unlikely in a single-node setup, as the sole node is always the master. However, if the node is misconfigured or unstable, it might temporarily lose its master status, potentially triggering this error.

Q: How does Elasticsearch elect a master node?
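A: Master-eligible nodes hold an election, and a candidate becomes master only with the support of a majority (quorum) of the voting nodes. In versions before 7.0 (Zen discovery), candidates were ordered by cluster state version and ties were broken in favor of the lowest node ID. From 7.0 onward the cluster coordination subsystem uses quorum-based voting, so any master-eligible node with an up-to-date cluster state can win.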

Q: What's the difference between master-eligible and data nodes?
A: Master-eligible nodes can participate in master elections and perform cluster-wide actions. Data nodes store and process data. A node can be both, but in larger clusters, it's often beneficial to have dedicated master-eligible nodes.

Q: How can I prevent split-brain scenarios in Elasticsearch?
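A: In Elasticsearch 7.0 and later, the cluster tracks its own voting configuration and requires a majority for every election, so there is no quorum value to set. `cluster.initial_master_nodes` is a list of node names used only to bootstrap a new cluster, not a count. In earlier versions, set `discovery.zen.minimum_master_nodes` to (n/2) + 1, rounding n/2 down, where n is the number of master-eligible nodes. In either case, running at least three master-eligible nodes lets the cluster lose one of them and still elect a master.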
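
For the pre-7.0 `discovery.zen.minimum_master_nodes` setting, the (n/2) + 1 rule uses integer division. A small sketch of the arithmetic (the function name is illustrative only):

```python
def minimum_master_nodes(master_eligible_count):
    """Quorum size for n master-eligible nodes: floor(n / 2) + 1."""
    if master_eligible_count < 1:
        raise ValueError("at least one master-eligible node is required")
    return master_eligible_count // 2 + 1


for n in (1, 2, 3, 5):
    print(n, "->", minimum_master_nodes(n))
# prints:
# 1 -> 1
# 2 -> 2
# 3 -> 2
# 5 -> 3
```

Note that two master-eligible nodes need a quorum of two, so losing either one leaves the cluster unable to elect a master. Three master-eligible nodes is the smallest setup that tolerates one failure.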

Q: Can changing network settings cause this error?
A: Yes, network configuration changes can disrupt node communication, potentially causing nodes to incorrectly believe they're not part of the cluster or not recognize the current master, leading to this error.