Brief Explanation
The "NotMasterException: Not master" error in Elasticsearch occurs when a node that is not the current master node attempts to perform an operation that only the master node is allowed to execute. This error is part of Elasticsearch's cluster coordination mechanism to ensure data consistency and proper cluster management.
Impact
This error can have a significant impact on cluster operations:
- Cluster state updates may fail
- Index creation or deletion operations might be unsuccessful
- Shard allocation and relocation could be disrupted
- Overall cluster stability and performance may be affected
Common Causes
- Network issues causing node communication problems
- Cluster reconfiguration or node restarts
- Split-brain scenarios where multiple nodes believe they are the master (mainly a risk in pre-7.0 clusters with a misconfigured `discovery.zen.minimum_master_nodes`)
- Misconfigured discovery settings
- Overloaded master node unable to respond in time
Troubleshooting and Resolution Steps
Check cluster health:
GET /_cluster/health
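The health check can also be scripted. The Python sketch below uses only the standard library and is a minimal example, not an official tool. It assumes an unsecured cluster reachable at `http://localhost:9200`, and the helper names `fetch_json` and `health_warnings` are hypothetical. It flags three conditions that are worth investigating alongside this error: a non-green status, fewer nodes than you expect, and pending cluster-state tasks.

```python
import json
import urllib.request

# Assumption: an unsecured cluster on localhost; change the URL and add auth as needed.
BASE_URL = "http://localhost:9200"


def fetch_json(path: str, base_url: str = BASE_URL) -> object:
    """GET a path from the cluster and decode the JSON body."""
    with urllib.request.urlopen(base_url + path, timeout=10) as resp:
        return json.load(resp)


def health_warnings(health: dict, expected_nodes: int) -> list:
    """Return human-readable warnings for a /_cluster/health response."""
    warnings = []
    if health.get("status") != "green":
        warnings.append(f"cluster status is {health.get('status')}")
    if health.get("number_of_nodes", 0) < expected_nodes:
        warnings.append(
            f"only {health.get('number_of_nodes')} of {expected_nodes} nodes have joined"
        )
    # Pending tasks are cluster-state changes the master has not yet applied.
    if health.get("number_of_pending_tasks", 0) > 0:
        warnings.append(f"{health['number_of_pending_tasks']} pending cluster-state tasks")
    return warnings


# Usage (requires a running cluster):
# print(health_warnings(fetch_json("/_cluster/health"), expected_nodes=3))
```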
Verify the current master node:
GET /_cat/master?v
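This error often appears while mastership is moving between nodes, so recording the master over time can help. The sketch below assumes the same unsecured `http://localhost:9200` endpoint. It polls `GET /_cat/master?format=json` and counts how often the reported node changes. The helpers `current_master`, `sample_masters` and `count_master_changes` are hypothetical. Treating any HTTP or connection error as "no master visible" is a simplification.

```python
import json
import time
import urllib.error
import urllib.request


def current_master(base_url: str = "http://localhost:9200"):
    """Return the elected master's node name as reported by _cat/master, or None."""
    try:
        with urllib.request.urlopen(base_url + "/_cat/master?format=json", timeout=10) as resp:
            rows = json.load(resp)
    except urllib.error.URLError:
        # Simplification: any HTTP or connection error counts as "no master visible".
        return None
    # The JSON form is a list with one row holding id, host, ip and node.
    return rows[0]["node"] if rows else None


def sample_masters(samples: int = 30, interval: float = 10.0,
                   base_url: str = "http://localhost:9200") -> list:
    """Record the reported master `samples` times, `interval` seconds apart."""
    observed = []
    for _ in range(samples):
        observed.append(current_master(base_url))
        time.sleep(interval)
    return observed


def count_master_changes(observed: list) -> int:
    """Count how often the reported master differs from the previous observation."""
    return sum(1 for prev, cur in zip(observed, observed[1:]) if prev != cur)


# Usage (requires a running cluster): roughly five minutes of sampling.
# print(count_master_changes(sample_masters()), "master changes observed")
```

Frequent changes point to an unstable election rather than to a single failed request.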
Inspect cluster state for any ongoing changes:
GET /_cluster/state
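To check whether nodes disagree about who the master is, you can ask each node for its own view. The sketch below assumes unsecured HTTP access to every node; the URLs are placeholders. It uses the cluster state API's `master_node` metric with `local=true`, so each node answers from its local copy of the cluster state instead of forwarding the request to the master. Nodes that report different masters are usually lagging or disconnected. In pre-7.0 clusters, disagreement can also indicate a split brain. The helpers `local_master_view` and `masters_by_view` are hypothetical.

```python
import json
import urllib.request


def local_master_view(node_url: str):
    """Ask one node which master it currently follows, from its local cluster state."""
    # local=true answers from the node's own copy of the cluster state;
    # the master_node metric limits the response to the master's node ID.
    url = node_url + "/_cluster/state/master_node?local=true"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp).get("master_node")


def masters_by_view(views: dict) -> dict:
    """Group node URLs by the master ID each one reports."""
    groups = {}
    for node, master in views.items():
        groups.setdefault(master, []).append(node)
    return groups


# Usage (node URLs are placeholders):
# nodes = ["http://es-1:9200", "http://es-2:9200", "http://es-3:9200"]
# groups = masters_by_view({n: local_master_view(n) for n in nodes})
# if len(groups) > 1:
#     print("nodes disagree about the master:", groups)
```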
Review logs on all nodes for any connectivity issues or election problems.
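Searching logs by hand across several nodes is tedious, so a small script can tally the relevant messages. The patterns below were chosen for illustration. Exact log wording varies between Elasticsearch versions, so adjust the patterns to match what your logs actually contain. The helper `scan_log_lines` and the log path in the usage comment are hypothetical.

```python
import re
from collections import Counter

# Illustrative patterns only; extend or adjust them for your Elasticsearch version.
PATTERNS = {
    "not_master": re.compile(r"NotMasterException"),
    "master_not_discovered": re.compile(r"master not discovered|MasterNotDiscoveredException"),
    "master_changed": re.compile(r"master node changed"),
    "disconnected": re.compile(r"NodeDisconnectedException|ConnectTransportException"),
}


def scan_log_lines(lines) -> Counter:
    """Count log lines matching each election or connectivity pattern."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


# Usage: the path is an example; it depends on path.logs and the cluster name.
# with open("/var/log/elasticsearch/my-cluster.log", encoding="utf-8") as f:
#     print(scan_log_lines(f))
```

Comparing the counts and timestamps across nodes shows whether election problems line up with connectivity errors.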
Ensure all nodes have consistent network settings and can communicate with each other.
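Nodes talk to each other over the transport layer, which listens on port 9300 by default (configurable via `transport.port`). As a rough first check, the sketch below tries a plain TCP connection from the current machine to each peer. The host names are placeholders, and `can_connect` and `unreachable` are hypothetical helpers. A successful connection only proves that the port is open, not that the node is healthy.

```python
import socket


def can_connect(host: str, port: int = 9300, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False


def unreachable(peers, port: int = 9300) -> list:
    """Return the peers this machine cannot reach on the transport port."""
    return [p for p in peers if not can_connect(p, port)]


# Usage: run on every node, listing the other nodes' host names (placeholders here).
# print(unreachable(["es-1", "es-2", "es-3"]))
```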
Validate cluster settings:
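- Review the `cluster.initial_master_nodes` setting (7.0+). It is only used to bootstrap a new cluster, must list the same master-eligible nodes on every node where it is set, and should be removed once the cluster has formed.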
- Check the `discovery.seed_hosts` setting for proper node discovery.
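Some of these checks can be automated once you have each node's settings. The sketch assumes you have already loaded every node's `elasticsearch.yml` into a flat dictionary keyed by dotted setting names. YAML parsing is not shown because it needs a third-party library. The helper `discovery_problems` is hypothetical. It flags two problems: nodes with no `discovery.seed_hosts` (unless `discovery.type` is `single-node`), and nodes whose `cluster.initial_master_nodes` lists differ, since inconsistent lists risk bootstrapping more than one cluster.

```python
def discovery_problems(nodes: dict) -> list:
    """Flag common discovery misconfigurations across nodes.

    `nodes` maps each node.name to a flat dict of that node's settings, with
    dotted keys such as "discovery.seed_hosts" (an assumption of this sketch).
    """
    problems = []
    initial_lists = {}
    for name, settings in nodes.items():
        single_node = settings.get("discovery.type") == "single-node"
        if not single_node and not settings.get("discovery.seed_hosts"):
            problems.append(f"{name}: discovery.seed_hosts is empty or missing")
        initial = settings.get("cluster.initial_master_nodes")
        if initial:
            # Order does not matter, so compare sorted copies.
            initial_lists[name] = tuple(sorted(initial))
    if len(set(initial_lists.values())) > 1:
        problems.append(f"cluster.initial_master_nodes differs between nodes: {initial_lists}")
    return problems
```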
If the issue persists, restart the problematic node(s) one at a time.
In case of a split-brain scenario, manually identify the correct master node and restart other nodes to rejoin the cluster.
Best Practices
- Use dedicated master-eligible nodes (at least three in production clusters) so that master elections are not disrupted by heavy data-node workloads.
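- Before 7.0, set `discovery.zen.minimum_master_nodes` to a quorum of the master-eligible nodes to prevent split-brain scenarios. From 7.0 onward, Elasticsearch manages the voting configuration itself, and `cluster.initial_master_nodes` is set only when bootstrapping a new cluster.
- Regularly monitor cluster health and node statuses.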
- Implement proper network segmentation and firewall rules to ensure stable node communication.
- Use rolling restarts when updating cluster configuration to minimize disruption.
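During a rolling restart, it is safer to wait until the restarted node has rejoined before moving on to the next one. This sketch polls `GET /_cluster/health` until the expected node count and status are reached. The `http://localhost:9200` URL, the attempt count and the delay are assumptions. The health endpoint should point at a node that is not currently being restarted. `get_health` and `wait_until_ready` are hypothetical helpers.

```python
import json
import time
import urllib.request


def get_health(base_url: str = "http://localhost:9200") -> dict:
    """Fetch /_cluster/health from a node that is not being restarted."""
    with urllib.request.urlopen(base_url + "/_cluster/health", timeout=10) as resp:
        return json.load(resp)


def wait_until_ready(fetch_health, expected_nodes: int, want_status: str = "green",
                     attempts: int = 60, delay: float = 5.0) -> bool:
    """Poll until all nodes have rejoined and the status matches; True on success."""
    for _ in range(attempts):
        try:
            health = fetch_health()
        except OSError:
            # urllib's URLError subclasses OSError; treat it as "not ready yet".
            health = {}
        if health.get("number_of_nodes") == expected_nodes and health.get("status") == want_status:
            return True
        time.sleep(delay)
    return False


# Usage, after restarting one node of a three-node cluster:
# if not wait_until_ready(get_health, expected_nodes=3):
#     print("cluster did not recover; stop before restarting the next node")
```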
Frequently Asked Questions
Q: Can this error occur in a single-node Elasticsearch cluster?
A: It's unlikely in a single-node setup, as the sole node is always the master. However, if the node is misconfigured or unstable, it might temporarily lose its master status, potentially triggering this error.
Q: How does Elasticsearch elect a master node?
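A: Master-eligible nodes hold an election in which a candidate must collect votes from a quorum (a majority) of master-eligible nodes. In 7.0 and later, that quorum is taken from a voting configuration that Elasticsearch maintains automatically. Each node votes at most once per election term, so two masters cannot be elected in the same term. Earlier versions (Zen discovery) used a different algorithm in which node IDs played a role in choosing among candidates, and `discovery.zen.minimum_master_nodes` provided the quorum.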
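A quorum prevents two masters because any two majorities of the same voting configuration share at least one node, and that node can vote only once per term. The toy model below shows the majority test. It is not Elasticsearch's actual implementation, and `has_majority` is a hypothetical helper.

```python
def has_majority(votes: set, voting_config: set) -> bool:
    """True if the votes cover more than half of the voting configuration."""
    # Votes from nodes outside the voting configuration do not count.
    return 2 * len(votes & voting_config) > len(voting_config)


config = {"node-a", "node-b", "node-c"}
# A network partition splits the nodes: only the side holding 2 of 3 votes can win.
print(has_majority({"node-a", "node-b"}, config))  # True
print(has_majority({"node-c"}, config))            # False
```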
Q: What's the difference between master-eligible and data nodes?
A: Master-eligible nodes can participate in master elections and perform cluster-wide actions. Data nodes store and process data. A node can be both, but in larger clusters, it's often beneficial to have dedicated master-eligible nodes.
Q: How can I prevent split-brain scenarios in Elasticsearch?
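A: In Elasticsearch 7.0+, split-brain protection is built in. The cluster manages its voting configuration automatically, and `cluster.initial_master_nodes` is only needed to bootstrap a brand-new cluster. In earlier versions, set `discovery.zen.minimum_master_nodes` to floor(n/2) + 1, where n is the number of master-eligible nodes.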
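The pre-7.0 quorum formula relies on integer division, which the sketch below makes explicit. The function name is hypothetical. With four master-eligible nodes the quorum is three, so the cluster still tolerates only one master-eligible failure, the same as with three nodes. This is one reason odd counts are commonly recommended.

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Quorum for the pre-7.0 discovery.zen.minimum_master_nodes setting: floor(n / 2) + 1."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


for n in range(1, 6):
    print(n, "master-eligible ->", minimum_master_nodes(n))
# 1 -> 1, 2 -> 2, 3 -> 2, 4 -> 3, 5 -> 3
```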
Q: Can changing network settings cause this error?
A: Yes, network configuration changes can disrupt node communication, potentially causing nodes to incorrectly believe they're not part of the cluster or not recognize the current master, leading to this error.