Elasticsearch NotMasterException: Not master - Common Causes & Fixes

Brief Explanation

The "NotMasterException: Not master" error in Elasticsearch occurs when a node that is not, or is no longer, the elected master is asked to do something only the elected master may do, such as applying a cluster state update or accepting a node join request. The error is part of Elasticsearch's cluster coordination mechanism, which keeps the cluster state consistent by allowing only one elected master at a time.

Impact

This error can have a significant impact on cluster operations:

  • Cluster state updates may fail
  • Index creation or deletion operations might be unsuccessful
  • Shard allocation and relocation could be disrupted
  • Overall cluster stability and performance may be affected

Common Causes

  1. Network issues causing node communication problems
  2. Cluster reconfiguration or node restarts
  3. Split-brain scenarios where multiple nodes believe they are the master (mainly a risk on pre-7.0 clusters with a misconfigured minimum_master_nodes)
  4. Misconfigured discovery settings
  5. Overloaded master node unable to respond in time

Troubleshooting and Resolution Steps

  1. Check cluster health:

    GET /_cluster/health
    
  2. Verify the current master node:

    GET /_cat/master?v
    
  3. Inspect cluster state for any ongoing changes:

    GET /_cluster/state
    
  4. Review logs on all nodes for any connectivity issues or election problems.

  5. Ensure all nodes have consistent network settings and can communicate with each other.

  6. Validate cluster settings:

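    • Review the `cluster.initial_master_nodes` setting. It is used only when bootstrapping a brand-new cluster (7.0+) and should be identical on every master-eligible node.
    • Check that the `discovery.seed_hosts` setting lists reachable addresses of the master-eligible nodes so that nodes can discover each other.

  7. If the issue persists, restart the problematic node(s) one at a time.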

  8. In case of a split-brain scenario, manually identify the correct master node and restart other nodes to rejoin the cluster.
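
Steps 1, 2, and 8 above can be scripted. The following is a minimal Python sketch, not an official tool. It assumes the nodes are reachable over plain HTTP without authentication, that `GET /_cat/master` accepts `format=json` and `local=true` on your version, and that the helper names (`fetch_json`, `health_warnings`, `master_views`, `masters_disagree`) are illustrative only.

```python
import json
from urllib.request import urlopen


def fetch_json(base_url, path, timeout=5.0):
    """GET base_url + path and decode the JSON body."""
    with urlopen(base_url.rstrip("/") + path, timeout=timeout) as resp:
        return json.load(resp)


def health_warnings(health):
    """Turn a GET /_cluster/health body into a list of warnings."""
    warnings = []
    status = health.get("status")
    if status != "green":
        warnings.append(f"cluster status is {status}")
    if health.get("timed_out"):
        warnings.append("health request timed out")
    if health.get("unassigned_shards", 0) > 0:
        warnings.append(f"{health['unassigned_shards']} unassigned shards")
    if health.get("number_of_pending_tasks", 0) > 0:
        warnings.append(f"{health['number_of_pending_tasks']} pending cluster tasks")
    return warnings


def master_views(base_urls):
    """Ask each node which master it currently sees (None on any failure)."""
    views = {}
    for url in base_urls:
        try:
            # local=true is assumed to return the node's own view rather
            # than forwarding the request to the elected master.
            rows = fetch_json(url, "/_cat/master?format=json&local=true")
            views[url] = rows[0]["id"] if rows else None
        except (OSError, ValueError, KeyError, IndexError):
            views[url] = None
    return views


def masters_disagree(views):
    """True if the nodes that answered report more than one master ID."""
    seen = {master_id for master_id in views.values() if master_id is not None}
    return len(seen) > 1
```

For example, `health_warnings(fetch_json("http://localhost:9200", "/_cluster/health"))` returns an empty list for a healthy cluster, and `masters_disagree(master_views([...]))` with a list of your node URLs flags the case where nodes report different masters. A node that returns `None` could not name a master at all and is worth checking first in its logs.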

Best Practices

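  • Run dedicated master-eligible nodes (three is a common choice) so that master duties do not compete with indexing and search load.
  • On versions before 7.0, set discovery.zen.minimum_master_nodes to a majority of the master-eligible nodes to prevent split-brain scenarios. From 7.0 onward the cluster manages its voting quorum automatically; cluster.initial_master_nodes is needed only when bootstrapping a brand-new cluster and should be removed once the cluster has formed.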
  • Regularly monitor cluster health and node statuses.
  • Implement proper network segmentation and firewall rules to ensure stable node communication.
  • Use rolling restarts when updating cluster configuration to minimize disruption.

Frequently Asked Questions

Q: Can this error occur in a single-node Elasticsearch cluster?
A: It's unlikely in a single-node setup, as the sole node is always the master. However, if the node is misconfigured or unstable, it might temporarily lose its master status, potentially triggering this error.

Q: How does Elasticsearch elect a master node?
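A: Elasticsearch uses a process called "master election" in which master-eligible nodes communicate to decide on a master. In 7.0 and later, a candidate becomes master once it wins votes from a majority of the voting configuration, and nodes only vote for a candidate whose cluster state is at least as up to date as their own. In earlier versions (Zen discovery), candidates with the most recent cluster state were preferred, and ties were typically broken in favor of the lowest node ID.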

Q: What's the difference between master-eligible and data nodes?
A: Master-eligible nodes can participate in master elections and perform cluster-wide actions. Data nodes store and process data. A node can be both, but in larger clusters, it's often beneficial to have dedicated master-eligible nodes.
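
To see which role each node actually has, the output of `GET /_cat/nodes?h=name,node.role,master` can be parsed. The sketch below assumes the request is made without `v` (so there is no header row), that node names contain no spaces, and that `m` marks master-eligible nodes, `d` marks the generic data role, and `*` marks the elected master. The function names are illustrative.

```python
def parse_cat_nodes(text):
    """Parse three-column `_cat/nodes` text into a list of dicts."""
    nodes = []
    for line in text.strip().splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or unexpected lines
        name, roles, master = parts
        nodes.append({
            "name": name,
            "master_eligible": "m" in roles,
            # Only the generic data role is checked here; tiered data
            # roles (hot, warm, cold, ...) use other letters.
            "data": "d" in roles,
            "elected_master": master == "*",
        })
    return nodes


def dedicated_masters(nodes):
    """Names of master-eligible nodes that do not hold the data role."""
    return [n["name"] for n in nodes if n["master_eligible"] and not n["data"]]
```

If `dedicated_masters` returns an empty list on a large cluster, every master-eligible node is also carrying data, which is the situation dedicated master-eligible nodes are meant to avoid.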

Q: How can I prevent split-brain scenarios in Elasticsearch?
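A: In Elasticsearch 7.0+, the cluster maintains its voting quorum automatically, so there is no quorum setting to tune; cluster.initial_master_nodes is a list of node names used only when bootstrapping a new cluster. In earlier versions, set discovery.zen.minimum_master_nodes to (n/2) + 1, rounded down, where n is the number of master-eligible nodes.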
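
The (n/2) + 1 rule uses integer division. A small Python sketch of the arithmetic, with illustrative function names:

```python
def quorum(master_eligible_count):
    """Majority of master-eligible nodes: (n / 2) + 1 with integer division."""
    if master_eligible_count < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible_count // 2 + 1


def tolerated_failures(master_eligible_count):
    """How many master-eligible nodes can be lost while keeping a quorum."""
    return master_eligible_count - quorum(master_eligible_count)
```

With three master-eligible nodes the quorum is 2 and one node can be lost. Adding a fourth raises the quorum to 3 without tolerating any additional failure, which is why odd counts are the usual choice.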

Q: Can changing network settings cause this error?
A: Yes, network configuration changes can disrupt node communication, potentially causing nodes to incorrectly believe they're not part of the cluster or not recognize the current master, leading to this error.
