The "master not discovered" error occurs when Elasticsearch nodes cannot elect or connect to a master node. Without a master, the cluster cannot form and nodes cannot join it. This guide helps diagnose and resolve master discovery issues.
Understanding Master Discovery
How Master Election Works
- Nodes use seed hosts to find other nodes
- Master-eligible nodes vote to elect a master
- A quorum (majority) must agree on the master
- Once elected, the master coordinates the cluster
Common Error Messages
MasterNotDiscoveredException: master node is not discovered yet...
master not discovered or elected yet, an election requires...
master not discovered yet, this node has not previously joined...
Diagnostic Steps
Step 1: Check Node Status
GET /_cat/nodes?v&h=name,ip,node.role,master
Step 2: Check Master Status
GET /_cat/master?v
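The same check can be scripted. The sketch below classifies the body of a /_cat/master response read from stdin. The function name master_status is hypothetical, and it assumes the request is made without ?v, so the response has no header row and its columns are id, host, ip, node.

```shell
#!/usr/bin/env bash
# Classify the body of a GET /_cat/master response read from stdin.
# Prints "no-master" if the body carries master_not_discovered_exception,
# otherwise prints the node name from the fourth column of the first row.
master_status() {
  local body
  body="$(cat)"
  case "$body" in
    *master_not_discovered_exception*)
      echo "no-master"
      return 1
      ;;
  esac
  # Assumed column order without ?v: id, host, ip, node.
  printf '%s\n' "$body" | awk '
    NF >= 4 { print "master: " $4; found = 1; exit }
    END     { if (!found) exit 2 }
  '
}
```

One possible invocation, assuming an unsecured node listening on the default HTTP port: curl -s "http://localhost:9200/_cat/master" | master_status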
Step 3: Review Discovery Configuration
grep -E "discovery|cluster.initial_master" /etc/elasticsearch/elasticsearch.yml
Step 4: Check Logs
grep -i "master\|discovery\|election" /var/log/elasticsearch/*.log | tail -50
Common Causes and Solutions
Cause 1: Incorrect Discovery Configuration
Problem: Nodes can't find each other during bootstrap
Diagnosis:
# Check elasticsearch.yml
# Are seed hosts correct?
# Is cluster.initial_master_nodes set?
Solution for Elasticsearch 7.x+:
# elasticsearch.yml
discovery.seed_hosts:
- 192.168.1.10:9300
- 192.168.1.11:9300
- 192.168.1.12:9300
# Only needed for initial cluster bootstrap
cluster.initial_master_nodes:
- node-1
- node-2
- node-3
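Important: Remove cluster.initial_master_nodes from every node once the cluster has formed. If the setting stays in place, a later misconfiguration or restart can bootstrap a second cluster alongside the existing one.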
Cause 2: Network Connectivity Issues
Problem: Nodes can't communicate on transport port
Diagnosis:
# From each node, test connectivity
nc -zv <other_node_ip> 9300
ping <other_node_ip>
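To run the connectivity test against every peer in one pass, a small loop helps. This is a minimal sketch: the function name check_transport is hypothetical, and it assumes an nc build that supports -z (scan only) and -w (timeout in seconds).

```shell
#!/usr/bin/env bash
# Report whether each host accepts TCP connections on the given port.
# Usage: check_transport PORT HOST [HOST...]
# Returns non-zero if at least one host is unreachable.
check_transport() {
  local port="$1" host failed=0
  shift
  for host in "$@"; do
    if nc -z -w 3 "$host" "$port" >/dev/null 2>&1; then
      echo "$host:$port reachable"
    else
      echo "$host:$port UNREACHABLE"
      failed=1
    fi
  done
  return "$failed"
}
```

With the example addresses used in this guide, the call would be: check_transport 9300 192.168.1.10 192.168.1.11 192.168.1.12. Run it from each node, since firewall rules are often asymmetric.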
Solutions:
- Open the transport port range (9300-9400 by default) in the firewall
- Check security groups (cloud environments)
- Verify network routing
Cause 3: Insufficient Master-Eligible Nodes
Problem: Not enough nodes for quorum
Quorum calculation: floor(master_eligible_nodes / 2) + 1
For 3 master-eligible nodes, at least 2 must be available to form a quorum.
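The arithmetic can be checked for a few cluster sizes. This sketch only evaluates the majority formula with integer division; the function name quorum is hypothetical.

```shell
#!/usr/bin/env bash
# Votes needed for a majority of N master-eligible nodes: floor(N / 2) + 1.
# Shell arithmetic uses integer division, so no explicit floor is needed.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Show the quorum and the number of node failures each size can absorb.
for n in 1 2 3 4 5; do
  q="$(quorum "$n")"
  echo "$n master-eligible nodes -> quorum $q, tolerates $(( n - q )) failure(s)"
done
```

Going from 3 to 4 master-eligible nodes raises the quorum without adding failure tolerance, which is one reason odd counts are the usual choice.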
Diagnosis:
GET /_cat/nodes?v&h=name,node.role
# Look for 'm' in node.role
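Counting the 'm' entries by hand gets tedious on larger clusters. The sketch below counts master-eligible rows in _cat/nodes output read from stdin. The function name count_master_eligible is hypothetical, and it assumes the request uses h=name,node.role without ?v, so the role string is the last column of every row.

```shell
#!/usr/bin/env bash
# Count rows whose last column (node.role) contains "m".
# Input: "name node.role" rows on stdin, one node per line.
count_master_eligible() {
  awk '$NF ~ /m/ { count++ } END { print count + 0 }'
}
```

One possible invocation, assuming an unsecured node on the default HTTP port: curl -s "http://localhost:9200/_cat/nodes?h=name,node.role" | count_master_eligible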
Solution: Ensure enough master-eligible nodes are running:
# On master-eligible nodes
node.roles: [master, data]
# Or just master for dedicated masters
node.roles: [master]
Cause 4: Split Brain Recovery
Problem: Cluster previously split, nodes have conflicting state
Diagnosis:
# Check cluster UUID in logs
grep "cluster.uuid" /var/log/elasticsearch/*.log
Solution:
- Stop all nodes
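- Clear the cluster state on minority nodes only if needed (WARNING: this discards the node's cluster metadata and can leave its shard data unusable; the nodes/0 path applies to 7.x and earlier):
rm -rf /var/lib/elasticsearch/nodes/0/_state/*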
- Restart master-eligible nodes first
- Then restart data nodes
Cause 5: DNS Resolution Failures
Problem: Hostname resolution is slow or failing
Diagnosis:
nslookup <node_hostname>
time nslookup <node_hostname>
Solution: Use IP addresses:
discovery.seed_hosts:
- 192.168.1.10
- 192.168.1.11
- 192.168.1.12
Or add to /etc/hosts:
192.168.1.10 es-node-1
192.168.1.11 es-node-2
192.168.1.12 es-node-3
Cause 6: Long GC Pauses
Problem: GC pauses cause nodes to be considered dead
Diagnosis:
# GC events carry a [gc] tag; match pauses of one second or longer
grep "\[gc\]" /var/log/elasticsearch/*.log | grep -E "(duration|spent) \[[0-9.]+(s|m)\]"
Solutions:
- Reduce heap pressure
- Increase the leader check timeout (default 10s):
cluster.fault_detection.leader_check.timeout: 30s
Cause 7: Leftover Cluster State
Problem: Node has state from different cluster
Diagnosis:
# Check for cluster UUID mismatch in logs
Solution:
# Stop Elasticsearch
systemctl stop elasticsearch
# Clear cluster state (WARNING: data loss for this node)
rm -rf /var/lib/elasticsearch/nodes/0/_state
# Restart
systemctl start elasticsearch
Bootstrap a New Cluster
For Initial Setup
# elasticsearch.yml on ALL master-eligible nodes
cluster.name: my-cluster
node.name: node-1 # Unique per node
discovery.seed_hosts:
- 192.168.1.10
- 192.168.1.11
- 192.168.1.12
# Use node names that match node.name
cluster.initial_master_nodes:
- node-1
- node-2
- node-3
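A mismatch between node.name and the entries in cluster.initial_master_nodes is easy to miss by eye. The sketch below checks one file for it. All three function names are hypothetical, and the parsing assumes the block-style list layout shown above: inline [a, b] lists, quoted names, and substitutions such as ${HOSTNAME} are not handled.

```shell
#!/usr/bin/env bash
# Print the entries under cluster.initial_master_nodes, one per line.
initial_master_nodes() {
  awk '
    /^cluster\.initial_master_nodes:/ { in_list = 1; next }
    in_list && /^[ \t]*-[ \t]*/ {
      sub(/^[ \t]*-[ \t]*/, "")      # drop the list marker
      sub(/[ \t]*(#.*)?$/, "")       # drop trailing spaces and comments
      print
      next
    }
    in_list && /^[ \t]*(#.*)?$/ { next }   # skip blank and comment lines
    { in_list = 0 }                         # any other line ends the list
  ' "$1"
}

# Print the value of node.name without any trailing comment.
node_name() {
  awk '/^node\.name:/ {
    sub(/^node\.name:[ \t]*/, "")
    sub(/[ \t]*(#.*)?$/, "")
    print
    exit
  }' "$1"
}

# Succeed if node.name appears in cluster.initial_master_nodes.
name_in_bootstrap_list() {
  initial_master_nodes "$1" | grep -qxF -- "$(node_name "$1")"
}
```

For example: name_in_bootstrap_list /etc/elasticsearch/elasticsearch.yml || echo "node.name is not in cluster.initial_master_nodes". The check is only meaningful on master-eligible nodes during the initial bootstrap.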
Start Sequence
- Configure all master-eligible nodes
- Start all master-eligible nodes (roughly simultaneously)
- Wait for master election
- Start data-only nodes
- Remove cluster.initial_master_nodes from config
Recovery Procedures
Single Node Won't Join
- Check logs for specific errors
- Verify network connectivity
- Compare configuration with working nodes
- Clear state if necessary
Entire Cluster Down
- Identify which node was last master:
grep "elected-as-master" /var/log/elasticsearch/*.log
- Start that node first
- Start other master-eligible nodes
- Start data nodes
After Network Partition
// Check for unassigned shards
GET /_cluster/allocation/explain
// Reroute if needed
POST /_cluster/reroute?retry_failed=true
Configuration Best Practices
Production Configuration
# elasticsearch.yml
cluster.name: production-cluster
node.name: ${HOSTNAME}
# Network
network.host: 0.0.0.0
transport.port: 9300
# Discovery
discovery.seed_hosts:
- master1.internal:9300
- master2.internal:9300
- master3.internal:9300
# Fault detection
cluster.fault_detection.leader_check.timeout: 30s
cluster.fault_detection.leader_check.interval: 2s
cluster.fault_detection.follower_check.timeout: 30s
Minimum Master Nodes (ES 6.x and earlier)
For older versions:
discovery.zen.minimum_master_nodes: 2 # For 3-node cluster
Elasticsearch 7.x and later manage the quorum automatically, so this setting is no longer needed.
Monitoring
Track Master Elections
# Count master elections in logs
grep "elected-as-master" /var/log/elasticsearch/*.log | wc -l
Alert Conditions
- More than 1 master election per hour
- Master not elected within 5 minutes of node startup
- Node unable to join cluster
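The first alert condition can be approximated from the logs. This is a minimal sketch, not a replacement for a monitoring system: the function name elections_over_threshold is hypothetical, and it assumes every log line starts with a timestamp of the form [2024-05-01T12:34:56,789].

```shell
#!/usr/bin/env bash
# Print "<hour> <count>" for each hour with more than MAX elections.
# Usage: elections_over_threshold [MAX]   (log lines on stdin, MAX defaults to 1)
elections_over_threshold() {
  local max="${1:-1}"
  awk -v max="$max" '
    /elected-as-master/ {
      hour = substr($0, 2, 13)   # e.g. 2024-05-01T12 (assumed timestamp layout)
      count[hour]++
    }
    END {
      for (h in count) if (count[h] > max) print h, count[h]
    }
  ' | sort
}
```

Feed it raw log lines rather than multi-file grep output, which prefixes each line with a file name: cat /var/log/elasticsearch/*.log | elections_over_threshold 1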