Elasticsearch Master Not Discovered Diagnosis

The "master not discovered" error occurs when Elasticsearch nodes cannot elect a master node or cannot connect to the elected one. Without a master, the cluster cannot form and nodes cannot join it. This guide helps diagnose and resolve master discovery issues.

Understanding Master Discovery

How Master Election Works

  1. Nodes use seed hosts to find other nodes
  2. Master-eligible nodes vote to elect a master
  3. A quorum (majority) must agree on the master
  4. Once elected, the master coordinates the cluster

Common Error Messages

MasterNotDiscoveredException: master node is not discovered yet...
master not discovered or elected yet, an election requires...
master not discovered yet, this node has not previously joined...

Diagnostic Steps

Step 1: Check Node Status

GET /_cat/nodes?v&h=name,ip,node.role,master

Step 2: Check Master Status

GET /_cat/master?v

Step 3: Review Discovery Configuration

grep -E "discovery|cluster.initial_master" /etc/elasticsearch/elasticsearch.yml

Step 4: Check Logs

grep -i "master\|discovery\|election" /var/log/elasticsearch/*.log | tail -50
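Steps 1 and 2 use console syntax. From a shell, the master check can be sketched with curl. This is a minimal sketch, assuming the HTTP API is reachable at http://localhost:9200 without authentication; the ES_URL variable and both function names are illustrative and not part of Elasticsearch. It relies on the API answering with HTTP 503 while no master is elected, and on curl reporting 000 when it cannot connect at all.

```shell
# Assumption: the Elasticsearch HTTP API is reachable here without authentication.
ES_URL="${ES_URL:-http://localhost:9200}"

# Translate the HTTP status of GET /_cat/master into a short verdict.
classify_master_status() {
  case "$1" in
    200) echo "master elected" ;;
    503) echo "no master discovered" ;;
    000) echo "node unreachable" ;;
    *)   echo "unexpected status $1" ;;
  esac
}

# Query the local node and print the verdict.
check_master() {
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$ES_URL/_cat/master")
  classify_master_status "$code"
}
```

Running check_master on each node in turn shows which nodes see a master and which do not, which helps separate a cluster-wide election failure from a single isolated node.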

Common Causes and Solutions

Cause 1: Incorrect Discovery Configuration

Problem: Nodes can't find each other during bootstrap

Diagnosis:

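# Check elasticsearch.yml: are the seed hosts correct,
# and is cluster.initial_master_nodes set?
grep -E -A 3 "^(discovery\.seed_hosts|cluster\.initial_master_nodes)" /etc/elasticsearch/elasticsearch.yml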

Solution for Elasticsearch 7.x+:

# elasticsearch.yml
discovery.seed_hosts:
  - 192.168.1.10:9300
  - 192.168.1.11:9300
  - 192.168.1.12:9300

# Only needed for initial cluster bootstrap
cluster.initial_master_nodes:
  - node-1
  - node-2
  - node-3

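Important: Remove cluster.initial_master_nodes once the cluster has formed. If the setting stays in place, a node restarted in isolation can bootstrap a separate cluster instead of rejoining the existing one, which leaves you with a split-brain situation to clean up.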
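The two checks in this section (seed hosts present, bootstrap setting removed after formation) can be sketched as a small lint over the configuration file. This is an illustrative sketch only: the function name and the message wording are assumptions, and it inspects top-level keys with grep rather than parsing the YAML.

```shell
# Print a warning for each discovery setting that looks wrong.
# $1: path to elasticsearch.yml (defaults to the package install location)
check_discovery_config() {
  config="${1:-/etc/elasticsearch/elasticsearch.yml}"

  # Commented-out lines do not count as configured.
  if ! grep -Eq '^[[:space:]]*discovery\.seed_hosts[[:space:]]*:' "$config"; then
    echo "WARN: discovery.seed_hosts is not set"
  fi

  # The bootstrap setting should disappear once the cluster has formed.
  if grep -Eq '^[[:space:]]*cluster\.initial_master_nodes[[:space:]]*:' "$config"; then
    echo "NOTE: cluster.initial_master_nodes is set; remove it once the cluster has formed"
  fi
}
```

Called with no argument it reads /etc/elasticsearch/elasticsearch.yml; pass a path to check a different file. It prints nothing when both conditions look fine.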

Cause 2: Network Connectivity Issues

Problem: Nodes can't communicate on the transport port (9300 by default)

Diagnosis:

# From each node, test connectivity
nc -zv <other_node_ip> 9300
ping <other_node_ip>

Solutions:

  • Open the transport port range (9300-9400 by default) in the firewall
  • Check security groups (cloud environments)
  • Verify network routing
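To run the connectivity test against every seed host in one pass, a loop along these lines may help. It is a sketch under assumptions: the function name, the PROBE_CMD and TRANSPORT_PORT variables, and the output format are all illustrative, and it assumes a netcat build that accepts the -z and -w flags.

```shell
# Command used to probe a TCP port; override PROBE_CMD if your nc differs.
PROBE_CMD="${PROBE_CMD:-nc -z -w 3}"

# Probe the transport port on each host given as an argument.
# Returns non-zero if any host could not be reached.
check_transport() {
  port="${TRANSPORT_PORT:-9300}"
  failed=0
  for host in "$@"; do
    if $PROBE_CMD "$host" "$port" >/dev/null 2>&1; then
      echo "OK   $host:$port"
    else
      echo "FAIL $host:$port"
      failed=1
    fi
  done
  return "$failed"
}
```

For example, check_transport 192.168.1.10 192.168.1.11 192.168.1.12 run from each node shows whether the failure is symmetric or limited to one direction, which usually points at a firewall or security group rule.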

Cause 3: Insufficient Master-Eligible Nodes

Problem: Not enough nodes for quorum

Quorum calculation: floor(master_eligible_nodes / 2) + 1

For 3 master-eligible nodes, 2 are needed for quorum. For 4 nodes, 3 are needed, so an even node count adds no extra fault tolerance.
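The quorum arithmetic depends on integer division, which is easy to get wrong for even node counts. A minimal shell sketch (the function name is illustrative):

```shell
# Majority of N master-eligible nodes: floor(N / 2) + 1.
# Shell arithmetic truncates, so $1 / 2 is already the floor for positive N.
quorum() {
  echo $(( $1 / 2 + 1 ))
}
```

With this, quorum 3 prints 2 and quorum 5 prints 3, so a 5-node set of master-eligible nodes tolerates the loss of two nodes while a 3-node set tolerates the loss of one.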

Diagnosis:

GET /_cat/nodes?v&h=name,node.role
# Look for 'm' in node.role
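Counting the 'm' entries by eye gets tedious on larger clusters. The sketch below counts master-eligible nodes from the output of the request above when it is fetched without the v flag, so there is no header row. The function name is illustrative, and it assumes the role string is the last column of each line.

```shell
# Read "name node.role" lines on stdin and print how many nodes
# carry the master role (an 'm' in the role string).
count_master_eligible() {
  awk '$NF ~ /m/ { n++ } END { print n + 0 }'
}
```

A possible invocation, assuming an unauthenticated API on localhost: curl -s "http://localhost:9200/_cat/nodes?h=name,node.role" | count_master_eligible. Compare the result with the quorum for the number of master-eligible nodes the cluster is configured with.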

Solution: Ensure enough master-eligible nodes are running:

# On master-eligible nodes
node.roles: [master, data]
# Or just master for dedicated masters
node.roles: [master]

Cause 4: Split Brain Recovery

Problem: Cluster previously split, nodes have conflicting state

Diagnosis:

# Check cluster UUID in logs
grep -i "cluster.uuid" /var/log/elasticsearch/*.log

Solution:

  1. Stop all nodes
  2. Clear data on minority nodes if needed:
rm -rf /var/lib/elasticsearch/nodes/0/_state/*
  3. Restart master-eligible nodes first
  4. Then restart data nodes

Cause 5: DNS Resolution Failures

Problem: Hostname resolution is slow or failing

Diagnosis:

nslookup <node_hostname>
time nslookup <node_hostname>

Solution: Use IP addresses:

discovery.seed_hosts:
  - 192.168.1.10
  - 192.168.1.11
  - 192.168.1.12

Or add to /etc/hosts:

192.168.1.10 es-node-1
192.168.1.11 es-node-2
192.168.1.12 es-node-3

Cause 6: Long GC Pauses

Problem: Long GC pauses make a node miss fault-detection checks, so the rest of the cluster treats it as failed

Diagnosis:

# GC monitor lines are tagged [gc]; keep those reporting a duration in seconds or minutes
grep "\[gc\]" /var/log/elasticsearch/*.log | grep -E "\[[0-9.]+(s|m)\]"

Solutions:

  • Reduce heap pressure
  • Increase the leader fault-detection timeout (default 10s):
cluster.fault_detection.leader_check.timeout: 30s

Cause 7: Leftover Cluster State

Problem: Node has state from different cluster

Diagnosis:

# Check for cluster UUID mismatch in logs
grep -i "cluster uuid" /var/log/elasticsearch/*.log

Solution:

# Stop Elasticsearch
systemctl stop elasticsearch

# Clear cluster state (WARNING: data loss for this node)
rm -rf /var/lib/elasticsearch/nodes/0/_state

# Restart
systemctl start elasticsearch

Bootstrap a New Cluster

For Initial Setup

# elasticsearch.yml on ALL master-eligible nodes

cluster.name: my-cluster

node.name: node-1  # Unique per node

discovery.seed_hosts:
  - 192.168.1.10
  - 192.168.1.11
  - 192.168.1.12

# Use node names that match node.name
cluster.initial_master_nodes:
  - node-1
  - node-2
  - node-3

Start Sequence

  1. Configure all master-eligible nodes
  2. Start all master-eligible nodes (roughly simultaneously)
  3. Wait for master election
  4. Start data-only nodes
  5. Remove cluster.initial_master_nodes from config
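Step 3 of the start sequence can be automated with a polling loop before the data-only nodes are started. This is a sketch under assumptions: it expects an unauthenticated HTTP API at http://localhost:9200, treats an HTTP 200 from GET /_cat/master as "master elected", and uses illustrative names (ES_URL, MASTER_CHECK, master_elected, wait_for_master) that are not part of Elasticsearch.

```shell
# Assumption: the Elasticsearch HTTP API is reachable here without authentication.
ES_URL="${ES_URL:-http://localhost:9200}"

# Succeeds once GET /_cat/master answers with HTTP 200.
master_elected() {
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$ES_URL/_cat/master")
  [ "$code" = "200" ]
}

# Poll until a master is elected or the timeout expires.
# $1: timeout in seconds (default 300), $2: poll interval in seconds (default 5)
# MASTER_CHECK may name a different check command, for example in tests.
wait_for_master() {
  timeout="${1:-300}"
  interval="${2:-5}"
  elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    if ${MASTER_CHECK:-master_elected}; then
      echo "master elected after ~${elapsed}s"
      return 0
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  echo "no master after ${timeout}s" >&2
  return 1
}
```

A start script could then run wait_for_master 300 5 and only start the data-only nodes if it succeeds. The 300-second default matches the five-minute alert threshold used later in this guide.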

Recovery Procedures

Single Node Won't Join

  1. Check logs for specific errors
  2. Verify network connectivity
  3. Compare configuration with working nodes
  4. Clear state if necessary

Entire Cluster Down

  1. Identify which node was last master:
grep "elected-as-master" /var/log/elasticsearch/*.log
  2. Start that node first
  3. Start other master-eligible nodes
  4. Start data nodes

After Network Partition

// Check for unassigned shards
GET /_cluster/allocation/explain

// Reroute if needed
POST /_cluster/reroute?retry_failed=true

Configuration Best Practices

Production Configuration

# elasticsearch.yml

cluster.name: production-cluster
node.name: ${HOSTNAME}

# Network
network.host: 0.0.0.0
transport.port: 9300

# Discovery
discovery.seed_hosts:
  - master1.internal:9300
  - master2.internal:9300
  - master3.internal:9300

# Fault detection
cluster.fault_detection.leader_check.timeout: 30s
cluster.fault_detection.leader_check.interval: 2s
cluster.fault_detection.follower_check.timeout: 30s

Minimum Master Nodes (ES 6.x and earlier)

For older versions:

discovery.zen.minimum_master_nodes: 2  # For 3-node cluster

In 7.x and later, Elasticsearch manages the quorum automatically and this setting no longer applies.

Monitoring

Track Master Elections

# Count master elections in logs
grep "elected-as-master" /var/log/elasticsearch/*.log | wc -l

Alert Conditions

  • More than 1 master election per hour
  • Master not elected within 5 minutes of node startup
  • Node unable to join cluster