Elasticsearch Master Election Failure

Elasticsearch relies on a single elected master node to manage cluster state - index metadata, shard allocation, node membership. When master election fails, the cluster cannot accept new writes or perform shard rebalancing. Understanding how the election mechanism works and what breaks it is the difference between a five-minute recovery and a multi-hour incident.

How Master Election Works in Elasticsearch 7.x+

Elasticsearch 7.0 replaced the older Zen Discovery protocol with a new cluster coordination layer built on ideas from Raft, Paxos, and Viewstamped Replication. Master election is now a proper consensus problem with formal guarantees, verified against a TLA+ specification. Each election round uses an incrementing term number. A node that wants to become master first runs a pre-voting phase - a Raft-style mechanism that checks whether it has any chance of winning before starting the real election. This prevents rogue or stale nodes from disrupting a healthy cluster with unnecessary elections.

The old discovery.zen.minimum_master_nodes setting is gone. Elasticsearch now tracks which nodes belong to the voting configuration and computes quorum automatically. A quorum requires a strict majority: with three master-eligible nodes, two must agree; with five, three. The voting configuration is persisted to disk, so it survives restarts. Elections complete in under a second. When they fail, it is almost always because the cluster cannot assemble a quorum - not because the algorithm broke.
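You can inspect the current voting configuration directly through the cluster state API. The filter_path below trims the response to the coordination metadata; the exact field names can vary slightly between 7.x and 8.x, so treat this as a sketch:

# Node IDs that currently count toward quorum
GET /_cluster/state/metadata?filter_path=metadata.cluster_coordination.last_committed_config

Comparing this list against the master-eligible nodes that are actually online tells you immediately whether a quorum is even possible.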

The cluster.initial_master_nodes Trap

cluster.initial_master_nodes tells Elasticsearch which nodes should participate in the very first election when bootstrapping a brand-new cluster. It solves a chicken-and-egg problem: on first startup, nodes have no persisted cluster state, so they need an explicit list to form the initial voting configuration.

The trap is leaving this setting in elasticsearch.yml after the cluster has formed. If a node with a stale cluster.initial_master_nodes value restarts and cannot reach the other listed nodes, it may attempt to bootstrap a new single-node cluster - creating a split-brain. This is one of the most common self-inflicted master election failures. The fix: remove cluster.initial_master_nodes from every node's configuration after the cluster has successfully formed. The setting has zero use after bootstrap completes.

# Only during initial bootstrap. Remove after first successful cluster formation.
cluster.initial_master_nodes:
  - master-1
  - master-2
  - master-3

What Happens When Quorum Is Lost

When the cluster loses more than half its master-eligible nodes simultaneously, it cannot elect a master. No master means no cluster state updates. Shard allocation freezes, pending tasks queue indefinitely, and any operation requiring coordination - indexing into a new primary, creating an index, changing mappings - fails with a master_not_discovered_exception. Data nodes may continue serving reads against already-allocated shards, but the cluster is operationally dead for writes.

Surviving nodes will log "master not discovered or elected yet, an election requires..." followed by the node IDs needed for quorum. This does not resolve itself - bring enough master-eligible nodes back online or the cluster stays stuck. Disk flood-stage watermarks compound the problem: if surviving nodes hit the flood threshold, Elasticsearch applies a read_only_allow_delete block on affected indices, turning a master election failure into a full outage.
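Two quick checks confirm a lost master. Both are standard APIs; _cat/master fails with master_not_discovered_exception rather than hanging when no master exists, and the health call is given a short timeout for the same reason:

# Shows the elected master, or fails fast if there is none
GET /_cat/master?v

# Cluster health with a short timeout instead of the default 30s wait
GET /_cluster/health?timeout=5s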

Recovery: elasticsearch-node Tool and Voting Exclusions

If you can restart enough master-eligible nodes to restore quorum, the cluster will elect a master and resume normal operation. When failed nodes cannot come back - hardware failure, corrupted data directory, terminated cloud instances - you need the elasticsearch-node CLI tool.

The unsafe-bootstrap command forces a single remaining master-eligible node to form a new cluster on its own, bypassing quorum. Stop all surviving nodes first. Pick the master-eligible node with the most recent cluster state and run:

# Stop ALL nodes first, then on the chosen master-eligible node:
bin/elasticsearch-node unsafe-bootstrap

Start that node and verify it becomes master, then run detach-cluster on every other node before starting them:

# On each remaining node:
bin/elasticsearch-node detach-cluster

For planned removals where the cluster is still healthy, use the voting configuration exclusions API. This tells Elasticsearch to reconfigure the voting set before the node departs:

POST /_cluster/voting_config_exclusions?node_names=master-3

The cluster adjusts its quorum requirements automatically. Wait for confirmation before shutting down the excluded node.
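Exclusion entries persist in cluster state until you remove them, so verify the exclusion and then clean up once the node has left. A sketch using the standard state and exclusions APIs:

# Confirm the node appears in the exclusions list
GET /_cluster/state?filter_path=metadata.cluster_coordination.voting_config_exclusions

# Clear the list after the excluded node has shut down
DELETE /_cluster/voting_config_exclusions?wait_for_removal=true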

Dedicated Master Nodes and Common Failure Modes

Run three dedicated master-eligible nodes in production. Not one, not two - three. Dedicated means node.roles: [master] with no data, ingest, or coordinating responsibilities. When master nodes share a box with data duties, a heavy merge or a large aggregation can trigger GC pauses long enough for the master to miss its fault detection deadline. The rest of the cluster concludes the master is dead and triggers a new election, even though the node was just paused.
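A minimal elasticsearch.yml for a dedicated master might look like the following, assuming Elasticsearch 7.9+ where node.roles replaced the older node.master/node.data booleans; the master-1 through master-3 hostnames are the same illustrative names used earlier:

# Dedicated master: no data, ingest, or coordinating-only duties
node.roles: [ master ]
# Master-eligible nodes still need to discover each other
discovery.seed_hosts: ["master-1", "master-2", "master-3"]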

The most frequent failure modes: network partitions that isolate one master-eligible node from the other two (the isolated node cannot win, the remaining two still have quorum - this is the safe outcome); long GC pauses caused by heap pressure from co-located data workloads; and disk full on the master's data path, which prevents persisting cluster state updates. Keep master nodes on fast local storage with free space, cap heap at 4-8 GB, and use G1GC. Monitor jvm.gc.collectors.old.collection_time_in_millis - if old-gen collection time spikes, your master is one long pause away from an unnecessary election.
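Old-generation GC time is exposed through the node stats API. The filter_path is optional but keeps the response readable; watch for collection_time_in_millis growing rapidly between samples rather than its absolute value:

# Per-node old-gen GC count and cumulative time
GET /_nodes/stats/jvm?filter_path=nodes.*.jvm.gc.collectors.old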
