Elasticsearch gateway.expected_master_nodes Setting

The gateway.expected_master_nodes setting in Elasticsearch specifies how many master-eligible nodes are expected to be in the cluster. After a full cluster restart, recovery starts as soon as that many master-eligible nodes have joined. Waiting for them helps prevent data loss and split-brain scenarios by keeping the cluster from becoming operational before enough master-eligible nodes are available. The setting was deprecated in Elasticsearch 7.7 and removed in 8.0.

Description

  • Default value: 0
  • Possible values: Any non-negative integer
  • Recommendation: Set this to floor(master_eligible_nodes / 2) + 1, that is, a majority of the master-eligible nodes

When the value is greater than 0, Elasticsearch waits for that many master-eligible nodes to join the cluster before initiating recovery. With the default of 0, no such wait is applied. The wait matters because recovery that starts while only part of the cluster has restarted can lead to data loss or split-brain situations.

Use this setting together with gateway.recover_after_nodes and gateway.recover_after_time, which control the minimum node count and how long to wait before recovery proceeds.

Example

gateway.expected_master_nodes is a static setting, so it cannot be updated through the cluster settings API. Set it in elasticsearch.yml on every master-eligible node:

gateway.expected_master_nodes: 2

In this example, we set the value to 2, which means the cluster waits for at least 2 master-eligible nodes to be present before starting recovery. This configuration suits a cluster with 3 master-eligible nodes, because 2 is a majority of 3.
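
The arithmetic behind the recommended value can be sketched in a few lines of Python. This is only an illustration of the floor(n / 2) + 1 rule; the function name majority is a hypothetical helper, not part of Elasticsearch.

```python
def majority(master_eligible_nodes: int) -> int:
    """Return the smallest count that is a strict majority of the given nodes."""
    if master_eligible_nodes < 1:
        raise ValueError("a cluster needs at least one master-eligible node")
    # Integer division implements the floor in floor(n / 2) + 1.
    return master_eligible_nodes // 2 + 1


if __name__ == "__main__":
    for n in (1, 3, 5, 7):
        print(f"{n} master-eligible nodes -> expected_master_nodes = {majority(n)}")
```

For 3 master-eligible nodes the helper returns 2, which matches the value used in the example above.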

Common Issues and Misuses

  1. Setting the value too low: This can lead to premature cluster recovery and potential data loss.
  2. Setting the value too high: This can delay or prevent recovery if the expected number of master-eligible nodes is not available.
  3. Not adjusting the value when scaling the cluster: As the number of master-eligible nodes changes, this setting should be updated accordingly.
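
The three issues above lend themselves to a simple pre-deployment check. The following is a minimal sketch, assuming you know how many master-eligible nodes the cluster has; check_expected_master_nodes and its messages are hypothetical and are not an Elasticsearch API.

```python
from typing import List


def check_expected_master_nodes(expected: int, master_eligible_nodes: int) -> List[str]:
    """Return warnings for a configured value, given the real node count."""
    if expected < 0 or master_eligible_nodes < 1:
        raise ValueError("expected must be >= 0 and the cluster needs >= 1 master-eligible node")

    warnings = []
    majority = master_eligible_nodes // 2 + 1

    if expected == 0:
        warnings.append("value is 0: the check is disabled")
    elif expected < majority:
        warnings.append(f"value {expected} is below the majority of {majority}")

    if expected > master_eligible_nodes:
        warnings.append(
            f"value {expected} exceeds the {master_eligible_nodes} master-eligible nodes available"
        )
    return warnings


if __name__ == "__main__":
    # Re-run the check whenever the number of master-eligible nodes changes.
    for expected, total in [(2, 3), (1, 3), (4, 3), (2, 5)]:
        print(expected, total, check_expected_master_nodes(expected, total) or "ok")
```

Running a check like this whenever master-eligible nodes are added or removed is one way to catch a stale value before the next full cluster restart.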

Do's and Don'ts

Do's:

  • Set this value to at least floor(master_eligible_nodes / 2) + 1 (a majority of the master-eligible nodes)
  • Update this setting when changing the number of master-eligible nodes in your cluster
  • Use this setting in combination with gateway.recover_after_nodes and gateway.recover_after_time

Don'ts:

  • Don't set this value to 0 in production environments
  • Don't set this value higher than the total number of master-eligible nodes in your cluster
  • Don't ignore this setting when planning for disaster recovery scenarios
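
To keep the three gateway settings in step, the values can be derived from the node count in one place. The sketch below assumes the settings are written to elasticsearch.yml; render_gateway_settings is a hypothetical helper, and the recover_after_nodes and recover_after_time values shown are illustrative choices, not recommendations from Elasticsearch.

```python
def render_gateway_settings(master_eligible_nodes: int,
                            recover_after_nodes: int,
                            recover_after_time: str = "5m") -> str:
    """Render the gateway settings discussed here as elasticsearch.yml lines."""
    if master_eligible_nodes < 1:
        raise ValueError("a cluster needs at least one master-eligible node")

    # Majority of master-eligible nodes, following the recommendation above.
    expected = master_eligible_nodes // 2 + 1
    lines = [
        f"gateway.expected_master_nodes: {expected}",
        f"gateway.recover_after_nodes: {recover_after_nodes}",
        f"gateway.recover_after_time: {recover_after_time}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_gateway_settings(3, recover_after_nodes=2))
```

Generating the lines from the node count means that changing the number of master-eligible nodes updates the expected value automatically.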

Frequently Asked Questions

Q: How does gateway.expected_master_nodes differ from discovery.zen.minimum_master_nodes?
A: Both settings relate to master-eligible nodes, but they serve different purposes. gateway.expected_master_nodes applies after a full cluster restart and makes recovery wait until enough master-eligible nodes are present. discovery.zen.minimum_master_nodes (deprecated and ignored since version 7.0) was used to prevent split-brain scenarios during normal operation.

Q: Can I change gateway.expected_master_nodes dynamically?
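A: No. It is a static setting, so it must be set in elasticsearch.yml on each master-eligible node and cannot be changed through the cluster settings API. Because it only matters after a full cluster restart, a changed value is used the next time the whole cluster restarts.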

Q: What happens if the number of available master nodes is less than gateway.expected_master_nodes?
A: The cluster waits and does not start recovery until either the expected number of master-eligible nodes has joined or gateway.recover_after_time has elapsed.

Q: Should I set gateway.expected_master_nodes in a single-node cluster?
A: For single-node clusters, it's generally safe to leave this setting at its default value of 0. However, if you plan to scale to multiple nodes in the future, consider setting it to 1.

Q: How does this setting interact with gateway.recover_after_nodes?
A: gateway.recover_after_nodes specifies the minimum number of nodes (not just master-eligible) that must have joined before recovery can start at all. gateway.expected_master_nodes adds a check specifically for master-eligible nodes: once the recover_after_nodes threshold is met, recovery starts immediately if the expected number of master-eligible nodes is present, and otherwise starts after gateway.recover_after_time has elapsed.
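
The interaction between the three settings can be pictured as a small decision function. This is a simplified model of the behaviour described in the answers above, not Elasticsearch's actual implementation; recovery_decision and its parameters are hypothetical names, and the timeout is expressed in seconds for simplicity.

```python
def recovery_decision(joined_master_nodes: int,
                      joined_nodes: int,
                      waited_seconds: float,
                      *,
                      expected_master_nodes: int,
                      recover_after_nodes: int,
                      recover_after_seconds: float) -> str:
    """Return "recover" or "wait" for the current state of a restarting cluster."""
    # Hard floor: too few nodes overall means recovery never starts yet.
    if joined_nodes < recover_after_nodes:
        return "wait"
    # Expected master-eligible nodes are present: no need to wait for the timer.
    if expected_master_nodes > 0 and joined_master_nodes >= expected_master_nodes:
        return "recover"
    # Otherwise recovery proceeds only once the timeout has elapsed.
    if waited_seconds >= recover_after_seconds:
        return "recover"
    return "wait"


if __name__ == "__main__":
    settings = dict(expected_master_nodes=3, recover_after_nodes=2, recover_after_seconds=300)
    print(recovery_decision(3, 3, 0, **settings))    # all expected masters joined
    print(recovery_decision(2, 2, 10, **settings))   # still inside the timeout
    print(recovery_decision(2, 2, 300, **settings))  # timeout elapsed
```

In this model the recover_after_nodes threshold is checked first, so a timeout alone never triggers recovery on a cluster that is still too small.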
