Elasticsearch gateway.expected_data_nodes Setting

The gateway.expected_data_nodes setting in Elasticsearch specifies how many data nodes are expected to join the cluster after a full cluster restart. Once that many data nodes have joined, the elected master node starts recovering local shards without further delay.

Description

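  • Default value: 0
  • Possible values: Any non-negative integer
  • Type: Static (set in elasticsearch.yml on every master-eligible node; it cannot be updated through the cluster settings API)
  • Recommendation: Set this to the number of data nodes your cluster normally runs with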

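This setting belongs to the gateway module, which persists cluster state across full cluster restarts. When it is set to a value greater than zero, the elected master waits until that many data nodes have joined and then starts recovery immediately. If fewer data nodes join, the master waits for gateway.recover_after_time (5m by default when an expected node count is configured) and then recovers with the nodes that are present. Waiting for the full set of data nodes means most shard copies are found on the nodes that held them before the restart, which avoids unnecessary shard reallocation and copying.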
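
The decision the elected master makes can be sketched as a pure function that is re-evaluated whenever a node joins or time passes. This is a simplified Python model, not Elasticsearch's implementation: the names GatewaySettings and should_start_recovery are hypothetical, an expected_data_nodes of 0 is treated as "not configured", and the 5-minute default for gateway.recover_after_time when an expected count is set follows the documented behavior. It also folds in gateway.recover_after_data_nodes, a related static setting that acts as a hard minimum.

```python
from dataclasses import dataclass
from typing import Optional

# Documented default for gateway.recover_after_time when an expected count is set.
DEFAULT_RECOVER_AFTER_TIME_S = 300.0


@dataclass
class GatewaySettings:
    """Hypothetical model of the gateway settings that gate recovery."""
    expected_data_nodes: int = 0                  # assumption: 0 means "not configured"
    recover_after_data_nodes: Optional[int] = None
    recover_after_time_s: Optional[float] = None  # None means "not explicitly set"

    def effective_recover_after_time(self) -> float:
        # An explicit value always wins.
        if self.recover_after_time_s is not None:
            return self.recover_after_time_s
        # Otherwise 5m if an expected count is configured, else no delay at all.
        return DEFAULT_RECOVER_AFTER_TIME_S if self.expected_data_nodes > 0 else 0.0


def should_start_recovery(s: GatewaySettings, data_nodes: int, elapsed_s: float) -> bool:
    """Return True if recovery may start with `data_nodes` joined after `elapsed_s` seconds."""
    # Hard floor: never recover with fewer data nodes than recover_after_data_nodes.
    if s.recover_after_data_nodes is not None and data_nodes < s.recover_after_data_nodes:
        return False
    # Expected count reached: recover immediately.
    if s.expected_data_nodes > 0 and data_nodes >= s.expected_data_nodes:
        return True
    # Expected count not reached (or not configured): wait out recover_after_time.
    return elapsed_s >= s.effective_recover_after_time()


if __name__ == "__main__":
    s = GatewaySettings(expected_data_nodes=5, recover_after_data_nodes=3)
    print(should_start_recovery(s, data_nodes=5, elapsed_s=0))    # True
    print(should_start_recovery(s, data_nodes=4, elapsed_s=60))   # False
    print(should_start_recovery(s, data_nodes=4, elapsed_s=300))  # True
    print(should_start_recovery(s, data_nodes=2, elapsed_s=600))  # False
```

The sketch keeps the three settings separate on purpose: recover_after_data_nodes is a floor that no timeout overrides, expected_data_nodes is a target that short-circuits the wait, and recover_after_time only matters in between.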

Version Information

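This setting has been available since early versions of Elasticsearch and is still supported in 8.x. The related gateway.expected_nodes, gateway.expected_master_nodes, gateway.recover_after_nodes, and gateway.recover_after_master_nodes settings were deprecated in 7.x and removed in 8.0; use gateway.expected_data_nodes and gateway.recover_after_data_nodes instead.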

Example

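Because gateway.expected_data_nodes is a static setting, it cannot be set through the cluster update settings API: a PUT /_cluster/settings request that includes it is rejected because the setting is not dynamically updatable. Add it to elasticsearch.yml on every master-eligible node instead:

# elasticsearch.yml
gateway.expected_data_nodes: 5

This example sets the expected number of data nodes to 5, which suits a cluster that normally runs with 5 data nodes. The value takes effect when each node is next started. After a full cluster restart, recovery then begins as soon as all 5 data nodes have joined, rather than starting before some of them arrive or waiting out gateway.recover_after_time.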

Common Issues and Misuses

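  • Setting the value higher than the number of data nodes that will actually join means the target is never reached, so every full restart waits for the whole gateway.recover_after_time period (5m by default) before recovery starts.
  • Setting the value too low (or leaving it at 0) lets recovery start before all data nodes have joined. Shards whose copies live on the missing nodes stay unassigned until those nodes arrive, and the cluster may start allocating replacement copies on the nodes that are present, causing avoidable data copying.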

Do's and Don'ts

Do:

  • Set this value to the number of data nodes your production cluster normally runs with.
  • Update it in elasticsearch.yml on the master-eligible nodes when you scale your cluster up or down.
  • Combine it with gateway.recover_after_data_nodes and gateway.recover_after_time for finer control over recovery.
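
To keep the value aligned with the cluster's real size, you can compare it against the data nodes the cluster reports. The sketch below assumes you have already fetched GET _nodes/settings?flat_settings=true and parsed the JSON into a dict. The function name mismatched_expected_data_nodes and the sample node names are hypothetical, and the sketch assumes that any role whose name starts with "data" (data, data_content, data_hot, and so on) marks a data node and that a node without the setting in its settings map uses the default of 0.

```python
from typing import Any, Dict, List


def mismatched_expected_data_nodes(nodes_info: Dict[str, Any]) -> List[str]:
    """Return names of master-eligible nodes whose gateway.expected_data_nodes
    differs from the number of data nodes in a parsed _nodes/settings response."""
    nodes = nodes_info.get("nodes", {})

    # Assumption: any role beginning with "data" (data, data_hot, ...) is a data role.
    data_node_count = sum(
        1
        for node in nodes.values()
        if any(role.startswith("data") for role in node.get("roles", []))
    )

    mismatched = []
    for node in nodes.values():
        # The gateway settings are read by the elected master, so check master-eligible nodes.
        if "master" not in node.get("roles", []):
            continue
        # Setting values come back as strings; a missing key means the default of 0.
        configured = int(node.get("settings", {}).get("gateway.expected_data_nodes", "0"))
        if configured != data_node_count:
            mismatched.append(node.get("name", "<unnamed>"))
    return mismatched


# Hypothetical, hand-written response in the flat_settings shape.
sample = {
    "nodes": {
        "a1": {"name": "es-master-1", "roles": ["master"],
               "settings": {"gateway.expected_data_nodes": "3"}},
        "b2": {"name": "es-data-1", "roles": ["data", "ingest"], "settings": {}},
        "c3": {"name": "es-data-2", "roles": ["data_hot", "data_content"], "settings": {}},
    }
}

if __name__ == "__main__":
    print(mismatched_expected_data_nodes(sample))  # ['es-master-1']
```

Here es-master-1 is flagged because it expects 3 data nodes while the sample response lists only 2.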

Don't:

  • Set this value higher than the actual number of data nodes in your cluster.
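  • Ignore this setting when planning full cluster restarts or maintenance windows.
  • Confuse this with cluster formation settings such as discovery.zen.minimum_master_nodes (before 7.0) or discovery.seed_hosts (7.0 and later). Those control how master-eligible nodes find each other and elect a master, not when shard recovery begins.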

Frequently Asked Questions

Q: How does gateway.expected_data_nodes differ from discovery.zen.minimum_master_nodes?
A: gateway.expected_data_nodes is used for cluster recovery after a full restart and applies to data nodes, while discovery.zen.minimum_master_nodes (in older versions) was used for cluster formation and applied to master-eligible nodes.

Q: What happens if the expected number of data nodes is not reached?
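A: The cluster does not wait indefinitely. If the expected number of data nodes has not joined, the master waits for gateway.recover_after_time (5m by default when gateway.expected_data_nodes is set) and then starts recovery with the nodes that are present, provided any gateway.recover_after_data_nodes minimum is met.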

Q: Can I change gateway.expected_data_nodes while the cluster is running?
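A: No. gateway.expected_data_nodes is a static setting, so it cannot be changed through the cluster settings API. Change it in elasticsearch.yml on every master-eligible node; the new value is read when each node next starts and matters only during the next full cluster restart.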

Q: Should I set gateway.expected_data_nodes in a single-node development environment?
A: For a single-node development environment, you can leave this setting at its default value of 0 or set it to 1.

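Q: How does this setting interact with gateway.recover_after_data_nodes and gateway.recover_after_time?
A: gateway.recover_after_data_nodes sets a minimum: recovery never starts with fewer data nodes than that. gateway.expected_data_nodes sets a target: once it is reached, recovery starts immediately. When the number of joined data nodes is at or above the minimum but below the target, the master waits for gateway.recover_after_time before recovering. The older gateway.recover_after_nodes setting was deprecated in 7.x and removed in 8.0 in favor of gateway.recover_after_data_nodes.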
