The gateway.expected_data_nodes setting in Elasticsearch controls the number of data nodes that should be present in the cluster before the recovery process starts after a full cluster restart.
Description
- Default value: 0
- Possible values: Any non-negative integer
- Recommendation: Set this to the number of data nodes in your cluster
This setting is part of the gateway module, which handles cluster state persistence across full cluster restarts. When set to a value greater than zero, Elasticsearch starts recovering local shards as soon as that many data nodes have joined the cluster. If fewer have joined, recovery is held back for up to gateway.recover_after_time (5 minutes by default) rather than starting at once. This gives the remaining nodes time to rejoin with their shard copies, so the cluster does not begin rebuilding replicas that are about to come back.
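The waiting behaviour described above can be sketched as a small decision function. This is a simplified model written for illustration, not Elasticsearch's actual code: the class and function names are hypothetical, and the defaults (0 for the node counts, 5 minutes for gateway.recover_after_time) are assumptions taken from the documented settings.

```python
from dataclasses import dataclass


@dataclass
class GatewaySettings:
    """Hypothetical container mirroring three gateway.* settings."""
    expected_data_nodes: int = 0         # gateway.expected_data_nodes
    recover_after_data_nodes: int = 0    # gateway.recover_after_data_nodes
    recover_after_time_s: float = 300.0  # gateway.recover_after_time (assumed 5m)


def should_start_recovery(settings: GatewaySettings,
                          joined_data_nodes: int,
                          elapsed_s: float) -> bool:
    """Simplified model of the recovery gate after a full cluster restart."""
    # Hard floor: never recover with fewer data nodes than this.
    if joined_data_nodes < settings.recover_after_data_nodes:
        return False
    # No expected count configured: nothing further to wait for.
    if settings.expected_data_nodes <= 0:
        return True
    # Expected count reached: recover immediately.
    if joined_data_nodes >= settings.expected_data_nodes:
        return True
    # Expected count not reached: recover only once the timeout has elapsed.
    return elapsed_s >= settings.recover_after_time_s


if __name__ == "__main__":
    s = GatewaySettings(expected_data_nodes=5, recover_after_data_nodes=3)
    for joined, elapsed in [(5, 0), (4, 10), (4, 300), (2, 1000)]:
        print(joined, elapsed, should_start_recovery(s, joined, elapsed))
```

In this model, four of five expected nodes is enough to recover only after the timeout, while two nodes never are, because the count is below the assumed floor of three.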
Version Information
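This setting has been available since early versions of Elasticsearch and continues to be supported in current versions. The related gateway.expected_nodes and gateway.expected_master_nodes settings were deprecated in 7.7 and removed in 8.0, which leaves gateway.expected_data_nodes as the remaining expected-count setting.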
Example
To set the gateway.expected_data_nodes value, add it to elasticsearch.yml on every master-eligible node. It is a static setting, so it cannot be changed through the cluster settings API:
gateway.expected_data_nodes: 5
In this example, we set the expected number of data nodes to 5. This would be appropriate for a cluster that normally operates with 5 data nodes. The reason for changing this setting might be to ensure that all data nodes are present before recovery begins, which can help prevent incomplete recovery scenarios.
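To pick the value, it helps to count how many nodes currently hold a data role. The following is a minimal sketch that counts data nodes from the JSON returned by GET _cat/nodes?h=name,node.role&format=json. The sample payload is invented for illustration, and the set of role letters treated as data roles (d, h, w, c, f, s) is an assumption based on the cat nodes API documentation; check it against your Elasticsearch version.

```python
import json

# Role letters assumed to mark a data-holding node in `_cat/nodes` output:
# d=data, h=hot, w=warm, c=cold, f=frozen, s=content.
DATA_ROLE_LETTERS = set("dhwcfs")


def count_data_nodes(cat_nodes_json: str) -> int:
    """Count nodes whose `node.role` string contains any data role letter."""
    nodes = json.loads(cat_nodes_json)
    return sum(
        1 for node in nodes
        if DATA_ROLE_LETTERS & set(node.get("node.role", ""))
    )


# Hypothetical response body; node names and roles are made up.
sample = json.dumps([
    {"name": "node-1", "node.role": "dim"},  # data + ingest + master
    {"name": "node-2", "node.role": "di"},   # data + ingest
    {"name": "node-3", "node.role": "m"},    # dedicated master
    {"name": "node-4", "node.role": "hs"},   # hot tier + content
    {"name": "node-5", "node.role": "-"},    # coordinating only
])

if __name__ == "__main__":
    print(count_data_nodes(sample))
```

The dedicated master and the coordinating-only node are excluded, so the count reflects only nodes that would be compared against gateway.expected_data_nodes.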
Common Issues and Misuses
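- Setting the value too high delays recovery after every full restart: the expected number of nodes is never reached, so the cluster waits out gateway.recover_after_time (5 minutes by default) before it starts recovering.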
- Setting the value too low (or leaving it at 0) may cause the cluster to start recovery before all data nodes have joined, potentially leading to incomplete recovery.
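Both issues above can be caught with a simple pre-deployment check. The sketch below is a hypothetical helper, not part of any Elasticsearch client; it compares the configured value with the number of data nodes you actually run and returns human-readable warnings.

```python
from typing import List


def check_expected_data_nodes(expected: int, actual_data_nodes: int) -> List[str]:
    """Return warnings for a gateway.expected_data_nodes value (illustrative only)."""
    if expected < 0 or actual_data_nodes < 0:
        raise ValueError("node counts must be non-negative")

    warnings: List[str] = []
    if expected == 0:
        # Default value: recovery does not wait for any data nodes.
        warnings.append("expected_data_nodes is 0: recovery will not wait for data nodes")
    elif expected > actual_data_nodes:
        # The count can never be reached, so recovery is delayed on every restart.
        warnings.append(
            f"expected_data_nodes ({expected}) is higher than the "
            f"{actual_data_nodes} data nodes in the cluster"
        )
    elif expected < actual_data_nodes:
        # Recovery may begin before the last nodes have rejoined.
        warnings.append(
            f"expected_data_nodes ({expected}) is lower than the "
            f"{actual_data_nodes} data nodes in the cluster"
        )
    return warnings


if __name__ == "__main__":
    for expected in (0, 3, 5, 6):
        print(expected, check_expected_data_nodes(expected, 5))
```

A check like this is most useful in the same automation that scales the cluster, so the setting is revisited whenever the data node count changes.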
Do's and Don'ts
Do:
- Set this value to match the number of data nodes in your production cluster.
- Adjust this setting when scaling your cluster up or down.
- Use in conjunction with gateway.recover_after_data_nodes (gateway.recover_after_nodes in versions before 8.0) and gateway.recover_after_time for more granular control over recovery.
Don't:
- Set this value higher than the actual number of data nodes in your cluster.
- Ignore this setting when planning for disaster recovery scenarios.
- Confuse this with discovery.zen.minimum_master_nodes (for older versions) or discovery.seed_hosts (for newer versions).
Frequently Asked Questions
Q: How does gateway.expected_data_nodes differ from discovery.zen.minimum_master_nodes?
A: gateway.expected_data_nodes is used for cluster recovery after a full restart and applies to data nodes, while discovery.zen.minimum_master_nodes (in older versions) was used for cluster formation and applied to master-eligible nodes.
Q: What happens if the expected number of data nodes is not reached?
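A: The cluster does not wait forever. If the expected number of data nodes has not joined, recovery is held back for gateway.recover_after_time (5 minutes by default) and then starts, provided at least gateway.recover_after_data_nodes data nodes are present.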
Q: Can I change gateway.expected_data_nodes while the cluster is running?
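A: No. gateway.expected_data_nodes is a static setting, so it cannot be updated through the cluster settings API. Change it in elasticsearch.yml and restart the node. Because the setting only matters during a full cluster restart, the new value is first used the next time the whole cluster restarts.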
Q: Should I set gateway.expected_data_nodes in a single-node development environment?
A: For a single-node development environment, you can leave this setting at its default value of 0 or set it to 1.
Q: How does this setting interact with gateway.recover_after_nodes?
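A: gateway.recover_after_nodes (removed in 8.0, where gateway.recover_after_data_nodes is the data-node equivalent) specifies the minimum number of nodes that must be present for recovery to start once gateway.recover_after_time has elapsed, while gateway.expected_data_nodes sets the number of data nodes at which recovery starts immediately. Using both allows for more fine-grained control over the recovery process.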