The gateway.expected_data_nodes setting in Elasticsearch controls the number of data nodes that should be present in the cluster before the recovery process starts after a full cluster restart.
Description
- Default value: 0
- Possible values: Any non-negative integer
- Recommendation: Set this to the number of data nodes in your cluster
This setting is part of the gateway module, which handles cluster state persistence across full cluster restarts. When set to a value greater than zero, Elasticsearch will wait for the specified number of data nodes to join the cluster before starting the recovery process. This helps ensure that all necessary data is available for a complete recovery.
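The waiting behavior described above can be sketched as a small model. This is a simplified illustration rather than Elasticsearch's actual implementation: the function name `recovery_start_index` and the representation of node joins as a sequence of booleans are assumptions made for the example, and the model ignores timeouts and master election.

```python
from typing import Iterable, Optional


def recovery_start_index(
    join_events: Iterable[bool], expected_data_nodes: int = 0
) -> Optional[int]:
    """Return the 0-based index of the join event after which recovery could begin.

    join_events yields True when a data node joins and False when any other
    kind of node joins (hypothetical representation, for illustration only).
    Returns None if the expected number of data nodes is never reached.
    """
    if expected_data_nodes < 0:
        raise ValueError("expected_data_nodes must be a non-negative integer")

    data_nodes_joined = 0
    for index, is_data_node in enumerate(join_events):
        if is_data_node:
            data_nodes_joined += 1
        # With the default of 0, this condition holds on the very first join.
        if data_nodes_joined >= expected_data_nodes:
            return index
    return None


# A master-only node joins first, followed by three data nodes.
print(recovery_start_index([False, True, True, True], expected_data_nodes=3))  # prints 3
```

With `expected_data_nodes=3`, the model holds recovery back until the third data node has joined, whereas the default of 0 would let it begin after the first join.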
Version Information
This setting has been available since early versions of Elasticsearch and continues to be supported in current versions.
Example
gateway.expected_data_nodes is a static setting, so it cannot be changed through the cluster settings API. Set it in elasticsearch.yml on each master-eligible node; it takes effect when the node is restarted:

    gateway.expected_data_nodes: 5
In this example, we set the expected number of data nodes to 5. This might be done in a cluster with 5 dedicated data nodes to ensure all nodes are present before recovery begins after a full cluster restart.
Common Issues and Misuses
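- Setting the value too high delays recovery: if the expected number of data nodes never joins, the cluster waits for gateway.recover_after_time (5 minutes by default when an expected node count is configured) before recovering with the nodes that are present.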
- Setting the value too low may cause the cluster to start recovery prematurely, potentially leading to incomplete data availability.
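One way to catch both problems is to compare the configured value with the number of data nodes the cluster actually reports. The sketch below assumes a dictionary shaped like the JSON body of `GET /_nodes`, where each node entry carries a `roles` list. The sample data is hand-written for illustration, and the helper names `count_data_nodes` and `check_expected_data_nodes` are hypothetical.

```python
def count_data_nodes(nodes_response: dict) -> int:
    """Count nodes that hold at least one data role.

    Assumes roles are named "data" or "data_<tier>" (for example "data_hot").
    """
    nodes = nodes_response.get("nodes", {})
    return sum(
        1
        for info in nodes.values()
        if any(role.startswith("data") for role in info.get("roles", []))
    )


def check_expected_data_nodes(configured: int, nodes_response: dict) -> str:
    """Describe how the configured value compares with the actual data node count."""
    actual = count_data_nodes(nodes_response)
    if configured > actual:
        return f"too high: configured {configured}, but only {actual} data nodes exist"
    if configured < actual:
        return f"too low: configured {configured}, but {actual} data nodes exist"
    return f"ok: configured value matches {actual} data nodes"


# Hand-written sample, not real cluster output.
sample_nodes = {
    "nodes": {
        "node-a": {"name": "es-master-1", "roles": ["master"]},
        "node-b": {"name": "es-data-1", "roles": ["data", "ingest"]},
        "node-c": {"name": "es-data-2", "roles": ["data_hot", "data_content"]},
        "node-d": {"name": "es-data-3", "roles": ["data_warm"]},
    }
}

print(check_expected_data_nodes(5, sample_nodes))
```

A check like this could run whenever the cluster is resized, so the configured value does not drift from the real topology.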
Do's and Don'ts
Do:
- Set this value to match the number of data nodes in your cluster.
- Adjust this setting when scaling your cluster up or down.
- Use in conjunction with gateway.recover_after_nodes (removed in 8.0; use gateway.recover_after_data_nodes there) for more granular control.
Don't:
- Set this value higher than the actual number of data nodes in your cluster.
- Ignore this setting when planning for disaster recovery scenarios.
- Confuse this with discovery.zen.minimum_master_nodes (for older versions) or discovery.seed_hosts (for newer versions).
Frequently Asked Questions
Q: How does gateway.expected_data_nodes differ from discovery.zen.minimum_master_nodes?
A: While gateway.expected_data_nodes is used for cluster recovery after a full restart, discovery.zen.minimum_master_nodes (in older versions) is used to prevent split-brain scenarios during normal operation.
Q: Can setting gateway.expected_data_nodes to 0 cause any issues?
A: Setting it to 0 (the default) means the cluster does not wait for any particular number of data nodes and can start recovery as soon as the first node joins. Shards whose copies live on nodes that have not yet joined may remain unavailable until those nodes arrive.
Q: How does this setting interact with gateway.recover_after_nodes?
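A: gateway.recover_after_nodes specifies the minimum number of nodes required to start recovery, while gateway.expected_data_nodes sets the expected number of data nodes. Using both allows for more fine-grained control over the recovery process. Note that gateway.recover_after_nodes was removed in Elasticsearch 8.0; its data-node counterpart, gateway.recover_after_data_nodes, remains available.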
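The interaction can be illustrated with a rough decision function. This is a simplified model under stated assumptions, not the real gateway code: it uses the data-node variant gateway.recover_after_data_nodes as the minimum, assumes the wait is bounded by gateway.recover_after_time with a 5 minute default, and the function name `can_start_recovery` and its parameters are hypothetical.

```python
def can_start_recovery(
    joined_data_nodes: int,
    expected_data_nodes: int = 0,
    recover_after_data_nodes: int = 0,
    waited_seconds: float = 0.0,
    recover_after_time_seconds: float = 300.0,
) -> bool:
    """Decide whether recovery may begin in this simplified model."""
    # The minimum is a hard floor: below it, recovery never starts.
    if joined_data_nodes < recover_after_data_nodes:
        return False
    # No expected count configured: nothing further to wait for.
    if expected_data_nodes <= 0:
        return True
    # The expected count has been reached: start immediately.
    if joined_data_nodes >= expected_data_nodes:
        return True
    # Otherwise keep waiting until the timeout elapses.
    return waited_seconds >= recover_after_time_seconds


# 4 of 5 expected data nodes are present and the minimum is 3.
print(can_start_recovery(4, expected_data_nodes=5, recover_after_data_nodes=3,
                         waited_seconds=60))   # still waiting
print(can_start_recovery(4, expected_data_nodes=5, recover_after_data_nodes=3,
                         waited_seconds=300))  # timeout reached, minimum satisfied
```

In this model the expected count acts as a fast path, while the minimum and the timeout together decide what happens when some nodes never come back.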
Q: Should I change this setting during a rolling restart?
A: No. This setting only matters during a full cluster restart, so changing it for a rolling restart is unnecessary and may cause confusion.
Q: How does this setting affect cluster formation in containerized environments?
A: In containerized environments where nodes may start in an unpredictable order, setting an appropriate value can help ensure all necessary data nodes are present before recovery begins, improving data consistency.