Elasticsearch gateway.expected_data_nodes

The gateway.expected_data_nodes setting in Elasticsearch specifies how many data nodes the cluster is expected to have. After a full cluster restart, the elected master uses this number to decide when to begin recovering the cluster's data.

Description

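Default value: 0
Possible values: Any non-negative integer
Type: Static (set in elasticsearch.yml on every master-eligible node; it cannot be changed through the cluster settings API)
Recommendation: Set this to the number of data nodes in your cluster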
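To follow the recommendation you need the current number of data nodes. The sketch below counts them from the plain-text output of GET _cat/nodes?h=name,node.role (requested without ?v, so there is no header row). It assumes that any of the role letters c, d, f, h, s, w (cold, data, frozen, hot, content, warm) marks a node that counts as a data node; the helper name and the sample output are hypothetical.

```python
# Role letters from the _cat/nodes "node.role" column that we ASSUME mark a data node:
# c=data_cold, d=data, f=data_frozen, h=data_hot, s=data_content, w=data_warm
DATA_ROLE_LETTERS = frozenset("cdfhsw")


def count_data_nodes(cat_nodes_output: str) -> int:
    """Count data nodes in `_cat/nodes?h=name,node.role` output (hypothetical helper)."""
    count = 0
    for line in cat_nodes_output.splitlines():
        columns = line.split()
        if len(columns) < 2:
            continue  # skip blank or malformed lines
        roles = columns[-1]  # node.role is the last requested column
        if any(letter in DATA_ROLE_LETTERS for letter in roles):
            count += 1
    return count


if __name__ == "__main__":
    sample = """\
es-master-0 mr
es-data-0   dir
es-data-1   dir
es-hot-0    hs
es-coord-0  -
"""
    print(count_data_nodes(sample))  # 3
```

The coordinating-only node ("-") and the dedicated master ("mr") are not counted, so this sample cluster would use gateway.expected_data_nodes: 3.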

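This setting belongs to the gateway module, which persists the cluster state and shard data across full cluster restarts. When the value is greater than zero, the elected master begins recovering local shards as soon as the expected number of data nodes has joined. If that number is not reached, it waits for gateway.recover_after_time (5 minutes by default when gateway.expected_data_nodes is set) and then recovers with the data nodes that are present, provided gateway.recover_after_data_nodes (if set) is satisfied. Waiting for the full set of data nodes gives shards the best chance of being recovered from their existing local copies instead of being left unassigned or rebuilt on other nodes.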
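The timing rule can be modelled in a few lines. The Python sketch below is a simplified illustration based on the documented behavior, not Elasticsearch's implementation: given the times (in seconds after a master is elected) at which data nodes join, it returns when recovery would start. It assumes gateway.recover_after_time keeps its 5-minute default and gateway.recover_after_data_nodes is unset; the function name is hypothetical.

```python
from typing import Sequence

DEFAULT_RECOVER_AFTER_TIME = 300.0  # 5m, the documented default when expected_data_nodes is set


def recovery_start_time(join_times: Sequence[float], expected_data_nodes: int,
                        recover_after_time: float = DEFAULT_RECOVER_AFTER_TIME) -> float:
    """Seconds after master election at which recovery begins (simplified model).

    recover_after_time only matters when expected_data_nodes > 0.
    """
    if expected_data_nodes <= 0:
        # Default of 0: recovery does not wait for any particular number of data nodes.
        return 0.0
    joins = sorted(join_times)
    if len(joins) >= expected_data_nodes:
        # Recover as soon as the expected-th data node joins, unless the timer fires first.
        return min(joins[expected_data_nodes - 1], recover_after_time)
    # The expected count is never reached: recover when the timer expires.
    return recover_after_time


if __name__ == "__main__":
    print(recovery_start_time([0.0, 5.0, 12.0, 20.0, 31.0], expected_data_nodes=5))  # 31.0
    print(recovery_start_time([0.0, 5.0, 12.0, 20.0], expected_data_nodes=5))        # 300.0
```

With all five data nodes up, recovery starts 31 seconds in, when the last one joins; with one node missing, the master waits the full five minutes.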

Version Information

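This setting has been available since early versions of Elasticsearch and is still supported in 8.x. The related settings gateway.expected_nodes and gateway.recover_after_nodes, which counted all nodes rather than only data nodes, were deprecated in 7.7 in favor of their data-node equivalents and removed in 8.0.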

Example

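Because gateway.expected_data_nodes is a static setting, it cannot be updated through the cluster settings API; a PUT /_cluster/settings request for it is rejected. Set it in elasticsearch.yml on every master-eligible node and restart those nodes for the value to take effect:

gateway.expected_data_nodes: 5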

In this example, the cluster is expected to have 5 data nodes, as might be configured for a cluster with 5 dedicated data nodes. After a full cluster restart, recovery begins as soon as all 5 have joined, or after gateway.recover_after_time (5 minutes by default) if some of them never do.

Common Issues and Misuses

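Setting the value too high delays recovery: the expected number of data nodes never joins, so after every full restart the master waits the full gateway.recover_after_time (5 minutes by default) before recovering. Recovery is blocked entirely only if gateway.recover_after_data_nodes is also set higher than the number of available data nodes.
Setting the value too low lets recovery start before all data nodes have joined. Shards whose copies live on late-joining nodes stay unassigned until those nodes arrive, and Elasticsearch may rebuild replicas on the nodes that are already present, work that becomes unnecessary once the remaining nodes join.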
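Both failure modes come down to comparing the configured value with the real number of data nodes. The hypothetical check below performs that comparison; the function name and the returned labels are illustrative, and the comments assume gateway.recover_after_time is left at its default.

```python
def check_expected_data_nodes(expected_data_nodes: int, actual_data_nodes: int) -> str:
    """Classify a gateway.expected_data_nodes value against the actual data node count."""
    if expected_data_nodes < 0 or actual_data_nodes < 0:
        raise ValueError("node counts must be non-negative")
    if expected_data_nodes == 0:
        return "unset"     # recovery does not wait for any particular number of data nodes
    if expected_data_nodes > actual_data_nodes:
        return "too_high"  # never satisfied: every full restart waits out recover_after_time
    if expected_data_nodes < actual_data_nodes:
        return "too_low"   # recovery can start before the remaining data nodes join
    return "ok"


if __name__ == "__main__":
    for configured in (0, 3, 5, 7):
        print(configured, check_expected_data_nodes(configured, actual_data_nodes=5))
```

For a cluster with 5 data nodes, this prints "unset", "too_low", "ok", and "too_high" for the configured values 0, 3, 5, and 7.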

Do's and Don'ts

Do:

Set this value to match the number of data nodes in your cluster.
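Update this setting when you add or remove data nodes; because it is static, the new value takes effect only after the master-eligible nodes restart.
Use it together with `gateway.recover_after_time` and `gateway.recover_after_data_nodes` for finer control over when recovery starts.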

Don't:

Set this value higher than the actual number of data nodes in your cluster.
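Overlook this setting when planning for disaster recovery, for example when rebuilding a cluster with fewer data nodes than before.
Confuse this with cluster formation settings such as discovery.zen.minimum_master_nodes (6.x and earlier) or discovery.seed_hosts and cluster.initial_master_nodes (7.0 and later).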

Frequently Asked Questions

Q: How does gateway.expected_data_nodes differ from discovery.zen.minimum_master_nodes?
A: While gateway.expected_data_nodes is used for cluster recovery after a full restart, discovery.zen.minimum_master_nodes (in older versions) is used to prevent split-brain scenarios during normal operation.

Q: Can setting gateway.expected_data_nodes to 0 cause any issues?
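A: Setting it to 0 (the default) means recovery does not wait for any particular number of data nodes: it starts as soon as the elected master has recovered the cluster state, unless gateway.recover_after_time or gateway.recover_after_data_nodes imposes a delay. Shards whose copies live on nodes that have not joined yet remain unassigned until those nodes arrive, and replicas may be rebuilt unnecessarily on the nodes that are present.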

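Q: How does this setting interact with gateway.recover_after_time and gateway.recover_after_data_nodes?
A: If the expected number of data nodes joins, recovery starts immediately. Otherwise the master waits for gateway.recover_after_time and then recovers only if at least gateway.recover_after_data_nodes data nodes have joined. gateway.recover_after_data_nodes therefore acts as a hard minimum, while gateway.expected_data_nodes lets recovery start early once the full set is present. The older gateway.recover_after_nodes setting, which counted all nodes rather than data nodes, was deprecated in 7.7 and removed in 8.0.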
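The interaction between the three gateway settings can be expressed as a single decision function. This is a simplified Python model of the documented behavior, not Elasticsearch source code: GatewaySettings and should_start_recovery are hypothetical names, 0 and None stand for "not set", and elapsed is the number of seconds the elected master has been waiting.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GatewaySettings:
    expected_data_nodes: int = 0                # 0: not set
    recover_after_data_nodes: int = 0           # 0: not set
    recover_after_time: Optional[float] = None  # seconds; None: not set

    def effective_recover_after_time(self) -> float:
        if self.recover_after_time is not None:
            return self.recover_after_time
        # Documented default: 5m when expected_data_nodes is configured, otherwise no wait.
        return 300.0 if self.expected_data_nodes > 0 else 0.0


def should_start_recovery(s: GatewaySettings, data_nodes: int, elapsed: float) -> bool:
    # recover_after_data_nodes is a hard floor: below it, recovery never starts.
    if s.recover_after_data_nodes > 0 and data_nodes < s.recover_after_data_nodes:
        return False
    # The expected number of data nodes has joined: recover immediately.
    if s.expected_data_nodes > 0 and data_nodes >= s.expected_data_nodes:
        return True
    # Otherwise recover only once recover_after_time has elapsed.
    return elapsed >= s.effective_recover_after_time()


if __name__ == "__main__":
    s = GatewaySettings(expected_data_nodes=5, recover_after_data_nodes=3)
    print(should_start_recovery(s, data_nodes=5, elapsed=0.0))       # True
    print(should_start_recovery(s, data_nodes=4, elapsed=10.0))      # False
    print(should_start_recovery(s, data_nodes=4, elapsed=300.0))     # True
    print(should_start_recovery(s, data_nodes=2, elapsed=10_000.0))  # False
```

The last call shows why only gateway.recover_after_data_nodes can block recovery indefinitely; an unmet gateway.expected_data_nodes merely postpones it until the timer expires.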

Q: Should I change this setting during a rolling restart?
A: No. This setting only takes effect during a full cluster restart, so it plays no role while nodes restart one at a time. Because it is static, changing it would itself require restarting nodes.

Q: How does this setting affect cluster formation in containerized environments?
A: In containerized environments, where nodes may start in an unpredictable order, an accurate value lets the master wait until all data nodes are running before it begins recovery, instead of allocating shards around nodes that are still starting.