The gateway.recover_after_data_nodes setting in Elasticsearch controls the minimum number of data nodes that must be present in the cluster before the recovery process can start after a full cluster restart.
Description
- Default value: -1 (disabled, so no minimum is enforced)
- Possible values: -1 or any non-negative integer
- Recommendation: Set this to a value that represents a significant portion of your expected data nodes, typically 50-70% of your total data nodes.
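The 50-70% guideline can be turned into a concrete range with a few lines of integer arithmetic. The following Python sketch is only illustrative: the helper name recommended_range and the choice to round up are assumptions made for this example, not part of Elasticsearch.

```python
def recommended_range(total_data_nodes, low_pct=50, high_pct=70):
    """Return (low, high) candidate values for gateway.recover_after_data_nodes.

    Hypothetical helper: the name and the round-up rule are illustrative
    choices, not anything defined by Elasticsearch.
    """
    if total_data_nodes < 1:
        raise ValueError("total_data_nodes must be at least 1")
    if not 0 <= low_pct <= high_pct <= 100:
        raise ValueError("need 0 <= low_pct <= high_pct <= 100")

    def ceil_pct(pct):
        # Integer ceiling of total_data_nodes * pct / 100 (no float rounding).
        return (total_data_nodes * pct + 99) // 100

    return ceil_pct(low_pct), ceil_pct(high_pct)


if __name__ == "__main__":
    for n in (3, 5, 10, 20):
        low, high = recommended_range(n)
        print(f"{n} data nodes -> consider values from {low} to {high}")
```

Rounding up is a conservative choice: in a 5-node cluster it yields a range of 3 to 4, so even the low end is a majority of the data nodes.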
This setting is part of the gateway recovery process, which is crucial for maintaining cluster integrity after a full cluster restart. It ensures that enough data nodes are available before the cluster starts recovering its state.
Example
gateway.recover_after_data_nodes is a static setting, so it cannot be updated through the cluster settings API. Set it in elasticsearch.yml on every master-eligible node; the value is read when the node starts:
gateway.recover_after_data_nodes: 3
In this example, we set the value to 3, meaning the cluster will wait for at least 3 data nodes to be present before starting the recovery process. This can be useful in a cluster with 5 data nodes, ensuring that a majority of nodes are available before recovery begins.
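To make the effect of the threshold concrete, the sketch below works out when the minimum is satisfied for a given sequence of node joins. It is a deliberately simplified model written in Python: the function name and the join times are hypothetical, and it ignores gateway.expected_data_nodes and gateway.recover_after_time, which also influence when recovery actually begins.

```python
def time_minimum_reached(join_times, recover_after_data_nodes):
    """Return the time at which enough data nodes have joined, or None.

    join_times: seconds after the restart at which each data node joined
                (hypothetical sample data, one entry per data node).
    """
    if recover_after_data_nodes < 1:
        return 0  # no minimum to wait for
    ordered = sorted(join_times)
    if len(ordered) < recover_after_data_nodes:
        return None  # the minimum is never reached
    # The k-th node to join is the one that satisfies a minimum of k.
    return ordered[recover_after_data_nodes - 1]


if __name__ == "__main__":
    joins = [12, 4, 30, 9, 95]  # five data nodes, one of them slow to start
    for minimum in (3, 5, 6):
        print(minimum, time_minimum_reached(joins, minimum))
```

With these sample times, a minimum of 3 is met when the third node joins, while a minimum of 5 has to wait for the slowest node, which is the trade-off the threshold controls.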
Common Issues and Misuses
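- Setting the value too low lets recovery start before enough data nodes have joined. The cluster may then treat shard copies on the missing nodes as unavailable and start re-replicating them unnecessarily.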
- Setting the value too high might delay cluster recovery unnecessarily if some nodes are slow to start or have issues.
Do's and Don'ts
- Do consider your cluster size and topology when setting this value.
- Do use this setting in conjunction with gateway.recover_after_nodes and gateway.expected_data_nodes for more granular control.
- Don't set this value higher than your total number of data nodes.
- Don't change this setting frequently; it's primarily for initial cluster setup or major reconfiguration.
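The check against the total number of data nodes can be automated before a value is rolled out. The Python sketch below assumes the node list comes from GET _cat/nodes?format=json&h=name,node.role and that data nodes carry the generic data role, abbreviated d in the node.role column; clusters that use only tier-specific data roles would need a broader test. The function names and the sample rows are hypothetical.

```python
def count_data_nodes(cat_nodes_rows):
    """Count rows whose node.role string contains 'd' (generic data role).

    Assumption: rows look like the JSON output of the cat nodes API with
    the columns name and node.role.
    """
    return sum(1 for row in cat_nodes_rows if "d" in row.get("node.role", ""))


def check_threshold(threshold, cat_nodes_rows):
    """Raise ValueError if threshold exceeds the number of data nodes."""
    total = count_data_nodes(cat_nodes_rows)
    if threshold > total:
        raise ValueError(
            f"gateway.recover_after_data_nodes={threshold} exceeds "
            f"the {total} data nodes in the cluster"
        )
    return total


if __name__ == "__main__":
    # Hypothetical cluster: three data nodes and one dedicated master.
    rows = [
        {"name": "es-data-1", "node.role": "di"},
        {"name": "es-data-2", "node.role": "di"},
        {"name": "es-data-3", "node.role": "di"},
        {"name": "es-master-1", "node.role": "m"},
    ]
    print(check_threshold(3, rows))
```

Running the check as part of a configuration review catches a threshold that the cluster could never satisfy.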
Frequently Asked Questions
Q: How does gateway.recover_after_data_nodes differ from gateway.recover_after_nodes?
A: While gateway.recover_after_nodes considers all node types, gateway.recover_after_data_nodes specifically counts only data nodes. This allows for more precise control in clusters with dedicated master or client nodes.
Q: Can changing this setting impact an already running cluster?
A: No. The setting is only evaluated during a full cluster restart, so it has no effect on a cluster that has already recovered. Because it is a static setting, a new value is configured in elasticsearch.yml and is picked up the next time the node starts.
Q: What happens if the number of available data nodes never reaches the set value?
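A: The cluster will not start the recovery process. gateway.recover_after_data_nodes is a hard minimum: once the gateway.recover_after_time timeout has elapsed, recovery starts only if at least this many data nodes have joined. The timeout lets recovery proceed without the full gateway.expected_data_nodes count, but not without this minimum.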
Q: Is it safe to set this value to 0?
A: Yes. A value of 0 imposes no minimum, so the cluster does not wait for any particular number of data nodes before starting recovery. While safe, this may not be optimal for larger clusters, where you want a significant portion of the data to be available before recovery begins.
Q: How does this setting interact with shard allocation?
A: This setting doesn't directly affect shard allocation. It determines when the cluster starts the recovery process. Once recovery starts, shard allocation follows its own rules and settings.