Elasticsearch gateway.recover_after_data_nodes Setting

The gateway.recover_after_data_nodes setting in Elasticsearch controls the minimum number of data nodes that must be present in the cluster before the recovery process can start after a full cluster restart.

Description

  • Default value: 0
  • Possible values: Any non-negative integer
  • Recommendation: Set this to a value that represents a significant portion of your expected data nodes, typically 50-70% of your total data nodes.
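
The 50-70% guideline above can be turned into concrete numbers with a little integer arithmetic. The following is a minimal sketch, not an official formula: the function name `recover_after_range` and its percentage parameters are hypothetical and exist only for illustration.

```python
def recover_after_range(total_data_nodes, low_pct=50, high_pct=70):
    """Return (lower, upper) candidate values for gateway.recover_after_data_nodes.

    Hypothetical helper: it only applies the 50-70% rule of thumb,
    rounding the lower bound up and the upper bound down.
    """
    if total_data_nodes < 1:
        raise ValueError("total_data_nodes must be at least 1")
    # Integer ceiling of total * low_pct / 100 (avoids float rounding).
    lower = (total_data_nodes * low_pct + 99) // 100
    # Integer floor of total * high_pct / 100, never below the lower bound.
    upper = max(lower, total_data_nodes * high_pct // 100)
    return lower, upper


if __name__ == "__main__":
    for total in (5, 10, 20):
        print(total, recover_after_range(total))
```

For a cluster with 5 data nodes both bounds come out as 3, which matches the value used in the example later on this page.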

This setting is part of the gateway recovery process, which restores the cluster state after a full cluster restart. It prevents state recovery from starting until enough data nodes have joined the cluster.

Example

gateway.recover_after_data_nodes is a static setting, so it cannot be updated through the cluster settings API. Set it in elasticsearch.yml on every master-eligible node and restart the node for the change to take effect:

gateway.recover_after_data_nodes: 3

In this example, we set the value to 3, meaning the cluster will wait for at least 3 data nodes to be present before starting the recovery process. This can be useful in a cluster with 5 data nodes, ensuring that a majority of nodes are available before recovery begins.
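
To make the waiting behavior more concrete, here is a simplified model of the decision a freshly elected master makes, based on how the Elasticsearch reference describes the gateway settings. It is a sketch under stated assumptions, not Elasticsearch's actual implementation: the function `recovery_decision`, its parameters, and the returned strings are all hypothetical, and the 300-second default mirrors the documented 5m default for gateway.recover_after_time when gateway.expected_data_nodes is configured.

```python
from typing import Optional


def recovery_decision(
    joined_data_nodes: int,
    recover_after_data_nodes: int,
    expected_data_nodes: Optional[int] = None,
    waited_seconds: float = 0.0,
    recover_after_time_seconds: float = 300.0,
) -> str:
    """Simplified, hypothetical model of the gateway recovery gate."""
    # Minimum requirement: below this count, recovery does not start.
    if joined_data_nodes < recover_after_data_nodes:
        return "wait: below recover_after_data_nodes"
    # No expected count configured, or it has been reached: recover now.
    if expected_data_nodes is None or joined_data_nodes >= expected_data_nodes:
        return "recover"
    # Expected count not reached: recover once the timer has run out.
    if waited_seconds >= recover_after_time_seconds:
        return "recover"
    return "wait: recover_after_time still running"


if __name__ == "__main__":
    # Cluster with 5 data nodes and gateway.recover_after_data_nodes: 3
    print(recovery_decision(2, 3))
    print(recovery_decision(3, 3))
    print(recovery_decision(3, 3, expected_data_nodes=5, waited_seconds=10))
    print(recovery_decision(3, 3, expected_data_nodes=5, waited_seconds=300))
```

In this model, the minimum is checked first, so no amount of waiting lets recovery start with only 2 of the required 3 data nodes.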

Common Issues and Misuses

  • Setting the value too low may let recovery start before enough data nodes have joined, which can leave shards unassigned or trigger unnecessary copying of shard data.
  • Setting the value too high might delay cluster recovery unnecessarily if some nodes are slow to start or have issues.

Do's and Don'ts

  • Do consider your cluster size and topology when setting this value.
  • Do use this setting in conjunction with gateway.expected_data_nodes and gateway.recover_after_time for more granular control (and, on versions before 8.0, gateway.recover_after_nodes).
  • Don't set this value higher than your total number of data nodes.
  • Don't change this setting frequently; it's primarily for initial cluster setup or major reconfiguration.
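
The bounds in the list above are easy to check before a value goes into configuration. The sketch below is illustrative only: `check_recover_after_data_nodes` is a hypothetical helper, and the total number of data nodes is assumed to be known from your own inventory.

```python
def check_recover_after_data_nodes(value, total_data_nodes):
    """Return a list of problems with a proposed setting value.

    Hypothetical pre-deployment check; an empty list means no problem found.
    """
    problems = []
    if value < 0:
        problems.append("value must be a non-negative integer")
    if value > total_data_nodes:
        problems.append(
            f"value {value} exceeds the {total_data_nodes} data nodes in the "
            "cluster, so the threshold cannot be met"
        )
    return problems


if __name__ == "__main__":
    print(check_recover_after_data_nodes(3, 5))
    print(check_recover_after_data_nodes(6, 5))
```

A check like this fits naturally into whatever tooling generates or reviews your node configuration files.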

Frequently Asked Questions

Q: How does gateway.recover_after_data_nodes differ from gateway.recover_after_nodes?
A: gateway.recover_after_nodes counts nodes of every type, whereas gateway.recover_after_data_nodes counts only data nodes. This allows for more precise control in clusters with dedicated master or coordinating-only nodes. Note that gateway.recover_after_nodes was deprecated in 7.7 and removed in 8.0.

Q: Can changing this setting impact an already running cluster?
A: No. This is a static setting that each node reads from elasticsearch.yml at startup, so it cannot be changed on a running cluster through the cluster settings API. It only takes effect during a full cluster restart.
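
Because the value only matters at the next full restart, it can be worth confirming beforehand that every node carries the same value. The sketch below assumes you have already fetched and parsed the JSON from `GET _nodes/settings?flat_settings=true`; the helper names and the sample response are hypothetical, and the sample is trimmed to the fields the code reads.

```python
SETTING = "gateway.recover_after_data_nodes"


def values_by_node(nodes_settings_response):
    """Map node name to its configured value (None if not set explicitly)."""
    result = {}
    for node in nodes_settings_response.get("nodes", {}).values():
        name = node.get("name", "<unnamed>")
        result[name] = node.get("settings", {}).get(SETTING)
    return result


def is_consistent(nodes_settings_response):
    """True when every node reports the same value for the setting."""
    return len(set(values_by_node(nodes_settings_response).values())) <= 1


if __name__ == "__main__":
    # Hypothetical, trimmed response: node-3 is missing the setting.
    sample = {
        "nodes": {
            "aaa": {"name": "node-1", "settings": {SETTING: "3"}},
            "bbb": {"name": "node-2", "settings": {SETTING: "3"}},
            "ccc": {"name": "node-3", "settings": {}},
        }
    }
    print(values_by_node(sample))
    print(is_consistent(sample))
```

With the sample data, node-3 stands out because it has no explicit value, so the consistency check fails.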

Q: What happens if the number of available data nodes never reaches the set value?
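A: The cluster will not start the recovery process. gateway.recover_after_data_nodes is a hard minimum: the gateway.recover_after_time timeout only limits how long the cluster waits for gateway.expected_data_nodes, and once it expires recovery still requires at least gateway.recover_after_data_nodes data nodes to have joined.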

Q: Is it safe to set this value to 0?
A: Setting it to 0 (the default) means the cluster does not wait for any particular number of data nodes before starting recovery. While safe, it may not be optimal for larger clusters where you want to ensure a significant portion of data is available before recovery.

Q: How does this setting interact with shard allocation?
A: This setting doesn't directly affect shard allocation. It determines when the cluster starts the recovery process. Once recovery starts, shard allocation follows its own rules and settings.
