Elasticsearch cluster.routing.allocation.node_concurrent_recoveries Setting

Pulse - Elasticsearch Operations Done Right


The cluster.routing.allocation.node_concurrent_recoveries setting in Elasticsearch controls the number of concurrent shard recoveries allowed on a single node. This setting plays a crucial role in managing the cluster's recovery process and overall performance during shard allocation and rebalancing.

  • Default Value: 2
  • Possible Values: Any positive integer
  • Recommendation: The default value is suitable for most scenarios, but it can be adjusted based on the cluster's hardware capabilities and recovery requirements.

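This setting limits how many peer recoveries can run at the same time on a single node. A peer recovery is one where a shard copy is built or relocated by copying data from another node. The setting is a shortcut that supplies the value for both cluster.routing.allocation.node_concurrent_incoming_recoveries and cluster.routing.allocation.node_concurrent_outgoing_recoveries. It covers replica recoveries and shard relocations, including relocating primaries. It does not cover the recovery of unassigned primaries from local disk or from a snapshot, which is throttled separately by cluster.routing.allocation.node_initial_primaries_recoveries (default 4). Increasing this value can speed up recovery but also increases the load on the node and the network.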
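Before changing anything, it helps to see which values are actually in effect, including the defaults that the incoming and outgoing settings inherit. The request below is a sketch: include_defaults adds a "defaults" section to the response, and the filter_path pattern is just one way to trim the output to the related settings. The exact response layout may vary between Elasticsearch versions.

```console
GET _cluster/settings?include_defaults=true&filter_path=*.cluster.routing.allocation.node_concurrent*
```

Values appear under "persistent", "transient" or "defaults", depending on where each one comes from.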

Example

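To change the cluster.routing.allocation.node_concurrent_recoveries setting, use the cluster settings API. The setting is dynamic, so no restart is needed. Prefer a persistent setting: transient cluster settings are deprecated as of Elasticsearch 7.16 and do not survive a full cluster restart.

PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.node_concurrent_recoveries": 4
  }
}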

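If the incoming and outgoing settings have not been overridden, this change lets each node be the target of up to 4 recoveries and the source of up to 4 others at the same time. You might raise this value if your nodes are powerful, have ample resources, and you need recoveries to finish faster. Be cautious, though. More concurrent recoveries mean more CPU, disk I/O and network usage. The total recovery bandwidth per node is also still capped by indices.recovery.max_bytes_per_sec, so raising concurrency alone may not shorten recovery if that limit is the bottleneck.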
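If a higher value causes problems, you can remove the override so the node falls back to the default of 2. In the cluster settings API, assigning null to a setting clears it. The sketch below clears the setting from both the persistent and transient scopes, so it works whichever scope was used to set it.

```console
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.node_concurrent_recoveries": null
  },
  "transient": {
    "cluster.routing.allocation.node_concurrent_recoveries": null
  }
}
```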

Common Issues and Misuses

  • Setting the value too high can overwhelm node resources, leading to performance degradation.
  • Setting the value too low can significantly slow down the cluster recovery process, especially in large clusters with many shards.

Do's and Don'ts

  • Do monitor your cluster's performance after changing this setting.
  • Do consider your hardware capabilities when adjusting this value.
  • Don't set this value excessively high without understanding the potential impact on your cluster's stability.
  • Don't overlook this setting when planning for node failures. Together with the recovery bandwidth limit, it determines how quickly lost replicas can be rebuilt.

Frequently Asked Questions

Q: How does this setting differ from cluster.routing.allocation.node_concurrent_incoming_recoveries and cluster.routing.allocation.node_concurrent_outgoing_recoveries?
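A: cluster.routing.allocation.node_concurrent_recoveries is a shortcut that supplies the default for both of the other settings. node_concurrent_incoming_recoveries limits recoveries where the target shard is allocated on the node (usually a replica, unless a shard is relocating). node_concurrent_outgoing_recoveries limits recoveries where the source shard is on the node (usually a primary, unless a shard is relocating). The two limits are enforced independently, so with the default of 2 a node can be the target of 2 recoveries and the source of 2 others at the same time. If either specific setting is set explicitly, it takes precedence over the shortcut for that direction.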
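For example, to let nodes receive more recoveries than they send, you could set the two directions separately. The values below are only for illustration and should be tuned to your own hardware.

```console
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.node_concurrent_incoming_recoveries": 4,
    "cluster.routing.allocation.node_concurrent_outgoing_recoveries": 2
  }
}
```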

Q: Can changing this setting impact the cluster's stability?
A: Yes, setting this value too high can overload nodes and network resources, potentially leading to stability issues. It's important to adjust gradually and monitor the cluster's performance.

Q: How does this setting affect snapshot restore operations?
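A: Only in part. Restoring primary shards from a snapshot is throttled by cluster.routing.allocation.node_initial_primaries_recoveries, not by this setting. Once the restored primaries have started, their replicas are built by peer recovery, and those recoveries are subject to this setting. A higher value can therefore speed up the replica phase of a restore, at the cost of more load on the cluster.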

Q: Should I adjust this setting in a production environment?
A: While it's possible to adjust this setting in production, it's recommended to test changes in a staging environment first. Any adjustments in production should be made carefully and incrementally.

Q: How can I monitor the impact of changing this setting?
A: You can monitor the cluster's recovery process using Elasticsearch's Cat Recovery API (GET _cat/recovery) and watch for any changes in CPU usage, network traffic, and overall cluster performance after adjusting the setting.
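A minimal monitoring sketch is shown below. The first request lists only in-progress recoveries. The column selection in h= is an arbitrary choice for readability. The second request shows how many shards are still initializing, relocating or unassigned.

```console
GET _cat/recovery?active_only=true&v=true&h=index,shard,type,stage,source_node,target_node,bytes_percent

GET _cluster/health?filter_path=status,initializing_shards,relocating_shards,unassigned_shards
```

Repeat these requests while recovery is running, and read the results alongside node-level CPU, disk and network metrics.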
