Elasticsearch cluster.routing.allocation.node_concurrent_recoveries: Default 2, Tuning Guide

The cluster.routing.allocation.node_concurrent_recoveries setting controls how many peer-recovery operations can run on a single Elasticsearch data node at the same time. It defaults to 2 and acts as the fallback for both directions of recovery: shards the node sends out and shards it receives. The more specific cluster.routing.allocation.node_concurrent_incoming_recoveries and cluster.routing.allocation.node_concurrent_outgoing_recoveries settings, if set, override this value for their respective directions.

Definition

cluster.routing.allocation.node_concurrent_recoveries is a dynamic cluster-level setting that caps concurrent shard recoveries per node. Peer recovery copies shard data from the primary shard to a shard copy on another node, either to build or resync a replica or to relocate a shard during rebalancing. The setting bounds the parallelism of these operations to avoid saturating disk, network, or CPU.

Default and Allowed Values

Property   | Value
Default    | 2
Type       | Positive integer, dynamic
Scope      | Cluster, applied per node
Direction  | Both incoming and outgoing, unless overridden
Overrides  | node_concurrent_incoming_recoveries and node_concurrent_outgoing_recoveries take precedence when set

A separate setting, cluster.routing.allocation.node_initial_primaries_recoveries, defaults to 4 and controls the parallelism of primary recoveries when a node first starts (recovery from disk, not from peers).
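The fallback rule in the table above can be sketched in a few lines of Python. This is an illustration of the precedence only, not Elasticsearch's own code; the function name effective_limits and the flat dictionary of settings are assumptions made for the example.

```python
from typing import Dict, Mapping

BASE = "cluster.routing.allocation.node_concurrent_recoveries"
INCOMING = "cluster.routing.allocation.node_concurrent_incoming_recoveries"
OUTGOING = "cluster.routing.allocation.node_concurrent_outgoing_recoveries"
DEFAULT_BASE = 2  # documented default of the base setting


def effective_limits(settings: Mapping[str, int]) -> Dict[str, int]:
    """Resolve per-node limits (hypothetical helper, for illustration).

    A directional setting wins when present; otherwise the direction
    inherits the base setting, which itself falls back to the default.
    """
    base = settings.get(BASE, DEFAULT_BASE)
    return {
        "incoming": settings.get(INCOMING, base),
        "outgoing": settings.get(OUTGOING, base),
    }


# Base raised to 4, incoming explicitly overridden to 6.
print(effective_limits({BASE: 4, INCOMING: 6}))
```

With only the base setting raised, both directions follow it; once a directional value is present, later changes to the base no longer affect that direction.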

How to Change It

Through the cluster settings API:

# Allow up to 4 concurrent peer recoveries per node
PUT /_cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.node_concurrent_recoveries": 4
  }
}

Reset to default with null. Inspect with:

GET /_cluster/settings?include_defaults=true&filter_path=*.cluster.routing.allocation.node_concurrent_*
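For scripted changes, the request bodies for setting and resetting the value can be generated rather than typed by hand. The sketch below only builds the JSON for PUT /_cluster/settings; it does not send anything, and the helper name settings_body is an assumption for this example. Python's None serializes to JSON null, which is what resets the setting to its default.

```python
import json

SETTING = "cluster.routing.allocation.node_concurrent_recoveries"


def settings_body(value):
    """Build a persistent cluster-settings body (hypothetical helper).

    Pass an integer to set the limit, or None to reset it to the default.
    """
    return json.dumps({"persistent": {SETTING: value}}, indent=2)


print(settings_body(4))     # body that sets the limit to 4
print(settings_body(None))  # body that resets the setting (value is null)
```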

The setting works alongside the indices.recovery.* throttles, in particular indices.recovery.max_bytes_per_sec, which limits total inbound and outbound recovery traffic for each node (default 40 MB/s on most nodes; recent versions apply larger, memory-based defaults on dedicated cold and frozen nodes). Raising concurrent recoveries without also raising that per-node bandwidth ceiling just spreads the same total bandwidth thinner.
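A rough model makes the interaction concrete. It assumes each recovery stream can sustain some fixed rate on its own (per_stream_mb, a made-up figure you would measure on your own hardware) and that the node-level limit caps the sum. Real recoveries are burstier than this, so treat it as a back-of-the-envelope sketch only.

```python
def aggregate_recovery_rate_mb(concurrency: int,
                               per_stream_mb: float,
                               node_cap_mb: float = 40.0) -> float:
    """Estimated total recovery throughput on one node, in MB/s.

    concurrency   -- number of recoveries running at once
    per_stream_mb -- assumed rate a single recovery reaches on its own
    node_cap_mb   -- node-wide ceiling (indices.recovery.max_bytes_per_sec)
    """
    return min(concurrency * per_stream_mb, node_cap_mb)


# With an assumed 15 MB/s per stream and the 40 MB/s default cap:
for c in (2, 4, 8):
    print(c, aggregate_recovery_rate_mb(c, per_stream_mb=15.0))
# 2 -> 30.0, 4 -> 40.0, 8 -> 40.0
```

Under these assumptions, going from 2 to 4 helps until the cap is hit, and going from 4 to 8 changes nothing unless the bandwidth ceiling is raised as well.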

When to Tune It

The default 2 was chosen as a safe value for spinning disks and modest network. Modern clusters often have NVMe storage and 10/25 Gbps network, where 2 is too conservative and node-join times become dominated by serial recovery. Consider raising to 4-8 when:

  • Data nodes run on NVMe with >1 GB/s sequential throughput.
  • Network bandwidth comfortably exceeds 1 GB/s.
  • You routinely lose and re-add nodes (autoscaling, spot instance churn).
  • Cluster recovery time is a binding SLO.

Keep at 2 (or lower) when:

  • Nodes use spinning disks or have a constrained network.
  • Recovery competes with peak indexing or search load.
  • You see relocation operations stall or get throttled.

Operational Impact

Each concurrent recovery consumes:

  • Disk I/O on both source and destination nodes.
  • Network bandwidth, capped per node by indices.recovery.max_bytes_per_sec.
  • CPU for translog replay on the destination.
  • A small amount of JVM heap for transport buffers.

Setting the value too high overloads the node's I/O subsystem and can starve query traffic. Setting it too low extends node-join time, which extends the window during which the cluster runs without full redundancy.

Monitor recovery in flight with:

GET /_cat/recovery?v&active_only=true

The output shows the source node, target node, bytes transferred, and percentage complete for each active recovery.
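To compare in-flight recoveries against the configured limit, the same endpoint can be requested with format=json and tallied per node. The sketch below works on a hand-written sample, not captured cluster output; it assumes the JSON rows carry the source_node, target_node, and type columns, and that type reads peer for peer recoveries.

```python
import json
from collections import Counter

# Hand-written rows shaped like GET /_cat/recovery?active_only=true&format=json
# (illustrative values only, not real cluster output).
SAMPLE = """[
  {"index": "logs-1", "shard": "0", "type": "peer", "source_node": "node-a", "target_node": "node-c"},
  {"index": "logs-1", "shard": "1", "type": "peer", "source_node": "node-b", "target_node": "node-c"},
  {"index": "logs-2", "shard": "0", "type": "peer", "source_node": "node-a", "target_node": "node-d"}
]"""


def recoveries_per_node(rows):
    """Count active peer recoveries per node, split by direction."""
    peer = [r for r in rows if r.get("type") == "peer"]
    incoming = Counter(r["target_node"] for r in peer)
    outgoing = Counter(r["source_node"] for r in peer)
    return incoming, outgoing


incoming, outgoing = recoveries_per_node(json.loads(SAMPLE))
for node in sorted(set(incoming) | set(outgoing)):
    print(f"{node}: incoming={incoming[node]} outgoing={outgoing[node]}")
```

A node whose incoming or outgoing count sits at the configured limit is a sign that the concurrency cap, rather than bandwidth, may be what is holding recovery back.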

Common Mistakes

  1. Treating the setting as a recovery speed dial. Recovery bandwidth on each node is bounded by indices.recovery.max_bytes_per_sec. More concurrent recoveries share that ceiling; they do not raise it.
  2. Setting incoming and outgoing limits, then forgetting about the base setting. A directional override takes precedence, so later changes to the base setting have no effect on that direction. The base value still applies to any direction that has no override.
  3. Raising the limit on a hot cluster. Recovery is a background workload; in a saturated cluster it competes with the foreground. Tune during a quiet window.
  4. Ignoring node_initial_primaries_recoveries. That separate setting controls the parallelism of primary recovery from local disk at node startup, which is what dominates restart time after a clean stop.

Prevent Concurrent-Recovery Misconfiguration with Pulse

Pulse is an AI DBA for Elasticsearch and OpenSearch that tracks cluster.routing.allocation.node_concurrent_recoveries (default 2) together with node_concurrent_incoming_recoveries, node_concurrent_outgoing_recoveries, node_initial_primaries_recoveries (default 4), and indices.recovery.max_bytes_per_sec, flagging:

  • Drift between intended and actual values across nodes
  • Settings that are unsafe for your workload (e.g. concurrency raised to 8 without raising max_bytes_per_sec, so total bandwidth never changes; concurrency raised on spinning-disk nodes where 2 already saturates I/O; recoveries running during peak search hours)
  • The downstream operational impact: time-to-green after node loss, recovery throughput per node, and the recovery vs search latency tradeoff

When a planned recovery is slower than expected or competing with foreground traffic, Pulse names the binding constraint (concurrency cap, per-node recovery bandwidth, disk IOPS, or network) so the right knob gets turned.


Frequently Asked Questions

Q: What is the default for cluster.routing.allocation.node_concurrent_recoveries?
A: The default is 2. This caps both incoming and outgoing peer recoveries per node unless node_concurrent_incoming_recoveries or node_concurrent_outgoing_recoveries is set, in which case the directional values take precedence.

Q: Can I change cluster.routing.allocation.node_concurrent_recoveries without a restart?
A: Yes. It is a dynamic cluster setting. Use PUT /_cluster/settings and the change applies at the next allocator pass. Recoveries already in flight are not interrupted.

Q: Will raising this setting speed up cluster recovery?
A: It can, up to the point where the per-node recovery bandwidth limit (indices.recovery.max_bytes_per_sec) or the disk and network become the bottleneck. On NVMe-backed nodes with fast networking, raising from 2 to 4-8 typically helps. On spinning disks it usually does not.

Q: How is this different from node_concurrent_incoming_recoveries?
A: node_concurrent_recoveries is the fallback for both directions. node_concurrent_incoming_recoveries is specific to recoveries the node is receiving (where the node is the destination). If the directional setting is set, it overrides the base value for that direction.

Q: Does this setting affect snapshot restore?
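A: Only partly. A restored primary shard recovers from the snapshot repository rather than from a peer, and the allocator throttles it together with other initial primary recoveries under node_initial_primaries_recoveries. node_concurrent_recoveries comes into play when replicas of the restored index are then built through peer recovery. Restore speed is also bounded by snapshot repository throughput and by indices.recovery.max_bytes_per_sec.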

Q: What metrics show recovery in progress?
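A: GET /_cat/recovery?v&active_only=true lists active recoveries with source, target, and percent complete. GET /_recovery returns the same information in more detail as JSON. The indices.recovery section of the node stats API reports how many recoveries each node is currently serving as source and as target (current_as_source, current_as_target) and the cumulative time recoveries spent throttled (throttle_time_in_millis).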

Q: What's the best tool to tune concurrent recoveries without crowding out live traffic?
A: Pulse is built for this. It is an AI DBA for Elasticsearch and OpenSearch that tracks node_concurrent_recoveries against actual disk and network utilization, correlates recovery throughput with search latency p95, and recommends concurrency changes that are safe for the cluster's current hardware and load profile.
