The cluster.routing.allocation.node_concurrent_recoveries setting controls how many peer-recovery operations can run on a single Elasticsearch data node at the same time. It defaults to 2 and acts as a fallback for both directions of recovery (a node sending shards out and receiving shards in). The more specific `cluster.routing.allocation.node_concurrent_incoming_recoveries` and cluster.routing.allocation.node_concurrent_outgoing_recoveries settings, if set, override this value for their respective directions.
Definition
cluster.routing.allocation.node_concurrent_recoveries is a dynamic cluster-level setting that caps concurrent shard recoveries per node. Peer recovery copies a shard from its primary on another node, for example when building a replica or relocating a shard during rebalancing. The setting bounds the parallelism of these operations to avoid saturating disk, network, or CPU.
Default and Allowed Values
| Property | Value |
|---|---|
| Default | 2 |
| Type | Positive integer, dynamic |
| Scope | Cluster, applied per node |
| Direction | Both incoming and outgoing, unless overridden |
| Overrides | node_concurrent_incoming_recoveries and node_concurrent_outgoing_recoveries take precedence when set |
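The fallback rule in the table can be sketched in a few lines of Python. This illustrates the precedence described above and is not Elasticsearch's implementation; the function name effective_limits and the use of None for "not set" are assumptions made for the example.

```python
from typing import Optional, Tuple

DEFAULT_CONCURRENT_RECOVERIES = 2  # documented default of the base setting


def effective_limits(
    base: Optional[int] = None,
    incoming: Optional[int] = None,
    outgoing: Optional[int] = None,
) -> Tuple[int, int]:
    """Return (incoming_limit, outgoing_limit) for one node.

    None means "not explicitly set". A directional value wins when present;
    otherwise the base value (or its default of 2) is used.
    """
    fallback = DEFAULT_CONCURRENT_RECOVERIES if base is None else base
    resolved_incoming = fallback if incoming is None else incoming
    resolved_outgoing = fallback if outgoing is None else outgoing
    return resolved_incoming, resolved_outgoing
```

For example, effective_limits(base=4, incoming=6) returns (6, 4): the incoming override applies, while outgoing inherits the base value.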
A separate setting, cluster.routing.allocation.node_initial_primaries_recoveries, defaults to 4 and controls the parallelism of primary recoveries when a node first starts (recovery from disk, not from peers).
How to Change It
Through the cluster settings API:
# Allow up to 4 concurrent peer recoveries per node
PUT /_cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.node_concurrent_recoveries": 4
  }
}
Reset to default with null. Inspect with:
GET /_cluster/settings?include_defaults=true&filter_path=*.cluster.routing.allocation.node_concurrent_*
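Setting and resetting differ only in the value sent in the request body. The sketch below builds that body in Python; the helper name settings_body is hypothetical, and sending the request (with whichever HTTP client you use) is left out.

```python
import json
from typing import Optional

SETTING = "cluster.routing.allocation.node_concurrent_recoveries"


def settings_body(value: Optional[int]) -> str:
    """Build the JSON body for PUT /_cluster/settings.

    Passing None serializes to JSON null, which resets the setting
    to its default.
    """
    if value is not None and value < 0:
        raise ValueError("concurrent recoveries cannot be negative")
    return json.dumps({"persistent": {SETTING: value}})
```

Calling settings_body(4) produces the same body as the request above, and settings_body(None) produces the reset body with null as the value.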
The setting works alongside the indices.recovery.* throttles, in particular indices.recovery.max_bytes_per_sec (default 40 MB/s per node since 7.0, with auto-tuning in 8.x for some node sizes). Because that limit applies to the node as a whole, raising concurrent recoveries without also raising it just spreads the same total bandwidth thinner.
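A rough back-of-the-envelope calculation shows why. The sketch below assumes the node-level limit is the only bottleneck and that concurrent recoveries share it evenly; both are simplifications, and the function names are made up for illustration.

```python
def per_recovery_share_mb(node_limit_mb_per_sec: float, concurrent: int) -> float:
    """Approximate bandwidth each recovery gets under an even split."""
    if concurrent < 1:
        raise ValueError("concurrent must be at least 1")
    return node_limit_mb_per_sec / concurrent


def total_recovery_seconds(total_shard_mb: float, node_limit_mb_per_sec: float) -> float:
    """Time to move a fixed data volume when the node-level throttle is the bottleneck.

    Concurrency does not appear here: the total is bounded by the node limit.
    """
    return total_shard_mb / node_limit_mb_per_sec
```

With a 40 MB/s limit, two concurrent recoveries get about 20 MB/s each and eight get about 5 MB/s each. Moving 120,000 MB takes roughly 3,000 seconds in either case under these assumptions.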
When to Tune It
The default of 2 was chosen as a safe value for spinning disks and modest networks. Modern clusters often have NVMe storage and 10/25 Gbps networking, where 2 is too conservative and node-join times become dominated by serial recovery. Consider raising to 4-8 when:
- Data nodes run on NVMe with >1 GB/s sequential throughput.
- Network bandwidth comfortably exceeds 1 GB/s.
- You routinely lose and re-add nodes (autoscaling, spot instance churn).
- Cluster recovery time is a binding SLO.
Keep at 2 (or lower) when:
- Spinning disks or constrained network.
- Recovery competes with peak indexing or search load.
- You see relocation operations stall or get throttled.
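The two checklists can be condensed into a small helper. This is one conservative reading of the guidance, not an official rule: it suggests raising the value only when both disk and network exceed 1 GB/s and recovery does not compete with peak traffic. The function name and parameters are hypothetical.

```python
from typing import Tuple


def suggested_range(
    disk_gb_per_sec: float,
    network_gb_per_sec: float,
    competes_with_peak_load: bool,
) -> Tuple[int, int]:
    """Return a (low, high) starting range for node_concurrent_recoveries.

    (4, 8) when hardware is fast and recovery has headroom;
    (1, 2) otherwise, meaning "keep at 2 or lower".
    """
    fast_hardware = disk_gb_per_sec > 1.0 and network_gb_per_sec > 1.0
    if fast_hardware and not competes_with_peak_load:
        return (4, 8)
    return (1, 2)
```

Treat the result as a starting point for testing in your own cluster, not as a target value.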
Operational Impact
Each concurrent recovery consumes:
- Disk I/O on both source and destination nodes.
- Network bandwidth, capped per node by indices.recovery.max_bytes_per_sec.
- CPU for translog replay on the destination.
- A small amount of JVM heap for transport buffers.
Setting the value too high overloads the node's I/O subsystem and can starve query traffic. Setting it too low extends node-join time, which extends the window during which the cluster runs without full redundancy.
Monitor recovery in flight with:
GET /_cat/recovery?v&active_only=true
The output shows the source node, target node, bytes transferred, and percentage complete for each active recovery.
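To compare in-flight recoveries against the configured limits, it can help to count them per node. The sketch below assumes the same endpoint is called with format=json, so that each recovery arrives as a dictionary with source_node and target_node keys; it also assumes a non-peer source appears as "n/a". The sample rows are invented for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def recoveries_per_node(rows: Iterable[Dict[str, str]]) -> Tuple[Counter, Counter]:
    """Count active recoveries per node as (outgoing, incoming).

    Each row describes one recovery. Rows whose source is missing or "n/a"
    (assumed to mean a non-peer source) are not counted as outgoing.
    """
    outgoing: Counter = Counter()
    incoming: Counter = Counter()
    for row in rows:
        source = row.get("source_node")
        target = row.get("target_node")
        if source and source != "n/a":
            outgoing[source] += 1
        if target:
            incoming[target] += 1
    return outgoing, incoming


# Illustrative rows only; node and index names are made up.
sample = [
    {"index": "logs-1", "shard": "0", "source_node": "node-a", "target_node": "node-c"},
    {"index": "logs-1", "shard": "1", "source_node": "node-a", "target_node": "node-c"},
    {"index": "logs-2", "shard": "0", "source_node": "node-b", "target_node": "node-c"},
]
outgoing, incoming = recoveries_per_node(sample)
```

In this sample, node-c is receiving three recoveries while node-a is sending two, so an incoming limit of 2 on node-c would be the binding constraint.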
Common Mistakes
- Treating the setting as a recovery speed dial. Recovery bandwidth is bounded by indices.recovery.max_bytes_per_sec. More concurrent recoveries do not exceed that ceiling.
- Setting incoming and outgoing limits, then forgetting about the base setting. The directional overrides take precedence, but the base setting is still inherited when the directional ones are absent.
- Raising the limit on a hot cluster. Recovery is a background workload; in a saturated cluster it competes with the foreground. Tune during a quiet window.
- Ignoring node_initial_primaries_recoveries. That separate setting controls the parallelism of primary recovery from local disk at node startup, which is what dominates restart time after a clean stop.
Prevent Concurrent-Recovery Misconfiguration with Pulse
Pulse is an AI DBA for Elasticsearch and OpenSearch that tracks cluster.routing.allocation.node_concurrent_recoveries (default 2) together with node_concurrent_incoming_recoveries, node_concurrent_outgoing_recoveries, node_initial_primaries_recoveries (default 4), and indices.recovery.max_bytes_per_sec, flagging:
- Drift between intended and actual values across nodes
- Settings that are unsafe for your workload (e.g. concurrency raised to 8 without raising max_bytes_per_sec, so total bandwidth never changes; concurrency raised on spinning-disk nodes where 2 already saturates I/O; recoveries running during peak search hours)
- The downstream operational impact: time-to-green after node loss, recovery throughput per node, and the recovery vs search latency tradeoff
When a planned recovery is slower than expected or competing with foreground traffic, Pulse names the binding constraint (concurrency cap, recovery bandwidth limit, disk IOPS, or network) so the right knob gets turned.
Frequently Asked Questions
Q: What is the default for cluster.routing.allocation.node_concurrent_recoveries?
A: The default is 2. This caps both incoming and outgoing peer recoveries per node unless node_concurrent_incoming_recoveries or node_concurrent_outgoing_recoveries is set, in which case the directional values take precedence.
Q: Can I change cluster.routing.allocation.node_concurrent_recoveries without a restart?
A: Yes. It is a dynamic cluster setting. Use PUT /_cluster/settings and the change applies at the next allocator pass. Recoveries already in flight are not interrupted.
Q: Will raising this setting speed up cluster recovery?
A: It can, up to the point where the per-node recovery bandwidth limit (indices.recovery.max_bytes_per_sec) or the disk and network become the bottleneck. On NVMe-backed nodes with fast networking, raising from 2 to 4-8 typically helps. On spinning disks it usually does not.
Q: How is this different from node_concurrent_incoming_recoveries?
A: node_concurrent_recoveries is the fallback for both directions. node_concurrent_incoming_recoveries is specific to recoveries the node is receiving (where the node is the destination). If the directional setting is set, it overrides the base value for that direction.
Q: Does this setting affect snapshot restore?
A: Yes. Snapshot restore performs shard recovery from the repository, and the same per-node concurrency limit applies. Restore speed is also bounded by snapshot repository throughput.
Q: What metrics show recovery in progress?
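A: GET /_cat/recovery?v&active_only=true lists active recoveries with source, target, and percent complete. GET /_recovery returns the same data in JSON. The indices.recovery section of the node stats API reports how many recoveries the node is currently serving as source or target (current_as_source, current_as_target) and the cumulative time recoveries have been throttled (throttle_time_in_millis).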
Q: What's the best tool to tune concurrent recoveries without crowding out live traffic?
A: Pulse is built for this. It is an AI DBA for Elasticsearch and OpenSearch that tracks node_concurrent_recoveries against actual disk and network utilization, correlates recovery throughput with search latency p95, and recommends concurrency changes that are safe for the cluster's current hardware and load profile.