The cluster.routing.allocation.node_concurrent_incoming_recoveries setting limits the number of concurrent shard recoveries a single Elasticsearch data node can receive at one time. It defaults to 2 and, when set, overrides the more general `cluster.routing.allocation.node_concurrent_recoveries` for inbound (destination-side) recoveries only. Tune this when you are scaling out (new nodes need to pull shards in fast) or recovering from a node loss, and the node has the disk and network headroom to absorb more parallel work.
Definition
cluster.routing.allocation.node_concurrent_incoming_recoveries is a dynamic cluster-level setting that caps inbound peer recoveries per node. Inbound here means the node is the destination: a replica is being built on this node from a primary elsewhere, or a shard is being moved here as part of rebalance. The complementary setting, cluster.routing.allocation.node_concurrent_outgoing_recoveries, controls the source side.
Default and Allowed Values
| Property | Value |
|---|---|
| Default | 2 |
| Type | Positive integer, dynamic cluster setting |
| Scope | Cluster, applied per node |
| Direction | Inbound only (this node as destination) |
| Override of | cluster.routing.allocation.node_concurrent_recoveries |
When unset, Elasticsearch uses the base node_concurrent_recoveries value (also default 2) for the inbound direction.
How to Change It
# Allow 4 concurrent inbound recoveries per node
PUT /_cluster/settings
{
"persistent": {
"cluster.routing.allocation.node_concurrent_incoming_recoveries": 4
}
}
Reset to default with null. Inspect with:
GET /_cluster/settings?include_defaults=true&filter_path=*.cluster.routing.allocation.node_concurrent_*
If you raise this, consider also raising node_concurrent_outgoing_recoveries so source nodes are not the bottleneck, and the cluster-wide cluster.routing.allocation.cluster_concurrent_rebalance if you need more parallelism for rebalance specifically.
When to Tune It
Raise the inbound limit when:
- Adding new data nodes to an existing cluster. The new nodes start empty and need to pull many shards in.
- Recovering after a node restart where many replicas need to be rebuilt on the returning node.
- The cluster has NVMe storage and fast network and recovery is currently disk- or network-idle but slow.
Keep at the default 2 when:
- The cluster is steady-state and recovery is a rare event.
- Spinning disks or constrained network where 2 already saturates I/O.
- Inbound nodes are also serving heavy search or indexing load.
The setting is purely a parallelism cap. Per-recovery bandwidth is governed separately by indices.recovery.max_bytes_per_sec (default 40 MB/s; in 8.x there is auto-tuning based on hardware class for certain managed services).
Operational Impact
Inbound recovery on a node consumes:
- Write disk bandwidth on the destination.
- Receive network bandwidth.
- A small amount of heap for transport buffers and translog replay.
Each concurrent recovery is bounded by indices.recovery.max_bytes_per_sec, so raising concurrency from 2 to 4 does not double bandwidth - it just lets four shards progress at half the throughput each, which is usually faster overall because shards finish at different points and the work pipelines.
Monitor with:
GET /_cat/recovery?v&active_only=true
The target_node column shows the destination; sort to see how many inbound recoveries any one node is running.
Common Mistakes
- Raising only the inbound limit. The outbound side (
node_concurrent_outgoing_recoveries) is the bottleneck for source nodes. Raise both together when scaling out aggressively. - Setting incoming and outgoing without checking the base value. Directional values override the base, but if you later remove the directional value, the base default of 2 kicks back in. Be explicit.
- Forgetting
indices.recovery.max_bytes_per_sec. Concurrency without bandwidth headroom does nothing. - Tuning during incident response. Test in staging first; sudden recovery storms can compound an outage.
Prevent Inbound-Recovery Misconfiguration with Pulse
Pulse is an AI DBA for Elasticsearch and OpenSearch that tracks cluster.routing.allocation.node_concurrent_incoming_recoveries (default 2) and the outbound counterpart together with indices.recovery.max_bytes_per_sec, flagging:
- Drift between intended and actual values, especially after directional overrides are removed and the base default of 2 silently kicks back in
- Settings that are unsafe for your workload (e.g. inbound raised to 8 on a new node while outbound stays at 2 on the source, leaving the bottleneck unchanged; tuning during incident response without staging validation)
- The downstream operational impact: per-node inbound recovery count from
_cat/recovery, destination disk write bandwidth, and the search latency cost on the receiving node
When a scale-out adds a new node and it cannot pull shards in fast enough, Pulse names whether the inbound cap, outbound cap, or per-recovery bandwidth is the binding constraint.
Frequently Asked Questions
Q: What is the default for cluster.routing.allocation.node_concurrent_incoming_recoveries?
A: The default is 2. This caps the number of shard recoveries the node can be the destination of at one time. When unset, the same value is inherited from cluster.routing.allocation.node_concurrent_recoveries.
Q: How is this different from node_concurrent_recoveries?
A: node_concurrent_recoveries is the fallback for both inbound and outbound. node_concurrent_incoming_recoveries is specific to recoveries this node is receiving. If both are set, the directional value wins for its direction.
Q: Will raising this setting speed up adding a new node?
A: Often yes, on hardware that can absorb the extra parallelism. New nodes start empty and pull many shards in, so the inbound limit is the binding constraint. Raise to 4-8 on NVMe and fast networks; keep at 2 on slower storage.
Q: Does this setting affect snapshot restore?
A: Yes. Snapshot restore performs shard recovery on the destination, and the inbound limit applies. Restore parallelism is also bounded by repository throughput, which is usually the dominant constraint.
Q: Can I change cluster.routing.allocation.node_concurrent_incoming_recoveries dynamically?
A: Yes. Use PUT /_cluster/settings and the change applies at the next allocator pass. Active recoveries are not interrupted; the new limit governs the next batch.
Q: What happens if I raise this too high?
A: Disk and network saturate, and search and indexing latency spike. Cluster recovery time may not improve - or may actually get worse - if the destination is I/O bound. Raise in small steps and measure.
Q: What's the best tool to tune inbound recovery parallelism when scaling out?
A: Pulse is built for this. It is an AI DBA for Elasticsearch and OpenSearch that tracks node_concurrent_incoming_recoveries against actual destination disk and network throughput, identifies whether the bottleneck is the inbound cap, the outbound cap, or indices.recovery.max_bytes_per_sec, and recommends a change that is safe for the cluster's current hardware.
Related Reading
- Elasticsearch cluster.routing.allocation.node_concurrent_recoveries
- Elasticsearch cluster.routing.allocation.enable Setting
- Elasticsearch cluster.routing.rebalance.enable Setting
- Elasticsearch cluster.routing.allocation.disk.watermark.high
- Elasticsearch Cluster Status Yellow
- Elasticsearch Shard Sizing Best Practices