indices.recovery.max_bytes_per_sec caps the bandwidth used by peer recovery on each node, applied per-node both to outbound recovery sends and inbound receives. The setting throttles the file-copy phase of shard recovery, snapshot restore, relocation, and replica synchronization. The default value protects ongoing search and indexing traffic from being starved by a large recovery event.
- Default: 40mb per node (since Elasticsearch 7.x). On nodes with the data_cold or data_frozen roles, defaults differ
- Scope: Cluster-wide dynamic setting, but limit applies per node
- Possible values: Any size value, e.g. 100mb, 1gb, 0 (no throttling)
- Special value: 0 disables throttling entirely
How Recovery Throttling Works
Peer recovery copies segment files from a source node to a target node. The throttle is enforced on the source side: bytes-per-second across all concurrent recoveries leaving a node are clamped to the limit. Inbound recoveries are similarly throttled on the receiving node.
On a 10 GbE cluster network (~1.25 GB/s theoretical), the default 40 MB/s leaves ~97% of network capacity for live traffic. On a 1 GbE cluster, the same default consumes ~32% of the link - usually still tolerable. Tuning the value is mostly a question of how much network you're willing to give up during recoveries.
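The percentages above follow from simple arithmetic. As a rough sketch (the function name `recovery_share_of_link` is made up for illustration, and it uses the same decimal approximation as the text, 1 Gbit/s = 125 MB/s), the share of a link that a recovery running at the throttle limit would occupy can be computed like this:

```python
def recovery_share_of_link(limit_mb_per_s: float, link_gbit_per_s: float) -> float:
    """Fraction of a link's theoretical capacity used by recovery running at the limit."""
    link_mb_per_s = link_gbit_per_s * 1000 / 8  # 1 Gbit/s = 125 MB/s (decimal units)
    return limit_mb_per_s / link_mb_per_s


for link in (10, 1):
    share = recovery_share_of_link(40, link)
    print(f"{link} GbE: recovery uses {share:.1%}, leaving {1 - share:.1%}")
# 10 GbE: recovery uses 3.2%, leaving 96.8%
# 1 GbE: recovery uses 32.0%, leaving 68.0%
```

Plugging in a candidate value such as 200 or 500 before changing the setting gives a quick sense of how much of the link a full-speed recovery could claim.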
Configuring indices.recovery.max_bytes_per_sec
Change it via the cluster settings API at runtime:
PUT /_cluster/settings
{
  "persistent": {
    "indices.recovery.max_bytes_per_sec": "200mb"
  }
}
Setting it to null reverts to the default. The change applies immediately to new and in-progress recoveries.
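For scripted changes, the request body can be generated rather than hand-written. The sketch below only builds the JSON for PUT /_cluster/settings and does not send it. The helper name `throttle_settings_body` is an assumption for illustration. Passing `None` produces the `null` that reverts the setting to its default.

```python
import json

SETTING = "indices.recovery.max_bytes_per_sec"


def throttle_settings_body(value):
    """Build the body for PUT /_cluster/settings; value=None serializes to null."""
    return json.dumps({"persistent": {SETTING: value}})


print(throttle_settings_body("200mb"))
# {"persistent": {"indices.recovery.max_bytes_per_sec": "200mb"}}
print(throttle_settings_body(None))
# {"persistent": {"indices.recovery.max_bytes_per_sec": null}}
```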
When to Adjust the Default
| Scenario | Recommended setting |
|---|---|
| Default 1-10 GbE cluster, normal recoveries | Leave at 40mb |
| Large cluster after node loss, recovery taking hours | Temporarily raise to 200mb-500mb, restore after |
| Dedicated recovery network or quiet maintenance window | 500mb-1gb or disable (0) |
| Recovery is causing user-visible search latency | Lower to 20mb |
| Cold/frozen-tier nodes restoring from snapshot | Default is higher already; tune snapshot repo throughput first |
Operators commonly raise this setting during a planned recovery (rolling restart, hardware replacement) and revert afterwards. Permanent high values risk crowding out live traffic when an unplanned recovery starts.
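One way to make the "raise, then revert" habit harder to forget is to wrap it in a context manager. This is a minimal sketch, not a client library feature: `put_settings` is a hypothetical callable that you supply, expected to send a dict as the body of PUT /_cluster/settings. Note that it resets the setting to the default (null) on exit rather than restoring a previous custom value.

```python
from contextlib import contextmanager

SETTING = "indices.recovery.max_bytes_per_sec"


@contextmanager
def temporary_recovery_throttle(put_settings, value):
    """Raise the throttle for the duration of a block, then reset it to the default.

    put_settings is a hypothetical callable that sends a dict as the body of
    PUT /_cluster/settings (for example, a thin wrapper around an HTTP client).
    """
    put_settings({"persistent": {SETTING: value}})
    try:
        yield
    finally:
        # None becomes null, which removes the override even if the block raised.
        put_settings({"persistent": {SETTING: None}})


# Usage with a stand-in that just records what would be sent:
sent = []
with temporary_recovery_throttle(sent.append, "500mb"):
    pass  # run the rolling restart or wait for recovery to finish here
print(sent[-1])
# {'persistent': {'indices.recovery.max_bytes_per_sec': None}}
```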
Related Recovery Settings
| Setting | Default | Purpose |
|---|---|---|
| indices.recovery.max_concurrent_file_chunks | 2 | Parallel file chunks per recovery |
| indices.recovery.max_concurrent_operations | 1 | Parallel operations during translog replay |
| cluster.routing.allocation.node_concurrent_recoveries | 2 | Concurrent recoveries per node |
| cluster.routing.allocation.cluster_concurrent_rebalance | 2 | Concurrent rebalance moves cluster-wide |
Recovery throughput is shaped by all of these together. Raising max_bytes_per_sec alone has diminishing returns if node_concurrent_recoveries stays at 2.
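The diminishing returns can be illustrated with a deliberately simplified model: concurrent recoveries add up until the node-level throttle caps them. The function name and the 60 MB/s per-recovery figure below are assumptions chosen for illustration, not measured or documented values. Real per-recovery throughput depends on disks, the network, and max_concurrent_file_chunks.

```python
def effective_recovery_rate(node_limit_mb_s, per_recovery_mb_s, concurrent_recoveries):
    """Rough node-level rate: streams add up until the node throttle caps them."""
    return min(node_limit_mb_s, per_recovery_mb_s * concurrent_recoveries)


# Assumed figure: each recovery sustains about 60 MB/s, with 2 concurrent recoveries.
for limit in (40, 200, 500):
    print(limit, effective_recovery_rate(limit, 60, 2))
# 40 40
# 200 120
# 500 120
```

In this model, raising the limit from 200 to 500 changes nothing, because two recoveries cannot use more than 120 MB/s between them. Only more concurrency or faster individual recoveries would.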
Common Pitfalls
- Disabling throttling (0) on a busy cluster. A large recovery can saturate the network and stall searches.
- Raising only max_bytes_per_sec and not node_concurrent_recoveries or max_concurrent_file_chunks. The bandwidth budget is shared by all recoveries on a node, so concurrency dictates whether the limit is ever reached.
- Forgetting to revert temporary changes. A raised value left in place silently impacts performance when the next unexpected recovery starts.
- Ignoring the actual bottleneck. If the snapshot repository S3 bucket is throttling, raising recovery bandwidth changes nothing.
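The first two pitfalls are mechanical enough to check in a script. The following is a minimal sketch under stated assumptions: `to_bytes` is a simplified stand-in for Elasticsearch's size parsing (integer values with b/kb/mb/gb suffixes, 1024-based), and `lint_recovery_settings` and its warning texts are hypothetical, not part of any Elasticsearch tooling.

```python
UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3}
DEFAULT_LIMIT = "40mb"
DEFAULT_NODE_CONCURRENT_RECOVERIES = 2


def to_bytes(size):
    """Parse a size such as '200mb' or a bare '0' into bytes (simplified parser)."""
    s = size.strip().lower()
    for unit in ("kb", "mb", "gb", "b"):  # two-letter units before plain 'b'
        if s.endswith(unit):
            return int(s[: -len(unit)]) * UNITS[unit]
    return int(s)


def lint_recovery_settings(max_bytes_per_sec,
                           node_concurrent_recoveries=DEFAULT_NODE_CONCURRENT_RECOVERIES):
    """Return warnings for the two settings-related pitfalls listed above."""
    warnings = []
    limit = to_bytes(max_bytes_per_sec)
    if limit == 0:
        warnings.append("throttling disabled: a large recovery can saturate the network")
    elif (limit > to_bytes(DEFAULT_LIMIT)
          and node_concurrent_recoveries <= DEFAULT_NODE_CONCURRENT_RECOVERIES):
        warnings.append("limit raised but concurrency at default: budget may go unused")
    return warnings


print(lint_recovery_settings("0"))
print(lint_recovery_settings("500mb"))
print(lint_recovery_settings("500mb", node_concurrent_recoveries=4))
```

The last call prints an empty list, since both the limit and the concurrency were raised together.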
Monitoring Shard Recovery
Inspect active recoveries:
GET /_cat/recovery?v&active_only=true&h=index,shard,source_node,target_node,stage,bytes_recovered,bytes_total,bytes_percent
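When many shards are recovering at once, a per-node rollup is easier to read than the raw rows. The sketch below assumes the same endpoint was called with `format=json&bytes=b` added, so that byte columns come back as plain numbers in strings. The function name `summarize_recoveries` and the sample rows are made up for illustration.

```python
import json


def summarize_recoveries(cat_recovery_json):
    """Aggregate recovery progress per target node from _cat/recovery JSON output."""
    per_node = {}
    for row in json.loads(cat_recovery_json):
        done, total = per_node.get(row["target_node"], (0, 0))
        per_node[row["target_node"]] = (
            done + int(row["bytes_recovered"]),
            total + int(row["bytes_total"]),
        )
    # A total of 0 means nothing to copy (or not yet known); treat it as complete.
    return {node: (done / total if total else 1.0)
            for node, (done, total) in per_node.items()}


# Made-up sample rows in the shape returned with format=json&bytes=b:
sample = json.dumps([
    {"index": "logs-1", "shard": "0", "target_node": "node-b",
     "bytes_recovered": "250", "bytes_total": "1000"},
    {"index": "logs-1", "shard": "1", "target_node": "node-b",
     "bytes_recovered": "750", "bytes_total": "1000"},
])
print(summarize_recoveries(sample))
# {'node-b': 0.5}
```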
Prevent Recovery Throttle Misconfiguration with Pulse
Pulse is an AI DBA for Elasticsearch and OpenSearch that tracks indices.recovery.max_bytes_per_sec (default 40 MB/s) and the related throttles - node_concurrent_recoveries (default 2), max_concurrent_file_chunks (default 2), cluster_concurrent_rebalance (default 2) - across the cluster, flagging:
- Drift between intended values and what is actually applied (a temporary 500 MB/s raised for a planned recovery and never reverted)
- Settings that are unsafe for your workload (e.g. throttle disabled with 0 on a production cluster, or max_bytes_per_sec raised without raising node_concurrent_recoveries so the bandwidth budget never gets used)
- The downstream operational impact: time-to-green after node loss, recovery throughput, and search latency p95 during active recoveries
When recovery traffic starts impacting users, Pulse names the right knob to turn and recommends a temporary throttle adjustment with an automatic revert plan.
Frequently Asked Questions
Q: What is the default value of indices.recovery.max_bytes_per_sec?
A: The default is 40mb per node in modern Elasticsearch (7.x and later). Earlier versions used different defaults. Cold-tier and frozen-tier nodes have different per-role defaults.
Q: Can I disable recovery throttling entirely?
A: Yes, set indices.recovery.max_bytes_per_sec to 0 to remove the cap. Use this only during planned maintenance windows where no live traffic is at risk - an unthrottled recovery can saturate the cluster network.
Q: Why is my recovery slower than indices.recovery.max_bytes_per_sec?
A: The throttle is the upper bound, not the target. Real throughput is limited by node_concurrent_recoveries, max_concurrent_file_chunks, source/target disk IOPS, and the network. Inspect _cat/recovery to find where the time is being spent.
Q: Does indices.recovery.max_bytes_per_sec affect snapshot restore?
A: Yes. The setting throttles the file-copy phase of recoveries, which includes snapshot restore. For S3-backed repositories, also check max_restore_bytes_per_sec on the repository configuration.
Q: Is indices.recovery.max_bytes_per_sec a per-shard limit?
A: No, it's per-node. All concurrent recoveries leaving (or arriving at) a node share the bandwidth budget. Concurrent recovery count is controlled separately.
Q: Can I set indices.recovery.max_bytes_per_sec per index?
A: No, it's a cluster-level setting and applies to all recoveries on all nodes. There is no per-index recovery throttle.
Q: What's the best tool to tune Elasticsearch shard recovery without crowding out live traffic?
A: Pulse is purpose-built for this. It is an AI DBA for Elasticsearch and OpenSearch that tracks indices.recovery.max_bytes_per_sec and the related concurrency throttles, correlates recovery throughput with search latency p95, and recommends time-bound throttle changes that revert automatically when recovery completes.
Related Reading
- Elasticsearch Allocation Explain API: Investigate why shards aren't recovering
- Elasticsearch Disk I/O Bottleneck Troubleshooting: IO-bound recovery
- Elasticsearch Using Searchable Snapshots: Snapshot-backed recovery
- Elasticsearch Rolling Restart Problems: Recovery during planned restarts
- Elasticsearch Settings: Full settings reference