Elasticsearch indices.recovery.max_bytes_per_sec Setting

indices.recovery.max_bytes_per_sec caps the bandwidth each node may use for recovery traffic, covering both outbound recovery sends and inbound receives. It throttles the file-copy phase of shard recovery, snapshot restore, relocation, and replica synchronization. The default value keeps a large recovery event from starving ongoing search and indexing traffic.

  • Default: 40mb per node (since Elasticsearch 7.x). On dedicated cold or frozen nodes (only the data_cold or data_frozen roles), the default instead scales with the node's total memory
  • Scope: Cluster-wide dynamic setting, but limit applies per node
  • Possible values: Any size value, e.g. 100mb, 1gb, 0 (no throttling)
  • Special value: 0 disables throttling entirely

How Recovery Throttling Works

Peer recovery copies segment files from a source node to a target node. The throttle is enforced at both ends: the source node clamps the combined bytes per second of all recoveries leaving it, and the target node clamps the combined rate of all recoveries arriving at it.
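To make the shared-budget idea concrete, here is a simplified Python model of a per-node byte-rate limiter. `SimpleByteRateLimiter` is a hypothetical name and the accounting is an assumption for illustration, not Elasticsearch's actual code: each recovery reports the bytes it just moved, and the limiter says how long to wait so the node's combined rate stays at the cap.

```python
import time
from typing import Callable


class SimpleByteRateLimiter:
    """Simplified model of one node's recovery rate limiter (hypothetical).

    Every recovery on the node calls pause() after moving a chunk, so all
    concurrent recoveries draw on a single shared byte budget.
    """

    def __init__(self, bytes_per_sec: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.bytes_per_sec = bytes_per_sec
        self.clock = clock
        # Earliest moment at which the shared budget is free again.
        self.next_free = clock()

    def pause(self, num_bytes: int) -> float:
        """Seconds the caller should wait after moving num_bytes."""
        if self.bytes_per_sec <= 0:  # 0 models "throttling disabled"
            return 0.0
        now = self.clock()
        # Budget already spent by earlier chunks pushes this chunk later.
        start = max(now, self.next_free)
        self.next_free = start + num_bytes / self.bytes_per_sec
        return self.next_free - now


# Two recoveries on the same node each send a 20 MiB chunk at the same
# instant under a 40 MiB/s cap (a fake clock keeps the example deterministic).
fake_now = [0.0]
limiter = SimpleByteRateLimiter(40 * 1024 * 1024, clock=lambda: fake_now[0])
print(limiter.pause(20 * 1024 * 1024))  # first recovery waits 0.5 s
print(limiter.pause(20 * 1024 * 1024))  # second waits 1.0 s: the budget is shared
```

Because both calls draw on the same budget, the second recovery waits twice as long as the first. Under a fixed cap, adding concurrent recoveries splits the bandwidth rather than adding to it.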

On a 10 GbE cluster network (~1.25 GB/s theoretical), the default 40 MB/s leaves ~97% of network capacity for live traffic. On a 1 GbE cluster, the same default consumes ~32% of the link - usually still tolerable. Tuning the value is mostly a question of how much network you're willing to give up during recoveries.
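The percentages above are simple division. A short Python sketch using decimal units (1 Gbit/s = 125 MB/s), as the text does; Elasticsearch's `mb` unit is actually 1024-based, which makes each share about 5% larger in relative terms. `link_share` is a hypothetical helper name.

```python
def link_share(limit_mb_per_sec: float, link_gbit_per_sec: float) -> float:
    """Fraction of a link's theoretical bandwidth taken by a recovery cap."""
    link_mb_per_sec = link_gbit_per_sec * 1000 / 8  # decimal: 1 Gbit/s = 125 MB/s
    return limit_mb_per_sec / link_mb_per_sec


for gbe in (1, 10):
    used = link_share(40, gbe)
    print(f"{gbe} GbE: the 40mb default uses {used:.1%} of the link, "
          f"leaving {1 - used:.1%} for live traffic")
```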

Configuring indices.recovery.max_bytes_per_sec

Change it via the cluster settings API at runtime:

PUT /_cluster/settings
{
  "persistent": {
    "indices.recovery.max_bytes_per_sec": "200mb"
  }
}

Setting it to null reverts to the default. The change applies immediately to new and in-progress recoveries.

When to Adjust the Default

| Scenario | Recommended setting |
| --- | --- |
| Typical 1-10 GbE cluster, normal recoveries | Leave at the 40mb default |
| Large cluster after node loss, recovery taking hours | Temporarily raise to 200mb-500mb, restore the default afterwards |
| Dedicated recovery network or quiet maintenance window | 500mb-1gb, or disable throttling (0) |
| Recovery is causing user-visible search latency | Lower to 20mb |
| Dedicated cold/frozen nodes restoring from snapshot | Default already scales with node memory; tune snapshot repository throughput first |

Operators commonly raise this setting during a planned recovery (rolling restart, hardware replacement) and revert afterwards. Permanent high values risk crowding out live traffic when an unplanned recovery starts.
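The raise-then-revert routine can be wrapped so the revert cannot be forgotten. A minimal Python sketch: `temporary_recovery_throttle` and the `send` callback are hypothetical names, and `send` stands in for whatever HTTP client you use to issue `PUT /_cluster/settings`. It reverts by sending `null`, which restores the default; if your cluster normally runs a non-default value, capture and restore that value instead.

```python
import json
from contextlib import contextmanager
from typing import Callable, Iterator, Optional

SETTING = "indices.recovery.max_bytes_per_sec"


def settings_body(value: Optional[str]) -> str:
    """Cluster-settings body; None serialises to JSON null (reset to default)."""
    return json.dumps({"persistent": {SETTING: value}})


@contextmanager
def temporary_recovery_throttle(
    send: Callable[[str, str, str], None], value: str
) -> Iterator[None]:
    """Raise the recovery throttle for the duration of a with-block.

    `send(method, path, body)` is a hypothetical callback wrapping your HTTP client.
    """
    send("PUT", "/_cluster/settings", settings_body(value))
    try:
        yield
    finally:
        # Runs even if the maintenance step raises, so the raise never lingers.
        send("PUT", "/_cluster/settings", settings_body(None))


# Usage with a stand-in sender that just records the requests:
sent = []
with temporary_recovery_throttle(lambda m, p, b: sent.append((m, p, b)), "500mb"):
    pass  # e.g. replace a node and wait for the cluster to return to green
for method, path, body in sent:
    print(method, path, body)
```

The `finally` clause is the point of the design: if the maintenance step fails halfway, the raised limit is still reverted instead of lingering until the next unplanned recovery.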

Related Recovery Settings

| Setting | Default | Purpose |
| --- | --- | --- |
| indices.recovery.max_concurrent_file_chunks | 2 | File chunks sent in parallel per recovery |
| indices.recovery.max_concurrent_operations | 1 | Operations sent in parallel during translog replay |
| cluster.routing.allocation.node_concurrent_recoveries | 2 | Concurrent recoveries per node |
| cluster.routing.allocation.cluster_concurrent_rebalance | 2 | Concurrent rebalance moves cluster-wide |

Recovery throughput is shaped by all of these together. Raising max_bytes_per_sec alone has diminishing returns if node_concurrent_recoveries stays at 2.
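A back-of-the-envelope Python model shows why. It assumes, purely for illustration, that each recovery can move at most 60 MB/s on its own (limited by disk, round trips, or file-chunk parallelism) and that the node-wide cap clamps the sum; `effective_throughput` and `hours_to_copy` are hypothetical helpers, not Elasticsearch APIs.

```python
def effective_throughput(cap_mb: float, concurrent_recoveries: int,
                         per_recovery_mb: float) -> float:
    """Node recovery throughput in MB/s under a deliberately simple model.

    Assumption: each recovery tops out at per_recovery_mb on its own, and the
    node-wide cap clamps the sum. A cap of 0 means throttling is disabled.
    """
    demand = concurrent_recoveries * per_recovery_mb
    return demand if cap_mb == 0 else min(cap_mb, demand)


def hours_to_copy(total_gb: float, mb_per_sec: float) -> float:
    """Hours to move total_gb at a steady rate (decimal units, 1 GB = 1000 MB)."""
    return total_gb * 1000 / mb_per_sec / 3600


# Hypothetical 60 MB/s per recovery; 1 TB of shard data to recover.
for cap, recoveries in ((40, 2), (200, 2), (500, 2), (500, 4)):
    rate = effective_throughput(cap, recoveries, per_recovery_mb=60)
    print(f"cap={cap}mb, {recoveries} recoveries: {rate:.0f} MB/s, "
          f"~{hours_to_copy(1000, rate):.1f} h")
```

Under these assumptions, moving from 200mb to 500mb with two concurrent recoveries changes nothing (both give 120 MB/s), while allowing four recoveries doubles throughput. Measure real per-recovery rates with _cat/recovery before relying on numbers like these.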

Common Pitfalls

  1. Disabling throttling (0) on a busy cluster. A large recovery can saturate the network and stall searches.
  2. Raising only max_bytes_per_sec and not node_concurrent_recoveries or max_concurrent_file_chunks. The bandwidth budget is a per-node ceiling shared by all recoveries, not a per-chunk allowance, so the amount of concurrency decides whether the ceiling is ever reached.
  3. Forgetting to revert temporary changes. A raised value left in place silently impacts performance when the next unexpected recovery starts.
  4. Ignoring the actual bottleneck. If the snapshot repository (for example, an S3 bucket) is throttling requests, raising recovery bandwidth changes nothing.
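Pitfalls 1 to 3 can be checked mechanically against a dump of effective settings. A Python sketch under stated assumptions: `parse_size` and `lint_recovery_settings` are hypothetical helpers, settings arrive as a flat dict of string values, and missing keys fall back to the defaults quoted in this article (the 40mb default does not hold on dedicated cold or frozen nodes). Pitfall 4 needs live metrics and is out of scope here.

```python
import re
from typing import Dict, List, Optional

# Elasticsearch byte-size units are 1024-based.
_UNITS = {"b": 1, "kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3,
          "tb": 1024 ** 4, "pb": 1024 ** 5}

DEFAULTS = {  # defaults quoted in this article (not valid on dedicated cold/frozen nodes)
    "indices.recovery.max_bytes_per_sec": "40mb",
    "cluster.routing.allocation.node_concurrent_recoveries": "2",
    "indices.recovery.max_concurrent_file_chunks": "2",
}


def parse_size(value: str) -> int:
    """Parse a size such as '40mb' or '0' into bytes (a bare number means bytes)."""
    m = re.fullmatch(r"(\d+(?:\.\d+)?)([a-z]*)", value.strip().lower())
    unit = (m.group(2) or "b") if m else None
    if unit not in _UNITS:
        raise ValueError(f"unrecognized size: {value!r}")
    return int(float(m.group(1)) * _UNITS[unit])


def lint_recovery_settings(settings: Dict[str, str],
                           intended: Optional[Dict[str, str]] = None,
                           production: bool = True) -> List[str]:
    """Return warnings for pitfalls 1-3; a hypothetical checker, not a real tool."""
    s = {**DEFAULTS, **settings}
    cap = parse_size(s["indices.recovery.max_bytes_per_sec"])
    recoveries = int(s["cluster.routing.allocation.node_concurrent_recoveries"])
    chunks = int(s["indices.recovery.max_concurrent_file_chunks"])
    warnings = []
    # Pitfall 1: throttle disabled on a cluster serving live traffic.
    if cap == 0 and production:
        warnings.append("recovery throttling disabled (0) on a production cluster")
    # Pitfall 2: bandwidth raised but concurrency left at the defaults.
    raised = cap > parse_size(DEFAULTS["indices.recovery.max_bytes_per_sec"])
    if raised and recoveries <= 2 and chunks <= 2:
        warnings.append("max_bytes_per_sec raised but concurrency left at defaults")
    # Pitfall 3: a temporary change that was never reverted (drift from intent).
    for key, want in (intended or {}).items():
        if s.get(key) != want:
            warnings.append(f"{key} is {s.get(key)!r}, intended {want!r}")
    return warnings


print(lint_recovery_settings({"indices.recovery.max_bytes_per_sec": "500mb"},
                             intended={"indices.recovery.max_bytes_per_sec": "40mb"}))
```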

Monitoring Shard Recovery

Inspect active recoveries:

GET /_cat/recovery?v&active_only=true&h=index,shard,source_node,target_node,stage,bytes_recovered,bytes_total,bytes_percent
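The _cat output can be summarised per target node to see how much data is still in flight. A Python sketch, assuming the request above also passes `bytes=b` (a standard cat API parameter) so size columns are plain byte counts, and that every column value is a single token without spaces; `remaining_bytes_by_target` and `best_case_eta_seconds` are hypothetical helpers, and the sample rows are made up.

```python
from collections import defaultdict
from typing import Dict


def remaining_bytes_by_target(cat_output: str) -> Dict[str, int]:
    """Sum bytes still to copy per target node from `_cat/recovery?v&bytes=b` text."""
    rows = [line.split() for line in cat_output.strip().splitlines()]
    header, data = rows[0], rows[1:]
    col = {name: i for i, name in enumerate(header)}  # look columns up by name
    remaining: Dict[str, int] = defaultdict(int)
    for row in data:
        left = int(row[col["bytes_total"]]) - int(row[col["bytes_recovered"]])
        remaining[row[col["target_node"]]] += left
    return dict(remaining)


def best_case_eta_seconds(remaining: int, cap_bytes_per_sec: int) -> float:
    """Fastest possible finish: the throttle is a ceiling, not a guaranteed rate."""
    return remaining / cap_bytes_per_sec


# Made-up output in the column order of the request above.
sample = """\
index  shard source_node target_node stage    bytes_recovered bytes_total bytes_percent
logs-1 0     node-a      node-c      index    400000000       1000000000  40.0%
logs-1 1     node-b      node-c      index    900000000       1000000000  90.0%
logs-2 0     node-a      node-d      translog 500000000       500000000   100.0%
"""
left = remaining_bytes_by_target(sample)
cap = 40 * 1024 * 1024  # the 40mb default, in bytes per second
for node, n in left.items():
    print(f"{node}: {n} bytes left, at least {best_case_eta_seconds(n, cap):.0f} s")
```

Because the throttle is an upper bound, the estimate is the fastest a node could finish, not a forecast.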

Prevent Recovery Throttle Misconfiguration with Pulse

Pulse is an AI DBA for Elasticsearch and OpenSearch that tracks indices.recovery.max_bytes_per_sec (default 40 MB/s) and the related throttles - node_concurrent_recoveries (default 2), max_concurrent_file_chunks (default 2), cluster_concurrent_rebalance (default 2) - across the cluster, flagging:

  • Drift between intended values and what is actually applied (for example, a temporary raise to 500 MB/s for a planned recovery that was never reverted)
  • Settings that are unsafe for your workload (e.g. throttle disabled with 0 on a production cluster, or max_bytes_per_sec raised without raising node_concurrent_recoveries so the bandwidth budget never gets used)
  • The downstream operational impact: time-to-green after node loss, recovery throughput, and search latency p95 during active recoveries

When recovery traffic starts impacting users, Pulse names the right knob to turn and recommends a temporary throttle adjustment with an automatic revert plan.


Frequently Asked Questions

Q: What is the default value of indices.recovery.max_bytes_per_sec?
A: The default is 40mb per node in modern Elasticsearch (7.x and later). Earlier versions used different defaults. On dedicated cold-tier and frozen-tier nodes, the default scales with the node's total memory.

Q: Can I disable recovery throttling entirely?
A: Yes, set indices.recovery.max_bytes_per_sec to 0 to remove the cap. Use this only during planned maintenance windows where no live traffic is at risk - an unthrottled recovery can saturate the cluster network.

Q: Why is my recovery slower than indices.recovery.max_bytes_per_sec?
A: The throttle is the upper bound, not the target. Real throughput is limited by node_concurrent_recoveries, max_concurrent_file_chunks, source/target disk IOPS, and the network. Inspect _cat/recovery to find where the time is being spent.

Q: Does indices.recovery.max_bytes_per_sec affect snapshot restore?
A: Yes. The setting throttles the file-copy phase of recoveries, which includes snapshot restore. The repository's own max_restore_bytes_per_sec setting also limits restore throughput, whether the repository is S3-backed or another type, so check it as well.
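Since both limits apply during a restore, the tighter one sets the ceiling. A minimal Python sketch (the function name is hypothetical) that takes both values in bytes per second and treats 0 as unlimited; that 0 also means unlimited for the repository setting is an assumption to confirm for your version.

```python
from typing import Optional


def effective_restore_cap(recovery_cap: int, repo_restore_cap: int) -> Optional[int]:
    """Tighter of the two per-node restore limits, in bytes per second.

    0 means "unlimited" for both inputs (an assumption for the repository
    setting); None means neither limit applies.
    """
    limits = [cap for cap in (recovery_cap, repo_restore_cap) if cap > 0]
    return min(limits) if limits else None


MB = 1024 * 1024
# Raising the recovery cap to 500mb does not help if the repository caps restores at 100mb.
print(effective_restore_cap(500 * MB, 100 * MB) // MB)
```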

Q: Is indices.recovery.max_bytes_per_sec a per-shard limit?
A: No, it's per-node. All concurrent recoveries leaving (or arriving at) a node share the bandwidth budget. Concurrent recovery count is controlled separately.

Q: Can I set indices.recovery.max_bytes_per_sec per index?
A: No, it's a cluster-level setting and applies to all recoveries on all nodes. There is no per-index recovery throttle.

Q: What's the best tool to tune Elasticsearch shard recovery without crowding out live traffic?
A: Pulse is purpose-built for this. It is an AI DBA for Elasticsearch and OpenSearch that tracks indices.recovery.max_bytes_per_sec and the related concurrency throttles, correlates recovery throughput with search latency p95, and recommends time-bound throttle changes that revert automatically when recovery completes.
