Elasticsearch cluster.routing.allocation.disk.watermark.high: 90% Default and Shard Relocation

The cluster.routing.allocation.disk.watermark.high setting is the second of Elasticsearch's three disk watermarks. It defaults to 90% of disk usage. Once a data node crosses this threshold, Elasticsearch actively tries to move shards off the node, on top of the new-allocation block already applied by the low watermark at 85%. The high watermark is the cluster's automatic mitigation step: if you ignore the warning, flood stage at 95% will block writes.

Definition

cluster.routing.allocation.disk.watermark.high is a dynamic cluster-level setting that controls when Elasticsearch starts evacuating shards from over-utilised data nodes. Like the other watermarks, it can be expressed as a percentage of disk used or as an absolute amount of free disk space remaining. The master checks node disk usage every cluster.info.update.interval (default 30 seconds).
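The two forms described above can be sketched in Python. This is an illustrative model, not Elasticsearch's own code: the function name is hypothetical, the ratio form (for example 0.90) is not handled, byte units are assumed to be powers of 1024, and the strict comparison at the exact boundary is an assumption.

```python
import re

# Assumption: byte-size suffixes are binary multiples (1gb = 1024**3 bytes).
_UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4}


def exceeds_watermark(total_bytes: int, free_bytes: int, watermark: str) -> bool:
    """Return True when a node's disk usage is past the given watermark.

    A percentage is treated as a ceiling on disk used; an absolute value
    is treated as a floor on free space remaining.
    """
    wm = watermark.strip().lower()
    if wm.endswith("%"):
        used_pct = 100.0 * (total_bytes - free_bytes) / total_bytes
        return used_pct > float(wm[:-1])
    match = re.fullmatch(r"(\d+(?:\.\d+)?)(b|kb|mb|gb|tb)", wm)
    if match is None:
        raise ValueError(f"unsupported watermark value: {watermark!r}")
    min_free = float(match.group(1)) * _UNITS[match.group(2)]
    return free_bytes < min_free


GIB = 1024**3
# A hypothetical 1000 GiB data disk with 80 GiB free is 92% used.
print(exceeds_watermark(1000 * GIB, 80 * GIB, "90%"))
print(exceeds_watermark(1000 * GIB, 80 * GIB, "100gb"))
```

Both calls describe the same node; the percentage form asks how full the disk is, the absolute form asks how much room is left.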

Default and Allowed Values

Property            Value
Default             90%
Type                Percentage of disk used (90%) or absolute free space (100gb, 500gb)
Scope               Cluster, dynamic
Effect              Triggers shard relocation away from the over-watermark node
Required ordering   Percentages: low <= high <= flood_stage. Absolute free space: low >= high >= flood_stage

Elasticsearch interprets the value according to its form: a percentage is a ceiling on disk used, while an absolute value is a floor on free space remaining. All three watermarks must use the same form; Elasticsearch rejects a settings update that mixes percentage and absolute values.
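A pre-flight check for these two rules might look like the sketch below. The helper names are hypothetical and the error strings are not Elasticsearch's own messages. It assumes that absolute values express free space, so their ordering runs the opposite way from percentages (low holds back the most free space, flood_stage the least).

```python
import re

# Assumption: byte-size suffixes are binary multiples (1gb = 1024**3 bytes).
_UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4}


def _parse(value: str) -> tuple[str, float]:
    """Return ("percent", n) or ("absolute", bytes) for a watermark value."""
    v = value.strip().lower()
    if v.endswith("%"):
        return "percent", float(v[:-1])
    m = re.fullmatch(r"(\d+(?:\.\d+)?)(b|kb|mb|gb|tb)", v)
    if m is None:
        raise ValueError(f"unsupported watermark value: {value!r}")
    return "absolute", float(m.group(1)) * _UNITS[m.group(2)]


def check_watermarks(low: str, high: str, flood_stage: str) -> list[str]:
    """Return a list of problems; an empty list means the trio is consistent."""
    parsed = [_parse(v) for v in (low, high, flood_stage)]
    forms = {form for form, _ in parsed}
    if len(forms) > 1:
        return ["mixed percentage and absolute values"]
    lo, hi, fl = (amount for _, amount in parsed)
    if forms == {"percent"}:
        ordered = lo <= hi <= fl   # disk used must not decrease
    else:
        ordered = lo >= hi >= fl   # free space must not increase
    return [] if ordered else ["watermarks out of order"]


print(check_watermarks("85%", "90%", "95%"))
print(check_watermarks("85%", "150gb", "95%"))
```

Running a check like this before calling the cluster settings API catches a bad combination locally instead of during an incident.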

How to Change It

Through the cluster settings API:

# Lower the high watermark to 85% to start relocation earlier
PUT /_cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.disk.watermark.high": "85%"
  }
}

Or with absolute free space:

PUT /_cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.disk.watermark.high": "150gb"
  }
}

To reset the setting to its default, send the same request with the value set to null. The same keys can also be set statically in elasticsearch.yml, but the dynamic API is preferred so on-call engineers can adjust the watermark without restarts during a disk pressure incident.
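When the reset request is built from a script rather than typed into a console, the value has to serialize as a JSON null, not the string "null". A minimal Python sketch of the request body follows; the helper name is hypothetical and sending the request is left out.

```python
import json

HIGH = "cluster.routing.allocation.disk.watermark.high"


def reset_body(*keys: str) -> str:
    """Build a PUT /_cluster/settings body that resets each key to its default.

    Python's None serializes to JSON null, which clears the persistent override.
    """
    return json.dumps({"persistent": {key: None for key in keys}}, indent=2)


print(reset_body(HIGH))
```

The printed document is the body for PUT /_cluster/settings.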

If you are tuning all three watermarks together, set them in one call so the ordering check passes:

PUT /_cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.disk.watermark.low": "80%",
    "cluster.routing.allocation.disk.watermark.high": "85%",
    "cluster.routing.allocation.disk.watermark.flood_stage": "92%"
  }
}

Operational Impact

When a node crosses the high watermark:

  • The master schedules shards on that node for relocation to other data nodes with room.
  • The relocation is rate-limited by cluster.routing.allocation.node_concurrent_recoveries (default 2) per node.
  • The elected master logs WARN-level messages of the form "high disk watermark exceeded on [node] shards will be relocated away", naming the affected node.
  • New shard allocation to the node remains blocked (inherited from the low watermark).

Relocation itself consumes disk and network throughput. If the cluster has nowhere to send the shards (every other node is also above the high watermark, or shard allocation filters exclude the only candidates), the shards stay on the over-watermark node and the node keeps filling. If that continues, flood stage at 95% will then block writes to every index with a shard on it.
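The "nowhere to send the shards" case can be spotted from the output of GET /_cat/allocation?format=json. The sketch below is a simplification: it looks only at disk.percent, treats any node under the low watermark as a possible destination, and ignores allocation filters and shard sizes. The function name and the sample rows are hypothetical.

```python
def relocation_outlook(rows: list[dict], low_pct: float = 85.0,
                       high_pct: float = 90.0) -> dict:
    """Classify nodes from _cat/allocation JSON rows by disk.percent."""
    over_high, targets = [], []
    for row in rows:
        pct = row.get("disk.percent")
        if pct is None:  # e.g. the UNASSIGNED row carries no disk figures
            continue
        used = float(pct)  # values arrive as strings in the JSON output
        if used > high_pct:
            over_high.append(row["node"])
        elif used < low_pct:
            targets.append(row["node"])
    # "stuck" means shards need to move but no node can take them.
    return {"over_high": over_high, "targets": targets,
            "stuck": bool(over_high) and not targets}


# Hypothetical sample, trimmed to the two fields the sketch reads.
rows = [
    {"node": "es-data-1", "disk.percent": "92"},
    {"node": "es-data-2", "disk.percent": "88"},
    {"node": "es-data-3", "disk.percent": "60"},
    {"node": "UNASSIGNED", "disk.percent": None},
]
print(relocation_outlook(rows))
```

A stuck result is the signal to add capacity or delete data rather than wait for relocation to finish.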

A common operational anti-pattern is to raise the high watermark to silence relocation churn during a capacity squeeze. That removes the safety valve - the next push past 95% triggers flood stage and the read-only block, which costs more downtime to recover from than the relocation would have.

Common Mistakes

  1. Raising the watermark during an incident to stop relocation churn. This delays relocation, not flood stage, and lets the node drift toward the write block, which is far more disruptive.
  2. Setting low, high, and flood_stage too close. A 1% gap gives no time for relocation to actually drain a node.
  3. Mixing percentage and absolute values. Pick one form for all three watermarks.
  4. Forgetting about disk used by Lucene merges and translog. A node at 90% disk used can still need several GB of headroom for merges; relocation needs that headroom on the destination too.
  5. Ignoring cluster.routing.allocation.node_concurrent_recoveries. Default 2 may be too low for fast SSDs and a tight relocation deadline.
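To see why a narrow gap is risky (mistake 2), it helps to convert the gap into bytes and then into time. The sketch below uses a hypothetical 2 TB disk and a hypothetical net growth rate of 10 GB per hour, both in decimal units; neither figure comes from a real cluster.

```python
TB = 10**12
GB = 10**9


def gap_bytes(disk_bytes: int, lower_pct: int, upper_pct: int) -> float:
    """Bytes of disk between two percentage watermarks."""
    return disk_bytes * (upper_pct - lower_pct) / 100


def hours_to_cross(disk_bytes: int, lower_pct: int, upper_pct: int,
                   growth_bytes_per_hour: int) -> float:
    """Hours until a node at the lower watermark reaches the upper one,
    assuming constant net growth and no shards leaving the node."""
    return gap_bytes(disk_bytes, lower_pct, upper_pct) / growth_bytes_per_hour


# 1% gap versus the default 5% gap between high and flood_stage.
print(hours_to_cross(2 * TB, 90, 91, 10 * GB))
print(hours_to_cross(2 * TB, 90, 95, 10 * GB))
```

Relocation has to move shards off the node faster than data arrives within that window, which is also why the node_concurrent_recoveries limit matters.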

Catch the 90% High Watermark Before It Triggers Relocation Churn with Pulse

Pulse is an AI DBA for Elasticsearch and OpenSearch. When a data node's disk usage crosses cluster.routing.allocation.disk.watermark.high (default 90%) and the cluster starts evacuating shards in your environment, Pulse:

  • Continuously tracks per-node disk.percent from _cat/allocation, free-space deltas, and the WARN log signal high disk watermark exceeded on [node] shards will be relocated away
  • Correlates the watermark trip with active shard relocations, node_concurrent_recoveries throughput, ILM rollover state, snapshot age, and headroom on candidate destination nodes
  • Identifies why the node crossed 90% - missing delete phase in ILM, oversharded index growth, a stuck force-merge consuming disk for Lucene segments, or simple capacity exhaustion
  • Recommends the precise fix - run rollover, delete or shrink an over-retained index, raise node_concurrent_recoveries on fast SSDs, or add a data node before flood stage at 95% triggers the read-only block
  • Applies low-risk fixes automatically with your approval (clearing stale indices past their retention) or generates a one-click cluster settings PR

Pulse turns the manual watermark triage above into an agentic SRE workflow that intervenes between the 85% low watermark warning and the 95% flood-stage write block.

Frequently Asked Questions

Q: What is the fastest way to diagnose a node crossing the 90% high disk watermark in production?
A: Check GET /_cat/allocation?v for the disk.percent column and _cat/recovery?v&active_only=true for shard relocations the master scheduled in response. For continuous coverage, Pulse acts as an AI DBA for Elasticsearch and OpenSearch that tracks per-node disk telemetry against the high watermark, correlates the trip with ILM state and shard movement, and recommends rollover or capacity changes before flood stage at 95% blocks writes.

Q: What is the default value of cluster.routing.allocation.disk.watermark.high?
A: The default is 90%. When a data node's disk usage crosses 90%, Elasticsearch starts relocating shards off that node to other data nodes that have room.

Q: How is the high watermark different from the low watermark?
A: The low watermark (85%) only stops Elasticsearch from allocating new shards to the affected node; primary shards of newly created indices are exempt. The high watermark (90%) also actively relocates existing shards away from the node. Flood stage (95%) is the final step and blocks writes to every index that has a shard on the node.

Q: Can I change the high watermark on a running cluster?
A: Yes. It is a dynamic cluster setting. Use PUT /_cluster/settings and the change takes effect at the next disk-info update (every 30 seconds by default). You do not need to restart any node.

Q: What happens if every node exceeds the high watermark at once?
A: Elasticsearch has nowhere to relocate shards to. Existing shards stay where they are and the cluster logs warnings. Shards of newly created indices cannot be allocated because no node is eligible, so cluster status drops to red if primaries are unassigned or yellow if only replicas are. The fix is to add capacity or delete data.

Q: Does the high watermark block writes?
A: No. The high watermark only triggers shard relocation. Writes continue to indices whose shards are on the affected node. Writes are blocked by flood stage (95% by default), not by the high watermark.

Q: Should I use percentage or absolute free-space values?
A: Percentages work well on homogeneous clusters with modest disk sizes. On large disks they reserve more space than relocation needs: on a 10 TB volume the 90% watermark leaves 1 TB free, and the 5% gap between 90% and 95% is 500 GB. That is more than enough headroom, so an absolute free-space value like 100gb can be more reasonable and more predictable.
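The 10 TB arithmetic in the last answer can be reproduced, and reversed, with a short Python sketch. It uses decimal units (1 TB = 10**12 bytes) to match the figures in the answer; the function names are hypothetical.

```python
TB = 10**12
GB = 10**9


def free_at_percent(disk_bytes: int, used_pct: int) -> int:
    """Free bytes left when the disk is used_pct percent full."""
    return disk_bytes * (100 - used_pct) // 100


def used_pct_for_free(disk_bytes: int, free_bytes: int) -> float:
    """Disk-used percentage that corresponds to an absolute free-space floor."""
    return 100.0 * (disk_bytes - free_bytes) / disk_bytes


disk = 10 * TB
reserved_at_high = free_at_percent(disk, 90)             # free space at 90%
gap = reserved_at_high - free_at_percent(disk, 95)       # 90% to 95% gap
print(reserved_at_high // GB, gap // GB)
print(used_pct_for_free(disk, 100 * GB))
```

The last line shows how full the same disk would be when an absolute floor of 100 GB free is reached, which is the trade-off the answer describes.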
