When disk usage on an Elasticsearch node exceeds the configured flood-stage watermark, Elasticsearch automatically blocks write operations to indices on that node so the disk cannot fill up completely. This guide explains how to recover from read-only blocks and prevent future occurrences.
Understanding Disk Watermarks
Default Watermark Thresholds
| Watermark | Default | Effect |
|---|---|---|
| Low | 85% | No new shards are allocated to the node (does not affect primaries of newly created indices) |
| High | 90% | Elasticsearch attempts to relocate shards away from the node |
| Flood Stage | 95% | A read_only_allow_delete block is applied to every index with a shard on the node |
Error Messages
When flood stage is reached:
ClusterBlockException: index [my-index] blocked by: [TOO_MANY_REQUESTS/12/disk usage exceeded flood-stage watermark...];
Or:
index read-only / allow delete (api)
Diagnosing the Issue
Check Disk Usage
GET /_cat/allocation?v&h=node,disk.percent,disk.used,disk.avail,disk.total
Check Index Block Status
GET /_settings?filter_path=*.settings.index.blocks
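To list only the indices that currently carry the block, you can filter the settings response. A minimal sketch, assuming jq is installed and the cluster is reachable on localhost:9200:
# Print the names of indices with an active read_only_allow_delete block
curl -s "localhost:9200/_all/_settings?filter_path=*.settings.index.blocks.read_only_allow_delete" \
  | jq -r 'to_entries[] | select(.value.settings.index.blocks.read_only_allow_delete == "true") | .key'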
Check Cluster Settings
GET /_cluster/settings?include_defaults=true&filter_path=*.cluster.routing.allocation.disk*
Immediate Recovery Steps
Step 1: Remove the Read-Only Block
On Elasticsearch 7.4 and later the block is released automatically once disk usage falls back below the high watermark, but you can also remove it immediately by hand. If disk usage is still above the flood stage, the block will simply be reapplied, so follow this with Step 2:
PUT /_all/_settings
{
  "index.blocks.read_only_allow_delete": null
}
For specific indices:
PUT /my-index/_settings
{
  "index.blocks.read_only_allow_delete": null
}
Step 2: Free Up Disk Space
Options to quickly free space:
Delete old indices (the sketch after these options shows how to find the largest candidates; on Elasticsearch 8.x, wildcard deletes also require action.destructive_requires_name to be set to false):
DELETE /old-index-2023-01-*
Expunge deleted documents (force merge is I/O-intensive and can temporarily consume extra disk space, so use it cautiously on a nearly full disk):
POST /my-index/_forcemerge?only_expunge_deletes=true
Temporarily reduce the replica count (this lowers resilience; restore it once space is available):
PUT /my-index/_settings
{
  "index.number_of_replicas": 0
}
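To choose deletion candidates, it helps to see which indices use the most space. A sketch using the _cat API, with the host assumed to be localhost:9200:
# List the ten largest indices with their creation dates
curl -s "localhost:9200/_cat/indices?v&h=index,store.size,creation.date.string&s=store.size:desc" | head -n 11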
Step 3: Temporarily Adjust Watermarks
If you can't immediately free space:
PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.disk.watermark.low": "90%",
    "cluster.routing.allocation.disk.watermark.high": "95%",
    "cluster.routing.allocation.disk.watermark.flood_stage": "98%"
  }
}
Warning: this is a stopgap, not a fix. Nodes running nearly full risk hard failures, so plan to add storage or delete data. Note also that transient cluster settings have been deprecated since 7.16; for anything longer-lived, use persistent settings.
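Once enough space has been freed, reset the overrides to null so the defaults (or your persistent values) apply again:
# Reset the temporary watermark overrides
curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
  "transient": {
    "cluster.routing.allocation.disk.watermark.low": null,
    "cluster.routing.allocation.disk.watermark.high": null,
    "cluster.routing.allocation.disk.watermark.flood_stage": null
  }
}
'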
Step 4: Verify Recovery
GET /_cluster/health
GET /_cat/allocation?v
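To confirm writes are accepted again, index a throwaway document; the index name here is illustrative, and a 201 Created response confirms the block is gone:
# Index a test document; expect HTTP 201
curl -X POST "localhost:9200/my-index/_doc" -H 'Content-Type: application/json' -d'
{"message": "write check after watermark recovery"}
'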
Long-Term Solutions
Solution 1: Add More Disk Space
The most sustainable solution:
- Expand existing volumes
- Add new data nodes
- Use larger instance types (cloud)
Solution 2: Implement Index Lifecycle Management
PUT _ilm/policy/disk_management
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "50gb",
            "max_age": "7d"
          }
        }
      },
      "warm": {
        "min_age": "30d",
        "actions": {
          "shrink": {
            "number_of_shards": 1
          },
          "forcemerge": {
            "max_num_segments": 1
          }
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
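The policy only takes effect once it is attached to indices, and rollover needs a data stream or a write alias. A minimal sketch using an index template plus a bootstrap index; the template, alias, and pattern names are illustrative:
# Attach the ILM policy to new indices matching logs-*
curl -X PUT "localhost:9200/_index_template/logs_template" -H 'Content-Type: application/json' -d'
{
  "index_patterns": ["logs-*"],
  "template": {
    "settings": {
      "index.lifecycle.name": "disk_management",
      "index.lifecycle.rollover_alias": "logs"
    }
  }
}
'

# Bootstrap the first index with the write alias that rollover will advance
curl -X PUT "localhost:9200/logs-000001" -H 'Content-Type: application/json' -d'
{
  "aliases": { "logs": { "is_write_index": true } }
}
'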
Solution 3: Configure Proper Watermarks
The percentage values below are the defaults; set them explicitly if they have been overridden, and tune the frozen-tier thresholds if your cluster has frozen nodes:
PUT /_cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.disk.watermark.low": "85%",
    "cluster.routing.allocation.disk.watermark.high": "90%",
    "cluster.routing.allocation.disk.watermark.flood_stage": "95%",
    "cluster.routing.allocation.disk.watermark.flood_stage.frozen": "97%",
    "cluster.routing.allocation.disk.watermark.flood_stage.frozen.max_headroom": "20GB"
  }
}
Solution 4: Use Absolute Values for Large Disks
On very large disks, percentage watermarks can leave hundreds of gigabytes idle. Absolute values specify the minimum free space that must remain (so the low watermark gets the largest value), and you cannot mix percentages and byte values across the three settings:
PUT /_cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.disk.watermark.low": "500gb",
    "cluster.routing.allocation.disk.watermark.high": "200gb",
    "cluster.routing.allocation.disk.watermark.flood_stage": "100gb"
  }
}
Solution 5: Automatic Read-Only Removal (ES 7.4+)
Since Elasticsearch 7.4 this requires no configuration: the read_only_allow_delete block is removed automatically as soon as disk usage drops back below the high watermark. On earlier versions the block persists until it is removed manually, as shown in Step 1.
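To check which behavior your cluster has, read the version from the root endpoint (jq assumed for brevity):
# 7.4.0 or later removes the flood-stage block automatically
curl -s "localhost:9200/" | jq -r '.version.number'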
Preventing Future Issues
Set Up Monitoring
Alert when disk usage approaches the watermark thresholds (a minimal polling sketch follows the list below):
GET /_cat/allocation?v&h=node,disk.percent
Suggested alert thresholds, kept safely below the default watermarks:
- Warning: 75%
- Critical: 80%
- Emergency: 85%
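A minimal polling sketch for the warning tier, assuming bash and curl against localhost:9200; wire the echo into whatever alerting channel you use:
#!/bin/bash
# Warn when any node reaches the warning threshold
THRESHOLD=75

curl -s "localhost:9200/_cat/allocation?h=node,disk.percent" | while read -r node pct; do
  # Skip the UNASSIGNED row and nodes that report no disk stats
  [ -z "$pct" ] && continue
  if [ "$pct" -ge "$THRESHOLD" ]; then
    echo "WARNING: node $node is at ${pct}% disk usage"
  fi
done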
Implement Data Retention
- Use ILM for automatic index deletion
- Review retention policies regularly
- Archive old data to cheaper storage (see the snapshot sketch below)
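A sketch of the archiving step, assuming a snapshot repository named archive has already been registered (repository setup is not shown); once the snapshot succeeds, the local indices can be deleted to reclaim disk:
# Snapshot old indices into the pre-registered "archive" repository
curl -X PUT "localhost:9200/_snapshot/archive/logs-2023-01?wait_for_completion=false" \
  -H 'Content-Type: application/json' -d'
{
  "indices": "old-index-2023-01-*",
  "include_global_state": false
}
'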
Capacity Planning
- Track disk growth rate (the cron sketch after this list is one lightweight way)
- Plan for data growth + headroom
- Schedule regular capacity reviews
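Even a daily cron job that records disk usage gives you a trend line to plan against. A minimal sketch; the log path is illustrative:
# Append a timestamped disk-usage snapshot (run daily via cron)
echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) $(curl -s 'localhost:9200/_cat/allocation?h=node,disk.percent' | tr '\n' ' ')" \
  >> /var/log/es-disk-trend.log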
Quick Recovery Script
#!/bin/bash
# recover-from-readonly.sh
# Emergency recovery from a disk-watermark read-only block.
# Override ES_HOST in the environment if the cluster is not local.
ES_HOST="${ES_HOST:-localhost:9200}"

# Remove read-only blocks from all indices
curl -X PUT "$ES_HOST/_all/_settings" -H 'Content-Type: application/json' -d'
{
  "index.blocks.read_only_allow_delete": null
}
'

# Check cluster health
curl -X GET "$ES_HOST/_cluster/health?pretty"

# Check disk allocation
curl -X GET "$ES_HOST/_cat/allocation?v"
Understanding Watermark Behavior
Allocation Decisions
- Below low watermark: Normal operation
- Between low and high: New shards not allocated to node
- Between high and flood: Shards relocated away
- Above flood stage: Indices with shards on the node become read-only
Disk Usage Check Interval
Elasticsearch polls each node's disk usage at a regular interval (default 30s), so blocks are applied and released with a short delay. The interval is controlled by cluster.info.update.interval:
PUT /_cluster/settings
{
  "persistent": {
    "cluster.info.update.interval": "30s"
  }
}
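To see the effective value, including the default when it has never been set explicitly:
# Show the current disk-usage polling interval
curl -s "localhost:9200/_cluster/settings?include_defaults=true&filter_path=*.cluster.info.update.interval"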
Single-Node Clusters
Single-data-node clusters behave differently: older 7.x releases ignored the disk watermarks on them by default, and the setting below opts in to enforcing them. From 8.0 onward watermarks are always enforced and this setting is deprecated:
PUT /_cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.disk.watermark.enable_for_single_data_node": true
  }
}
Troubleshooting Checklist
When encountering read-only errors:
- Check current disk usage: GET /_cat/allocation?v
- Identify which indices are blocked
- Remove read-only block
- Free up disk space or add capacity
- Verify writes are working
- Adjust watermarks if needed
- Set up monitoring to prevent recurrence
- Review ILM policies