Elasticsearch Cluster Block: Read-Only Due to Low Disk Watermark

When disk usage on a node exceeds the flood-stage watermark, Elasticsearch automatically blocks write operations to indices on that node so the disk cannot fill completely. This guide explains how to recover from read-only blocks and prevent future occurrences.

Understanding Disk Watermarks

Default Watermark Thresholds

Watermark     Default  Effect
Low           85%      No new shards allocated to the node
High          90%      Shards relocated away from the node
Flood stage   95%      Indices on the node become read-only

Error Messages

When flood stage is reached:

ClusterBlockException: index [my-index] blocked by: [TOO_MANY_REQUESTS/12/disk usage exceeded flood-stage watermark...];

Or:

index read-only / allow delete (api)

Diagnosing the Issue

Check Disk Usage

GET /_cat/allocation?v&h=node,disk.percent,disk.used,disk.avail,disk.total
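
The output shows one row per data node (values here are illustrative, not from a real cluster):

node   disk.percent disk.used disk.avail disk.total
node-1           92   276.8gb    24.1gb    300.9gb
node-2           71   213.6gb    87.3gb    300.9gb

Any node at or above 85% has crossed the low watermark; at 95% its indices are blocked.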

Check Index Block Status

GET /_settings?filter_path=*.settings.index.blocks
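
A blocked index reports the read_only_allow_delete block; the response is shaped like this (index name illustrative):

{
  "my-index": {
    "settings": {
      "index": {
        "blocks": {
          "read_only_allow_delete": "true"
        }
      }
    }
  }
}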

Check Cluster Settings

GET /_cluster/settings?include_defaults=true&filter_path=*.cluster.routing.allocation.disk*

Immediate Recovery Steps

Step 1: Remove Read-Only Block

PUT /_all/_settings
{
  "index.blocks.read_only_allow_delete": null
}

For specific indices:

PUT /my-index/_settings
{
  "index.blocks.read_only_allow_delete": null
}
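
To confirm the block is gone, re-check the block settings; an empty response means no blocks remain:

GET /my-index/_settings?filter_path=*.settings.index.blocks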

Step 2: Free Up Disk Space

Options to quickly free space:

Delete old indices:

DELETE /old-index-2023-01-*
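
Wildcard deletes are destructive, so it's worth listing the matches first; note that Elasticsearch 8.x rejects wildcard deletes unless action.destructive_requires_name is disabled:

GET /_cat/indices/old-index-2023-01-*?v&h=index,store.size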

Force merge to expunge deleted documents (merging itself temporarily needs extra disk space, so use it cautiously when the disk is nearly full):

POST /my-index/_forcemerge?only_expunge_deletes=true

Temporarily remove replicas (each replica costs as much disk as its primary):

PUT /my-index/_settings
{
  "index.number_of_replicas": 0
}
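
Restore the replicas once disk pressure is resolved (assuming the index originally had one replica):

PUT /my-index/_settings
{
  "index.number_of_replicas": 1
}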

Step 3: Temporarily Adjust Watermarks

If you can't immediately free space, raise the thresholds as transient settings (these are cleared on a full cluster restart, which suits a stopgap):

PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.disk.watermark.low": "90%",
    "cluster.routing.allocation.disk.watermark.high": "95%",
    "cluster.routing.allocation.disk.watermark.flood_stage": "98%"
  }
}

Warning: This is temporary! Plan to add storage or clean data.
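
Once space is recovered, clear the overrides so the defaults apply again (setting a value to null removes it):

PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.disk.watermark.low": null,
    "cluster.routing.allocation.disk.watermark.high": null,
    "cluster.routing.allocation.disk.watermark.flood_stage": null
  }
}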

Step 4: Verify Recovery

GET /_cluster/health
GET /_cat/allocation?v
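
To confirm writes are accepted again, index a throwaway document (index name and body are illustrative; delete the document afterwards if needed):

POST /my-index/_doc
{
  "probe": "write-check"
}

A 201 response means the block is gone.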

Long-Term Solutions

Solution 1: Add More Disk Space

The most sustainable solution:

  • Expand existing volumes
  • Add new data nodes
  • Use larger instance types (cloud)

Solution 2: Implement Index Lifecycle Management

PUT _ilm/policy/disk_management
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "50gb",
            "max_age": "7d"
          }
        }
      },
      "warm": {
        "min_age": "30d",
        "actions": {
          "shrink": {
            "number_of_shards": 1
          },
          "forcemerge": {
            "max_num_segments": 1
          }
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
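
The policy takes effect only for indices that reference it; one common wiring is through an index template (template name, pattern, and rollover alias below are illustrative):

PUT /_index_template/logs_template
{
  "index_patterns": ["logs-*"],
  "template": {
    "settings": {
      "index.lifecycle.name": "disk_management",
      "index.lifecycle.rollover_alias": "logs"
    }
  }
}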

Solution 3: Configure Proper Watermarks

Make the watermarks explicit persistent settings so they survive restarts and match your capacity plan:

PUT /_cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.disk.watermark.low": "85%",
    "cluster.routing.allocation.disk.watermark.high": "90%",
    "cluster.routing.allocation.disk.watermark.flood_stage": "95%",
    "cluster.routing.allocation.disk.watermark.flood_stage.frozen": "97%",
    "cluster.routing.allocation.disk.watermark.flood_stage.frozen.max_headroom": "20GB"
  }
}

Solution 4: Use Absolute Values for Large Disks

For very large disks, percentage thresholds can strand hundreds of gigabytes, so specify the minimum free space with absolute values instead. Note the ordering is inverted (the low watermark requires the most free space) and that percentages and byte values cannot be mixed across these settings:

PUT /_cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.disk.watermark.low": "500gb",
    "cluster.routing.allocation.disk.watermark.high": "200gb",
    "cluster.routing.allocation.disk.watermark.flood_stage": "100gb"
  }
}

Solution 5: Automatic Read-Only Removal (ES 7.4+)

Since Elasticsearch 7.4, the flood-stage write block is removed automatically once disk usage falls back below the high watermark; no configuration is required. On earlier versions the block persists until you remove it manually, as shown in Step 1.

Preventing Future Issues

Set Up Monitoring

Alert when disk usage approaches thresholds:

GET /_cat/allocation?v&h=node,disk.percent

Alert thresholds:

  • Warning: 75%
  • Critical: 80%
  • Emergency: 85%
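
A minimal cron-able sketch of such an alert in bash, assuming curl access to the cluster (the host, threshold, and echo action are placeholders to replace with your own tooling):

#!/bin/bash
# disk-usage-alert.sh - warn when any node crosses a disk-usage threshold
ES_HOST="${ES_HOST:-localhost:9200}"
THRESHOLD="${THRESHOLD:-75}"

curl -s "$ES_HOST/_cat/allocation?h=node,disk.percent" | while read -r node pct; do
  # Skip rows without a numeric disk.percent (e.g. the UNASSIGNED row)
  [[ "$pct" =~ ^[0-9]+$ ]] || continue
  if (( pct >= THRESHOLD )); then
    echo "ALERT: node $node at ${pct}% disk usage (threshold ${THRESHOLD}%)"
  fi
done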

Implement Data Retention

  • Use ILM for automatic index deletion
  • Review retention policies regularly
  • Archive old data to cheaper storage

Capacity Planning

  • Track disk growth rate
  • Plan for data growth + headroom
  • Schedule regular capacity reviews

Quick Recovery Script

#!/bin/bash
# recover-from-readonly.sh
set -euo pipefail

# Target cluster; override with e.g. ES_HOST=es1.example.com:9200
ES_HOST="${ES_HOST:-localhost:9200}"

# Remove read-only blocks from all indices
curl -s -X PUT "$ES_HOST/_all/_settings" -H 'Content-Type: application/json' -d'
{
  "index.blocks.read_only_allow_delete": null
}
'

# Check cluster health
curl -s -X GET "$ES_HOST/_cluster/health?pretty"

# Check per-node disk allocation
curl -s -X GET "$ES_HOST/_cat/allocation?v"
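
Make the script executable and run it, overriding the target host if needed:

chmod +x recover-from-readonly.sh
ES_HOST=localhost:9200 ./recover-from-readonly.sh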

Understanding Watermark Behavior

Allocation Decisions

  • Below low watermark: Normal operation
  • Between low and high: New shards not allocated to node
  • Between high and flood: Shards relocated away
  • Above flood stage: Indices become read-only

Disk Usage Check Interval

Elasticsearch polls each node's disk usage at a fixed interval (30s is the default; lowering it makes the cluster react faster at the cost of more frequent checks):

PUT /_cluster/settings
{
  "persistent": {
    "cluster.info.update.interval": "30s"
  }
}

Single-Node Clusters

A single data node has nowhere to relocate shards, so in 7.x the disk watermarks are ignored for allocation decisions by default. Set the following to true to enforce them anyway (deprecated as an opt-in since 7.14 and always enabled in 8.x):

PUT /_cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.disk.watermark.enable_for_single_data_node": true
  }
}

Troubleshooting Checklist

When encountering read-only errors:

  • Check current disk usage: GET /_cat/allocation?v
  • Identify which indices are blocked
  • Remove read-only block
  • Free up disk space or add capacity
  • Verify writes are working
  • Adjust watermarks if needed
  • Set up monitoring to prevent recurrence
  • Review ILM policies