Elasticsearch DiskUsageException: Disk usage exceeded threshold - Common Causes & Fixes

Brief Explanation

The "DiskUsageException: Disk usage exceeded threshold" error occurs in Elasticsearch when disk usage on one or more nodes in the cluster rises above a configured threshold (watermark), leaving too little free space. It is a protective measure that keeps nodes from running out of disk, which helps prevent data loss and preserves cluster stability.

Impact

This error can have significant impacts on your Elasticsearch cluster:

  • New document indexing may be blocked
  • Shard allocation and rebalancing may be prevented
  • Search performance may degrade
  • In severe cases, the affected node(s) may become unresponsive

Common Causes

  1. Insufficient disk space allocation
  2. Rapid data growth without proper capacity planning
  3. Large volumes of logs or temporary files
  4. Improper index lifecycle management
  5. Unoptimized index settings or mappings

Troubleshooting and Resolution Steps

  1. Identify affected nodes:

    • Use the GET /_cat/allocation?v API to check disk usage across nodes
  2. Increase available disk space:

    • Delete unnecessary indices or old data
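    • Reclaim space held by deleted documents using the _forcemerge API (a force merge temporarily needs extra disk space while segments are rewritten, so free some space first on a nearly full node)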
    • Increase disk capacity if possible
  3. Adjust disk threshold settings:

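    • Temporarily raise the cluster.routing.allocation.disk.watermark.low, high, and flood_stage settings to buy time while you free space, then restore them
    • Leave cluster.routing.allocation.disk.threshold_enabled set to true; it is a boolean switch, not a value to increase, and disabling it removes the protection entirely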
  4. Implement proper index lifecycle management:

    • Use Index Lifecycle Management (ILM) to automate index rollovers and deletions
  5. Optimize index settings and mappings:

    • Review and optimize mapping to reduce storage requirements
    • Adjust refresh intervals and merge policies
  6. Monitor and plan for capacity:

    • Set up alerts for disk usage
    • Implement regular capacity planning reviews

Best Practices

  • Regularly monitor disk usage and set up alerts
  • Implement proper data retention policies
  • Use ILM to manage index lifecycles automatically
  • Optimize mappings and index settings for storage efficiency
  • Plan for scalability and add nodes or increase disk capacity proactively
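
To make the ILM recommendation more concrete, here is a minimal sketch that builds a policy body with a rollover in the hot phase and a delete phase. The function name and the default values (7d, 50gb, 30d) are assumptions chosen for illustration, not recommendations. The body is intended for PUT /_ilm/policy/<policy_name>; the max_primary_shard_size rollover condition may not be available on older versions.

```python
import json
from typing import Any, Dict


def build_ilm_policy(
    rollover_max_age: str = "7d",
    rollover_max_primary_shard_size: str = "50gb",
    delete_after: str = "30d",
) -> Dict[str, Any]:
    """Build an ILM policy body: roll over in the hot phase, delete later."""
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        "rollover": {
                            "max_age": rollover_max_age,
                            "max_primary_shard_size": rollover_max_primary_shard_size,
                        }
                    }
                },
                "delete": {
                    # Indices are deleted once this long has passed since rollover.
                    "min_age": delete_after,
                    "actions": {"delete": {}},
                },
            }
        }
    }


# Illustrative values only; tune them to your retention requirements.
policy = build_ilm_policy(delete_after="14d")
print(json.dumps(policy, indent=2))
```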

Frequently Asked Questions

Q: How can I quickly free up disk space in an emergency?
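A: You can delete old or unnecessary indices using the DELETE /<index_name> API. Be cautious and ensure you're not deleting critical data. You can also reclaim the space held by deleted documents with the POST /<index_name>/_forcemerge API (optionally with only_expunge_deletes=true). Note that a force merge temporarily needs extra disk space while segments are rewritten, so on a nearly full node delete data first.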
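
The two calls from this answer can be sketched with the Python standard library as follows. The base URL, the index name, and the helper names are assumptions for this example; the helpers only build the requests and send nothing, and a secured cluster would also need authentication.

```python
import urllib.request
from urllib.parse import quote

BASE_URL = "http://localhost:9200"  # assumption: a local, unsecured node


def delete_index_request(index: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a DELETE /<index_name> request."""
    return urllib.request.Request(f"{base_url}/{quote(index, safe='')}", method="DELETE")


def expunge_deletes_request(index: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a force merge that only expunges deleted documents."""
    url = f"{base_url}/{quote(index, safe='')}/_forcemerge?only_expunge_deletes=true"
    return urllib.request.Request(url, method="POST")


# Hypothetical index name, used only to show the resulting request.
req = delete_index_request("logs-2023.01.01")
print(req.get_method(), req.full_url)
# To send it for real: urllib.request.urlopen(req)
```

Keeping request construction separate from sending makes it easy to review exactly which index a destructive call targets before it runs.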

Q: What are the default disk usage thresholds in Elasticsearch?
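A: By default, the low watermark is 85%, the high watermark is 90%, and the flood stage is 95% of disk usage. These can be adjusted in the cluster settings through cluster.routing.allocation.disk.watermark.low, high, and flood_stage, either as percentages of disk used or as absolute amounts of free space.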
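
As a hedged example of adjusting these thresholds, the sketch below builds a body for PUT /_cluster/settings and refuses values that are out of order. The function name and the ordering check are assumptions added for this example; the percentages shown are illustrative, not recommendations.

```python
import json
from typing import Any, Dict

PREFIX = "cluster.routing.allocation.disk.watermark"


def build_watermark_settings(low_pct: int, high_pct: int, flood_pct: int) -> Dict[str, Any]:
    """Build a persistent cluster settings body for the three disk watermarks."""
    # Guard against inverted thresholds before anything reaches the cluster.
    if not (0 < low_pct < high_pct < flood_pct <= 100):
        raise ValueError("expected 0 < low < high < flood_stage <= 100")
    return {
        "persistent": {
            f"{PREFIX}.low": f"{low_pct}%",
            f"{PREFIX}.high": f"{high_pct}%",
            f"{PREFIX}.flood_stage": f"{flood_pct}%",
        }
    }


# Illustrative values, slightly above the defaults.
body = build_watermark_settings(88, 92, 96)
print(json.dumps(body, indent=2))
```

Setting a key back to null in the same API restores its default, which is a reasonable way to undo a temporary increase.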

Q: Can I disable disk-based shard allocation entirely?
A: While possible by setting cluster.routing.allocation.disk.threshold_enabled to false, it's not recommended as it can lead to data loss if a node runs out of disk space entirely.

Q: How does Elasticsearch behave when disk usage exceeds the flood stage?
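A: When disk usage exceeds the flood stage watermark, Elasticsearch enforces a read-only index block (index.blocks.read_only_allow_delete) on every index that has at least one shard allocated on the affected node. Writes to those indices are rejected, but deletes are still allowed so you can free space. In recent versions (7.4 and later) the block is released automatically once disk usage on the node falls back below the high watermark.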
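
If the block has to be cleared by hand after space is freed (for example on a version that does not release it automatically), one option is to reset index.blocks.read_only_allow_delete to null. The sketch below only builds that request; the base URL, the helper name, and the default target of _all are assumptions for this example.

```python
import json
import urllib.request


def clear_read_only_block_request(
    index: str = "_all", base_url: str = "http://localhost:9200"
) -> urllib.request.Request:
    """Build (but do not send) a PUT /<index>/_settings request that resets the block."""
    # A null value resets the setting, which removes the block.
    body = json.dumps({"index.blocks.read_only_allow_delete": None}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/{index}/_settings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = clear_read_only_block_request()
print(req.get_method(), req.full_url, req.data.decode("utf-8"))
# To send it for real: urllib.request.urlopen(req)
```

Clearing the block while the node is still above the flood stage watermark is unlikely to help, since Elasticsearch will simply apply it again.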

Q: Is it safe to delete the Elasticsearch data directory to free up space?
A: No, never delete the Elasticsearch data directory directly. This will lead to data loss. Always use Elasticsearch APIs to manage data and indices.
