Elasticsearch DiskUsageException: Disk usage exceeded threshold

Brief Explanation

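The "DiskUsageException: Disk usage exceeded threshold" error occurs in Elasticsearch when disk usage on one or more nodes in the cluster exceeds a configured threshold (a disk watermark). Put another way, the node's free disk space has fallen below the configured limit. The thresholds are a protective measure: they stop nodes from filling their disks completely, which could cause data loss and destabilize the cluster.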

Impact

This error can have significant impacts on your Elasticsearch cluster:

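New document indexing may be blocked: once a node passes the flood-stage watermark, indices with shards on that node become read-only
Shard allocation may be prevented: above the low watermark, Elasticsearch stops allocating shards to the node (except primaries of newly created indices), and above the high watermark it tries to relocate shards away from the node
Search performance may degrade, for example while shards are being relocated between nodes
In severe cases, the affected node(s) may become unresponsive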

Common Causes

Insufficient disk space allocation
Rapid data growth without proper capacity planning
Large volumes of logs or temporary files
Improper index lifecycle management
Unoptimized index settings or mappings

Troubleshooting and Resolution Steps

Identify affected nodes:
- Use the GET /_cat/allocation?v API to check disk usage across nodes
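As a minimal sketch, this check can be scripted in Python. It assumes you have fetched the same data in JSON form with `GET /_cat/allocation?format=json`. The sample payload below is hypothetical and so is the function name `nodes_over_threshold`. The cat APIs return numbers as strings, and an `UNASSIGNED` row (present when shards are unassigned) carries no disk figures, so the parser converts the values and skips rows without them.

```python
import json


def nodes_over_threshold(cat_allocation_json: str, threshold_pct: float = 85.0) -> list[tuple[str, int]]:
    """Return (node, disk_percent) pairs whose disk usage is above threshold_pct."""
    flagged = []
    for row in json.loads(cat_allocation_json):
        pct = row.get("disk.percent")
        # The UNASSIGNED row (if any) has no disk statistics, so skip it.
        if pct is None:
            continue
        if int(pct) > threshold_pct:
            flagged.append((row["node"], int(pct)))
    return flagged


# Hypothetical sample shaped like GET /_cat/allocation?format=json output.
sample = json.dumps([
    {"shards": "12", "disk.percent": "91", "node": "node-1"},
    {"shards": "10", "disk.percent": "62", "node": "node-2"},
    {"shards": "3", "disk.percent": None, "node": "UNASSIGNED"},
])

print(nodes_over_threshold(sample))  # [('node-1', 91)]
```

The default of 85 matches the default low watermark, so a non-empty result means at least one node has stopped receiving new shards.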
Increase available disk space:
- Delete unnecessary indices or old data
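- Reclaim space held by deleted documents with the _forcemerge API (best suited to indices that are no longer written to; merging temporarily needs extra disk space)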
- Increase disk capacity if possible
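When deleting old data is the quickest option, a script can shortlist candidates for review. The sketch below assumes a hypothetical naming convention in which time-based indices end in a `YYYY.MM.DD` date suffix (for example `logs-2024.01.15`), plus a retention period you choose. It only prints the `DELETE` requests it would issue, so you can check them before touching the cluster.

```python
import re
from datetime import date, timedelta

# Hypothetical convention: time-based indices end with a YYYY.MM.DD suffix.
DATE_SUFFIX = re.compile(r"-(\d{4})\.(\d{2})\.(\d{2})$")


def indices_to_delete(index_names: list[str], today: date, retention_days: int) -> list[str]:
    """Return indices whose date suffix falls before the retention window."""
    cutoff = today - timedelta(days=retention_days)
    stale = []
    for name in index_names:
        match = DATE_SUFFIX.search(name)
        if match is None:
            continue  # leave indices without a date suffix alone
        year, month, day = (int(g) for g in match.groups())
        if date(year, month, day) < cutoff:
            stale.append(name)
    return sorted(stale)


names = ["logs-2024.01.01", "logs-2024.03.30", "customers", ".kibana_1"]
for name in indices_to_delete(names, today=date(2024, 4, 1), retention_days=30):
    print(f"DELETE /{name}")  # prints: DELETE /logs-2024.01.01
```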
Adjust disk threshold settings:
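- Temporarily raise cluster.routing.allocation.disk.watermark.low, high, and flood_stage to buy time while you free space, then restore them
- Do not treat cluster.routing.allocation.disk.threshold_enabled as a fix: it is a boolean that turns disk-based allocation off entirely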
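A request body for the cluster update settings API (`PUT /_cluster/settings`) that raises the watermarks could be built as sketched below. The helper `watermark_settings` is hypothetical, and the 90/93/97 values are placeholders, not recommendations. The helper checks that the percentages increase from low to high to flood stage. Because persistent settings survive restarts, the sketch also builds a follow-up body with `null` values, which resets each watermark to its default.

```python
import json

PREFIX = "cluster.routing.allocation.disk.watermark"


def watermark_settings(low: float, high: float, flood_stage: float) -> dict:
    """Build a PUT /_cluster/settings body with percentage-based watermarks."""
    if not (0 < low <= high <= flood_stage < 100):
        raise ValueError("expected 0 < low <= high <= flood_stage < 100")
    return {
        "persistent": {
            f"{PREFIX}.low": f"{low:g}%",
            f"{PREFIX}.high": f"{high:g}%",
            f"{PREFIX}.flood_stage": f"{flood_stage:g}%",
        }
    }


# Temporary relief while space is being freed (placeholder values).
print(json.dumps(watermark_settings(90, 93, 97), indent=2))

# Afterwards: None serializes to JSON null, which resets each setting to its default.
reset = {"persistent": {f"{PREFIX}.{name}": None for name in ("low", "high", "flood_stage")}}
print(json.dumps(reset))
```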
Implement proper index lifecycle management:
- Use Index Lifecycle Management (ILM) to automate index rollovers and deletions
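A minimal ILM policy that rolls indices over and later deletes them might look like the body below, sent with `PUT /_ilm/policy/<policy_name>`. The thresholds (30-day rollover age, 50 GB primary shard size, deletion after 90 days) and the helper name `retention_policy` are illustrative assumptions. `max_primary_shard_size` is available from Elasticsearch 7.13; on older versions `max_size` plays a similar role. Rollover also expects the index to be written through an alias or a data stream.

```python
import json


def retention_policy(rollover_age: str = "30d", max_shard_size: str = "50gb",
                     delete_after: str = "90d") -> dict:
    """Build an ILM policy body: roll over in the hot phase, delete later."""
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        "rollover": {
                            "max_age": rollover_age,
                            "max_primary_shard_size": max_shard_size,
                        }
                    }
                },
                "delete": {
                    # When rollover is used, min_age counts from the rollover time.
                    "min_age": delete_after,
                    "actions": {"delete": {}},
                },
            }
        }
    }


print(json.dumps(retention_policy(), indent=2))
```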
Optimize index settings and mappings:
- Review and optimize mapping to reduce storage requirements
- Adjust refresh intervals and merge policies
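As an illustration of storage-conscious settings and mappings, the body below could be passed to `PUT /<index_name>` when creating an index. The field names are hypothetical. Mapping `status` as `keyword` only avoids the extra analyzed `text` field that dynamic mapping creates for strings. `doc_values: false` drops the on-disk columnar data for a field you never sort or aggregate on, and `index: false` skips the inverted index for a field you never search. The 30-second `refresh_interval` is an example value, not a recommendation.

```python
import json

# Hypothetical log index: field names and values are illustrative only.
index_body = {
    "settings": {
        "index": {
            "refresh_interval": "30s",  # refresh less often than the 1s default
        }
    },
    "mappings": {
        "properties": {
            # keyword only: no extra analyzed text field
            "status": {"type": "keyword"},
            # searchable, but no doc values because we never sort/aggregate on it
            "session_id": {"type": "keyword", "doc_values": False},
            # kept in _source for display, but not indexed for search
            "raw_payload": {"type": "text", "index": False},
        }
    },
}

print(json.dumps(index_body, indent=2))
```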
Monitor and plan for capacity:
- Set up alerts for disk usage
- Implement regular capacity planning reviews
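For capacity reviews, a rough estimate of how long a node has before it reaches a watermark can help prioritize work. The sketch below uses a hypothetical helper and assumes linear growth at an average daily rate that you measure yourself. Real growth is rarely that regular, so treat the result as a planning hint rather than a forecast.

```python
import math


def days_until_watermark(used_bytes: int, total_bytes: int,
                         daily_growth_bytes: float, watermark_pct: float = 95.0) -> float:
    """Days until disk usage reaches watermark_pct, assuming linear growth."""
    limit = total_bytes * watermark_pct / 100
    if used_bytes >= limit:
        return 0.0
    if daily_growth_bytes <= 0:
        return math.inf  # not growing: the watermark is never reached
    return (limit - used_bytes) / daily_growth_bytes


GB = 1024 ** 3
# 700 GB used on a 1,000 GB disk, growing by 10 GB per day:
print(days_until_watermark(700 * GB, 1000 * GB, 10 * GB))        # 25.0 (flood stage, 95%)
print(days_until_watermark(700 * GB, 1000 * GB, 10 * GB, 85.0))  # 15.0 (low watermark, 85%)
```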

Best Practices

Regularly monitor disk usage and set up alerts
Implement proper data retention policies
Use ILM to manage index lifecycles automatically
Optimize mappings and index settings for storage efficiency
Plan for scalability and add nodes or increase disk capacity proactively

Frequently Asked Questions

Q: How can I quickly free up disk space in an emergency?
A: Delete old or unnecessary indices with the DELETE /<index_name> API, taking care not to delete critical data. You can also reclaim space held by deleted documents with POST /<index_name>/_forcemerge (for example with only_expunge_deletes=true). However, a merge temporarily uses additional disk space, so it is not a good first step on a node that is already nearly full.

Q: What are the default disk usage thresholds in Elasticsearch?
A: By default, the low watermark is 85%, the high watermark is 90%, and the flood stage is 95% of disk usage. These can be adjusted in the cluster settings.
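To make the defaults concrete, the hypothetical helper below reports the highest default watermark that a node's disk usage percentage has exceeded. Clusters that override the watermarks, or set them as absolute byte values (which then refer to minimum free space rather than used space), would need different inputs.

```python
from typing import Optional

# Default watermarks, checked from highest to lowest.
DEFAULT_WATERMARKS = (("flood_stage", 95.0), ("high", 90.0), ("low", 85.0))


def crossed_watermark(disk_used_pct: float, watermarks=DEFAULT_WATERMARKS) -> Optional[str]:
    """Return the name of the highest watermark that disk_used_pct exceeds, or None."""
    for name, threshold in watermarks:
        if disk_used_pct > threshold:
            return name
    return None


for pct in (70, 88, 92, 97):
    print(pct, crossed_watermark(pct))
# 70 None
# 88 low
# 92 high
# 97 flood_stage
```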

Q: Can I disable disk-based shard allocation entirely?
A: While possible by setting cluster.routing.allocation.disk.threshold_enabled to false, it's not recommended as it can lead to data loss if a node runs out of disk space entirely.

Q: How does Elasticsearch behave when disk usage exceeds the flood stage?
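A: When disk usage exceeds the flood stage watermark, Elasticsearch applies a read-only-allow-delete block (index.blocks.read_only_allow_delete) to every index that has at least one shard allocated on the affected node. Writes to those indices fail, although the indices themselves can still be deleted to free space. From Elasticsearch 7.4 onward, the block is released automatically once disk usage on the node falls below the high watermark; on earlier versions it must be removed manually.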
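Clearing the block by hand, as older versions require, means setting `index.blocks.read_only_allow_delete` to `null` with `PUT /<index_name>/_settings`. The Python below only builds that request and does not send it. `unblock_request` is a hypothetical helper, and the default target `_all`, which covers every index, is an example choice. If disk usage is still above the flood stage, Elasticsearch will apply the block again, so free space first.

```python
import json


def unblock_request(target: str = "_all") -> tuple[str, str, str]:
    """Return (method, path, body) that clears the flood-stage write block."""
    body = {"index.blocks.read_only_allow_delete": None}  # None serializes to JSON null
    return "PUT", f"/{target}/_settings", json.dumps(body)


method, path, body = unblock_request()
print(method, path)  # PUT /_all/_settings
print(body)          # {"index.blocks.read_only_allow_delete": null}
```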

Q: Is it safe to delete the Elasticsearch data directory to free up space?
A: No, never delete the Elasticsearch data directory directly. This will lead to data loss. Always use Elasticsearch APIs to manage data and indices.