Brief Explanation
The "DiskUsageException: Disk usage exceeded threshold" error occurs in Elasticsearch when the available disk space on one or more nodes in the cluster falls below a configured threshold. This error is a protective measure to prevent data loss and ensure cluster stability.
Impact
This error can have significant impacts on your Elasticsearch cluster:
- New document indexing may be blocked
- Shard allocation and rebalancing may be prevented
- Search performance may degrade
- In severe cases, the affected node(s) may become unresponsive
Common Causes
- Insufficient disk space allocation
- Rapid data growth without proper capacity planning
- Large volumes of logs or temporary files
- Improper index lifecycle management
- Unoptimized index settings or mappings
Troubleshooting and Resolution Steps
Identify affected nodes:
- Use the GET /_cat/allocation?v API to check disk usage across nodes
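As a quick sketch, the allocation check can be run from Kibana Dev Tools or via curl against your cluster; the response columns below are real, but the values shown are purely illustrative:

```
GET /_cat/allocation?v

# Illustrative response (a node at 90% disk usage has crossed the high watermark):
# shards disk.indices disk.used disk.avail disk.total disk.percent host     node
#     42       30.1gb    45.2gb      4.8gb       50gb           90 10.0.0.5 node-1
```

The disk.percent column shows each node's usage relative to its total disk, which is what the watermark thresholds are compared against.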
Increase available disk space:
- Delete unnecessary indices or old data
- Optimize existing indices using the _forcemerge API
- Increase disk capacity if possible
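A hedged sketch of a force merge that reclaims space held by deleted documents (the index name logs-2024.01 is a placeholder):

```
POST /logs-2024.01/_forcemerge?only_expunge_deletes=true
```

Force merging is I/O-intensive, so it is best run during low-traffic periods and on indices that are no longer being written to.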
Adjust disk threshold settings:
- Modify the cluster.routing.allocation.disk.watermark.low, high, and flood_stage settings to temporarily raise the thresholds while you free up space
- The cluster.routing.allocation.disk.threshold_enabled setting can disable disk-based allocation checks entirely, but leave it enabled except as a last resort
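A sketch of temporarily raising the watermarks via the cluster settings API (the percentages here are examples, not recommendations):

```
PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.disk.watermark.low": "90%",
    "cluster.routing.allocation.disk.watermark.high": "95%",
    "cluster.routing.allocation.disk.watermark.flood_stage": "97%"
  }
}
```

Transient settings do not survive a full cluster restart; once space has been freed, reset each setting to null to restore the defaults.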
Implement proper index lifecycle management:
- Use Index Lifecycle Management (ILM) to automate index rollovers and deletions
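A minimal ILM policy sketch that rolls an index over at 50 GB and deletes it 30 days later (the policy name and thresholds are illustrative, not recommendations):

```
PUT /_ilm/policy/logs-cleanup-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_size": "50gb" }
        }
      },
      "delete": {
        "min_age": "30d",
        "actions": { "delete": {} }
      }
    }
  }
}
```

Attaching this policy to an index template ensures new indices are rolled over and expired automatically instead of accumulating until disk pressure hits.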
Optimize index settings and mappings:
- Review and optimize mapping to reduce storage requirements
- Adjust refresh intervals and merge policies
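As an illustration of both points, the sketch below creates an index with a longer refresh interval and a mapping trimmed for storage (my-index and the field names are placeholders; whether these tweaks help depends on your workload):

```
PUT /my-index
{
  "settings": {
    "index": { "refresh_interval": "30s" }
  },
  "mappings": {
    "properties": {
      "message":  { "type": "text", "norms": false },
      "trace_id": { "type": "keyword", "index": false }
    }
  }
}
```

Disabling norms drops per-field scoring data you may not need, and setting "index": false on a field you never search against avoids building an inverted index for it while keeping it retrievable.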
Monitor and plan for capacity:
- Set up alerts for disk usage
- Implement regular capacity planning reviews
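A simple spot check that could feed an alerting script (the column names follow the _cat/nodes API; verify them against your Elasticsearch version):

```
GET /_cat/nodes?v&h=name,disk.used_percent,disk.avail
```

Alert thresholds should sit comfortably below the low watermark (85% by default) so there is time to act before allocation is affected.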
Best Practices
- Regularly monitor disk usage and set up alerts
- Implement proper data retention policies
- Use ILM to manage index lifecycles automatically
- Optimize mappings and index settings for storage efficiency
- Plan for scalability and add nodes or increase disk capacity proactively
Frequently Asked Questions
Q: How can I quickly free up disk space in an emergency?
A: You can delete old or unnecessary indices using the DELETE /<index_name> API. Be cautious and ensure you're not deleting critical data. You can also force a merge on existing indices to reclaim deleted document space using the POST /<index_name>/_forcemerge API.
Q: What are the default disk usage thresholds in Elasticsearch?
A: By default, the low watermark is 85%, the high watermark is 90%, and the flood stage is 95% of disk usage. These can be adjusted in the cluster settings.
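You can confirm the thresholds actually in effect on your cluster, including the built-in defaults, with:

```
GET /_cluster/settings?include_defaults=true
```

Search the response for cluster.routing.allocation.disk.watermark to see whether any of the defaults have been overridden.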
Q: Can I disable disk-based shard allocation entirely?
A: While possible by setting cluster.routing.allocation.disk.threshold_enabled to false, it's not recommended as it can lead to data loss if a node runs out of disk space entirely.
Q: How does Elasticsearch behave when disk usage exceeds the flood stage?
A: When disk usage exceeds the flood stage watermark, Elasticsearch applies a read-only (index.blocks.read_only_allow_delete) block to every index that has at least one shard allocated on the affected node. Recent Elasticsearch versions release this block automatically once usage falls back below the high watermark; older versions require removing it manually.
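On versions that do not release the flood-stage block automatically, it can be cleared manually once space has been freed; a sketch using a wildcard over all indices:

```
PUT /_all/_settings
{
  "index.blocks.read_only_allow_delete": null
}
```

Only do this after disk usage is back under control, otherwise the block will simply be reapplied.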
Q: Is it safe to delete the Elasticsearch data directory to free up space?
A: No, never delete the Elasticsearch data directory directly. This will lead to data loss. Always use Elasticsearch APIs to manage data and indices.