High JVM heap pressure in Elasticsearch occurs when heap memory usage consistently exceeds safe thresholds. When heap usage rises above 85%, immediate action is required to prevent cluster instability, degraded performance, and potential OutOfMemoryError exceptions.
Understanding Heap Pressure Thresholds
| Heap Usage | Status | Action Required |
|---|---|---|
| < 75% | Healthy | Normal operation |
| 75-85% | Warning | Monitor closely, consider optimization |
| > 85% | Critical | Immediate action required |
| > 95% | Emergency | Circuit breakers will trigger |
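When heap usage approaches the emergency range, it is worth checking whether any circuit breakers have already tripped. One way to do this is the breaker metric of the node stats API, which reports a tripped count per breaker (parent, fielddata, request, and so on):
GET /_nodes/stats/breaker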
Checking Current Heap Pressure
Using the Nodes Stats API
GET /_nodes/stats/jvm
Look for:
- `jvm.mem.heap_used_percent` - current heap usage percentage
- `jvm.mem.heap_used_in_bytes` - actual bytes used
- `jvm.gc` - garbage collection statistics
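To narrow the response to just the heap figures, a filter_path can be applied; the pattern below is one way to pull the node name and heap usage per node:
GET /_nodes/stats/jvm?filter_path=nodes.*.name,nodes.*.jvm.mem.heap_used_percent,nodes.*.jvm.mem.heap_used_in_bytes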
Using the Cat Nodes API
Quick overview of heap usage across all nodes:
GET /_cat/nodes?v&h=name,heap.percent,heap.current,heap.max,cpu,load_1m
Calculating Memory Pressure
For detailed old generation pool analysis:
GET /_nodes/stats?filter_path=nodes.*.jvm.mem.pools.old
Memory pressure is calculated as: (used_in_bytes / max_in_bytes) * 100
Root Causes of High Heap Pressure
1. Too Many Shards
Every shard consumes heap memory for metadata and segment information. Excessive shards are the most common cause of sustained high heap pressure.
Diagnosis:
curl -s "localhost:9200/_cat/shards" | wc -l
GET /_cluster/stats?filter_path=indices.shards
Solution: Aim for fewer, larger shards (10-50 GB each).
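One way to consolidate an over-sharded index is the shrink API. The sketch below uses placeholder index and node names; the source index must first be made read-only and fully allocated to a single node, and its shard count must be divisible by the target shard count:
PUT /my-logs-000001/_settings
{
  "settings": {
    "index.routing.allocation.require._name": "data-node-1",
    "index.blocks.write": true
  }
}

POST /my-logs-000001/_shrink/my-logs-000001-shrunk
{
  "settings": {
    "index.number_of_shards": 1,
    "index.routing.allocation.require._name": null,
    "index.blocks.write": null
  }
}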
2. Large Aggregations
Aggregations with high cardinality or large bucket sizes consume significant heap.
Solution:
- Reduce the `size` parameter in aggregations
- Use the `composite` aggregation for high-cardinality fields
- Avoid aggregating on `text` fields (use `keyword` instead)
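As a sketch, a composite aggregation pages through buckets instead of building them all in a single request; the index and field names here are placeholders:
GET /my-index/_search
{
  "size": 0,
  "aggs": {
    "unique_users": {
      "composite": {
        "size": 1000,
        "sources": [
          { "user": { "terms": { "field": "user.id" } } }
        ]
      }
    }
  }
}
Subsequent pages are fetched by passing the returned after_key back as the composite after parameter, so only one page of buckets is held in heap at a time.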
3. Fielddata Usage
Fielddata for text fields is very memory-intensive.
Diagnosis:
GET /_cat/fielddata?v
Solution:
- Avoid `fielddata: true` on text fields
- Use `keyword` fields for sorting and aggregations
- Clear the fielddata cache if needed:
POST /_cache/clear?fielddata=true
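Rather than enabling fielddata, a text field can be mapped with a keyword sub-field and that sub-field used for sorting and aggregations; the index and field names below are illustrative:
PUT /my-index
{
  "mappings": {
    "properties": {
      "status": {
        "type": "text",
        "fields": {
          "keyword": { "type": "keyword" }
        }
      }
    }
  }
}
Queries can then aggregate on status.keyword instead of status, which avoids building fielddata on heap.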
4. Large Bulk Requests
Oversized bulk requests create temporary memory pressure.
Solution:
- Keep bulk requests between 5 and 15 MB
- Reduce concurrent bulk indexing clients
5. Expensive Queries
Queries with large result sets or complex operations spike memory usage.
Solution:
- Limit the `size` parameter in searches
- Use `search_after` for pagination
- Set query timeouts
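A rough sketch of search_after pagination combined with a query timeout; the index, field names, and sort values are placeholders, and the search_after values come from the sort values of the last hit on the previous page:
GET /my-index/_search
{
  "size": 1000,
  "timeout": "30s",
  "sort": [
    { "@timestamp": "asc" },
    { "event.id": "asc" }
  ],
  "search_after": ["2024-05-01T00:00:00.000Z", 98765]
}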
Immediate Actions for High Heap Pressure
Step 1: Identify the Cause
Check what's consuming memory:
GET /_nodes/stats/indices/fielddata?fields=*
GET /_nodes/stats/indices/query_cache
GET /_nodes/stats/indices/request_cache
Step 2: Clear Caches (Temporary Relief)
POST /_cache/clear
For specific caches:
POST /_cache/clear?fielddata=true
POST /_cache/clear?query=true
POST /_cache/clear?request=true
Step 3: Cancel Resource-Intensive Tasks
Identify and cancel problematic operations:
GET /_tasks?detailed=true&group_by=parents
POST /_tasks/{task_id}/_cancel
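To narrow the task list to the operations most likely to be holding memory, the actions parameter accepts wildcard patterns; the patterns below are examples:
GET /_tasks?detailed=true&actions=*search*,*bulk*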
Step 4: Reduce Load
- Temporarily reduce indexing rate
- Add query throttling
- Redirect traffic from affected nodes
Long-Term Solutions
Optimize Shard Configuration
Reduce total shard count by:
- Increasing shard size (target 10-50 GB)
- Using ILM to roll over and delete old indices
- Merging small indices
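A minimal ILM policy along these lines rolls indices over before shards grow too large and deletes them later. The policy name, sizes, and ages are assumptions to adapt, and max_primary_shard_size requires a reasonably recent Elasticsearch version:
PUT _ilm/policy/logs-rollover-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_primary_shard_size": "50gb",
            "max_age": "30d"
          }
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}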
Scale the Cluster
- Vertical scaling: Increase heap size (up to 32 GB max)
- Horizontal scaling: Add more data nodes
Heap Sizing Best Practices
Important: Set the heap to no more than 50% of available RAM, and keep it below roughly 32 GB so the JVM can use compressed object pointers.
# In jvm.options.d/custom.options
-Xms16g
-Xmx16g
Monitor and Alert
Set up alerts for:
- Heap usage > 75% (warning)
- Heap usage > 85% (critical)
- GC time > 5% of total time
Monitoring Garbage Collection
High heap pressure causes frequent and long GC pauses:
GET /_nodes/stats/jvm?filter_path=nodes.*.jvm.gc
Watch for:
- `collection_count` - a rapidly increasing count indicates pressure
- `collection_time_in_millis` - long GC times degrade performance