When JVM heap usage on an Elasticsearch node climbs to 75% or 85%, the node is under memory stress that requires attention. This guide provides specific fixes at each threshold to restore healthy heap usage.
Understanding the Thresholds
75% Heap Usage - Warning Zone
At 75% heap usage, you're entering the warning zone:
- Garbage collection becomes more frequent
- Performance may start to degrade
- Risk of reaching critical levels during traffic spikes
85% Heap Usage - Critical Zone
At 85% heap usage, immediate action is required:
- GC pauses become longer and more frequent
- Circuit breakers may start triggering
- Risk of OutOfMemoryError increases significantly
Diagnosing Current State
Check Heap Usage Per Node
GET /_cat/nodes?v&h=name,heap.percent,heap.current,heap.max,ram.percent
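The output looks something like this (node names and numbers are illustrative); any node at or above 75% needs attention:
name   heap.percent heap.current heap.max ram.percent
data-1           78        6.2gb      8gb          91
data-2           62        4.9gb      8gb          88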
Review GC Activity
GET /_nodes/stats/jvm?filter_path=nodes.*.jvm.gc
Both collection_count and collection_time_in_millis are cumulative since node start, so compare two snapshots taken a minute or so apart; rapidly growing deltas indicate GC pressure.
Identify Memory Consumers
GET /_nodes/stats/indices?filter_path=nodes.*.indices.fielddata,nodes.*.indices.query_cache,nodes.*.indices.request_cache,nodes.*.indices.segments
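A trimmed, illustrative response (node ID and byte counts are invented); a multi-gigabyte fielddata entry is a common culprit:
{
  "nodes": {
    "aBc123": {
      "indices": {
        "fielddata": { "memory_size_in_bytes": 2147483648, "evictions": 15 },
        "query_cache": { "memory_size_in_bytes": 536870912 },
        "request_cache": { "memory_size_in_bytes": 134217728 },
        "segments": { "memory_in_bytes": 805306368 }
      }
    }
  }
}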
Fixes for 75% Heap Pressure
Fix 1: Clear Caches
Clear unnecessary cached data:
POST /_cache/clear?request=true
POST /_cache/clear?query=true
Note: Avoid clearing fielddata cache unless you know it's the issue, as rebuilding it is expensive.
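If you do suspect fielddata, confirm it before clearing; the cat fielddata API shows which fields hold memory, and that cache can then be cleared on its own:
GET /_cat/fielddata?v&h=node,field,size

POST /_cache/clear?fielddata=true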
Fix 2: Reduce Concurrent Operations
Limit how many shards a single search queries at once. max_concurrent_shard_requests is a per-request parameter (default 5) rather than a cluster setting, so apply it to your heaviest searches (my-index is a placeholder):
GET /my-index/_search?max_concurrent_shard_requests=3
Fix 3: Optimize Aggregations
If aggregations are consuming memory, reduce bucket sizes:
// Before (high memory)
{
  "aggs": {
    "terms_agg": {
      "terms": {
        "field": "category.keyword",
        "size": 10000
      }
    }
  }
}

// After (optimized)
{
  "aggs": {
    "terms_agg": {
      "terms": {
        "field": "category.keyword",
        "size": 100
      }
    }
  }
}
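If you genuinely need every bucket, a composite aggregation pages through them in fixed-size chunks instead of holding them all on the heap at once (same category.keyword field as above):
{
  "size": 0,
  "aggs": {
    "categories": {
      "composite": {
        "size": 100,
        "sources": [
          { "category": { "terms": { "field": "category.keyword" } } }
        ]
      }
    }
  }
}
Each response includes an after_key; pass it back as "after" to fetch the next page.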
Fix 4: Review and Reduce Shards
Check total shard count:
GET /_cluster/stats?filter_path=indices.shards.total

// Or count rows from the cat API (run from a shell)
curl -s "localhost:9200/_cat/shards" | wc -l

If the count is excessive, plan shard consolidation using ILM or index shrinking (see the sketch below). As rough guidance, Elastic has recommended staying under about 20 shards per GB of heap, and recent versions default cluster.max_shards_per_node to 1,000.
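For a one-off consolidation, the shrink API works without ILM; a minimal sketch, assuming an index my-index and a data node named node-1 (both placeholders):
// Move a full copy of the index to one node and block writes
PUT /my-index/_settings
{
  "settings": {
    "index.routing.allocation.require._name": "node-1",
    "index.blocks.write": true
  }
}

// Shrink into a new single-shard index
POST /my-index/_shrink/my-index-shrunk
{
  "settings": {
    "index.number_of_shards": 1
  }
}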
Fixes for 85% Heap Pressure
Immediate Fix 1: Clear All Caches
When at 85%, clear all caches:
POST /_cache/clear
Immediate Fix 2: Cancel Resource-Intensive Tasks
Identify and cancel expensive operations:
GET /_tasks?detailed=true&group_by=parents
// Cancel specific task
POST /_tasks/{task_id}/_cancel
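The actions filter narrows the task list; for example, to inspect or bulk-cancel long-running searches:
GET /_tasks?actions=*search*&detailed=true

// Cancel every matching task at once
POST /_tasks/_cancel?actions=*search*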
Immediate Fix 3: Reduce Load
Temporarily reduce incoming traffic. The fastest lever is on the client side: pause or throttle bulk indexing and heavy reporting queries. Note that indices.memory.index_buffer_size is a static setting, so it cannot be changed through the cluster settings API; it belongs in elasticsearch.yml and takes effect after a restart:
# In elasticsearch.yml (static setting; requires restart)
indices.memory.index_buffer_size: 5%
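A lever you can pull at runtime is the refresh interval, which reduces segment churn while the cluster recovers (my-index is a placeholder; new documents take longer to become searchable):
PUT /my-index/_settings
{
  "index": {
    "refresh_interval": "30s"
  }
}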
Immediate Fix 4: Enable More Aggressive GC
If using G1GC (the default in modern versions), you can make the collector start concurrent marking earlier and keep more headroom in reserve; the values below match defaults Elasticsearch has shipped in its own jvm.options. JVM flags require a node restart:
# In jvm.options.d/gc.options (restart required)
-XX:G1ReservePercent=25
-XX:InitiatingHeapOccupancyPercent=30
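You can confirm which flags a node actually started with via the nodes info API:
GET /_nodes/jvm?filter_path=nodes.*.jvm.input_arguments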
Long-Term Solutions
Solution 1: Increase Heap Size
If your server has available RAM:
# In jvm.options.d/heap.options
# Set Xms and Xmx to the same value
-Xms16g
-Xmx16g
Important: Size the heap at roughly 50% of the machine's RAM, leaving the rest for the filesystem cache, and keep it below ~31 GB so the JVM can still use compressed object pointers.
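After restarting, verify both the heap size and that compressed object pointers are still in use (field names as reported by the nodes info API in recent versions):
GET /_nodes/jvm?filter_path=nodes.*.jvm.mem.heap_max_in_bytes,nodes.*.jvm.using_compressed_ordinary_object_pointers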
Solution 2: Add More Nodes
Distribute the load across more nodes:
# Scale horizontally by adding data nodes
# Each new node should have:
# - Dedicated heap allocation
# - Portion of shards redistributed to it
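Once the new nodes have joined, confirm that shards actually spread out to them:
GET /_cat/allocation?v&h=node,shards,disk.percent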
Solution 3: Reduce Shard Count
Consolidate shards:
// Shrink indices down to a single shard once they reach the warm phase
PUT _ilm/policy/reduce_shards
{
  "policy": {
    "phases": {
      "warm": {
        "actions": {
          "shrink": {
            "number_of_shards": 1
          }
        }
      }
    }
  }
}
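The policy only takes effect once indices reference it; a minimal sketch, assuming new indices match a hypothetical logs-* pattern:
PUT _index_template/reduce_shards_template
{
  "index_patterns": ["logs-*"],
  "template": {
    "settings": {
      "index.lifecycle.name": "reduce_shards"
    }
  }
}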
Solution 4: Optimize Field Data Usage
Fielddata is disabled by default on text fields; make sure no mapping has re-enabled it:
PUT /my-index/_mapping
{
  "properties": {
    "title": {
      "type": "text",
      "fielddata": false
    }
  }
}
Use keyword fields for sorting and aggregations.
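If a text field must also be sorted or aggregated on, the usual pattern is a keyword sub-field; a sketch reusing the title field from above (existing documents need a reindex or update before the sub-field is populated):
PUT /my-index/_mapping
{
  "properties": {
    "title": {
      "type": "text",
      "fields": {
        "raw": { "type": "keyword" }
      }
    }
  }
}
Aggregations would then target title.raw rather than title.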
Solution 5: Implement Circuit Breaker Tuning
Protect from memory spikes:
PUT /_cluster/settings
{
  "persistent": {
    "indices.breaker.total.limit": "70%",
    "indices.breaker.fielddata.limit": "40%",
    "indices.breaker.request.limit": "40%"
  }
}
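The breaker stats show whether these limits are actually being hit; a climbing tripped count means requests are being rejected before they can exhaust the heap:
GET /_nodes/stats/breaker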
Monitoring After Fixes
Set Up Heap Monitoring
Watch heap recovery:
GET /_cat/nodes?v&h=name,heap.percent,heap.current&s=heap.percent:desc
Verify GC Improvement
Monitor GC frequency:
GET /_nodes/stats/jvm?filter_path=nodes.*.jvm.gc.collectors.*.collection_count,nodes.*.jvm.gc.collectors.*.collection_time_in_millis
Create Alerts
Configure alerts at:
- 70%: Early warning
- 75%: Warning - start investigation
- 85%: Critical - immediate action required
Prevention Checklist
- Heap size set to 50% of RAM (max 32 GB)
- Shards sized between 10-50 GB
- No text fields with fielddata: true
- Aggregation sizes limited
- ILM configured for old data management
- Monitoring and alerting in place
- Regular review of slow queries
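For the last item, per-index search slow logs make the review routine; the thresholds below are illustrative:
PUT /my-index/_settings
{
  "index.search.slowlog.threshold.query.warn": "10s",
  "index.search.slowlog.threshold.query.info": "2s"
}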