When JVM heap usage on an Elasticsearch node climbs to 75% or 85%, the node is under memory stress that requires attention. This guide provides specific fixes at each threshold to restore healthy heap usage.
Understanding the Thresholds
75% Heap Usage - Warning Zone
At 75% heap usage, you're entering the warning zone:
- Garbage collection becomes more frequent
- Performance may start to degrade
- Risk of reaching critical levels during traffic spikes
85% Heap Usage - Critical Zone
At 85% heap usage, immediate action is required:
- GC pauses become longer and more frequent
- Circuit breakers may start triggering
- Risk of OutOfMemoryError increases significantly
Diagnosing Current State
Check Heap Usage Per Node
GET /_cat/nodes?v&h=name,heap.percent,heap.current,heap.max,ram.percent
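The output looks something like this (node names and numbers are illustrative); any node at or above 75% needs attention:
name   heap.percent heap.current heap.max ram.percent
data-1           78        6.2gb      8gb          91
data-2           62        4.9gb      8gb          88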
Review GC Activity
GET /_nodes/stats/jvm?filter_path=nodes.*.jvm.gc
Both collection_count and collection_time_in_millis are cumulative since node start, so compare two snapshots taken a minute or so apart; rapidly growing deltas indicate GC pressure.
Identify Memory Consumers
GET /_nodes/stats/indices?filter_path=nodes.*.indices.fielddata,nodes.*.indices.query_cache,nodes.*.indices.request_cache,nodes.*.indices.segments
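A trimmed, illustrative response (node ID and byte counts are invented); a multi-gigabyte fielddata entry is a common culprit:
{
  "nodes": {
    "aBc123": {
      "indices": {
        "fielddata": { "memory_size_in_bytes": 2147483648, "evictions": 15 },
        "query_cache": { "memory_size_in_bytes": 536870912 },
        "request_cache": { "memory_size_in_bytes": 134217728 },
        "segments": { "memory_in_bytes": 805306368 }
      }
    }
  }
}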
Fixes for 75% Heap Pressure
Fix 1: Clear Caches
Clear unnecessary cached data:
POST /_cache/clear?request=true
POST /_cache/clear?query=true
Note: Avoid clearing fielddata cache unless you know it's the issue, as rebuilding it is expensive.
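If you do suspect fielddata, confirm it before clearing; the cat fielddata API shows which fields hold memory, and that cache can then be cleared on its own:
GET /_cat/fielddata?v&h=node,field,size

POST /_cache/clear?fielddata=true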
Fix 2: Reduce Concurrent Operations
Limit how many shards a single search queries at once. max_concurrent_shard_requests is a per-request parameter (default 5) rather than a cluster setting, so apply it to your heaviest searches (my-index is a placeholder):
GET /my-index/_search?max_concurrent_shard_requests=3
Fix 3: Optimize Aggregations
If aggregations are consuming memory, reduce bucket sizes:
// Before (high memory)
{
  "aggs": {
    "terms_agg": {
      "terms": {
        "field": "category.keyword",
        "size": 10000
      }
    }
  }
}

// After (optimized)
{
  "aggs": {
    "terms_agg": {
      "terms": {
        "field": "category.keyword",
        "size": 100
      }
    }
  }
}
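If you genuinely need every bucket, a composite aggregation pages through them in fixed-size chunks instead of holding them all on the heap at once (same category.keyword field as above):
{
  "size": 0,
  "aggs": {
    "categories": {
      "composite": {
        "size": 100,
        "sources": [
          { "category": { "terms": { "field": "category.keyword" } } }
        ]
      }
    }
  }
}
Each response includes an after_key; pass it back as "after" to fetch the next page.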
Fix 4: Review and Reduce Shards
Check total shard count:
GET /_cluster/stats?filter_path=indices.shards.total

// Or count rows from the cat API (run from a shell)
curl -s "localhost:9200/_cat/shards" | wc -l

If the count is excessive, plan shard consolidation using ILM or index shrinking (see the sketch below). As rough guidance, Elastic has recommended staying under about 20 shards per GB of heap, and recent versions default cluster.max_shards_per_node to 1,000.
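For a one-off consolidation, the shrink API works without ILM; a minimal sketch, assuming an index my-index and a data node named node-1 (both placeholders):
// Move a full copy of the index to one node and block writes
PUT /my-index/_settings
{
  "settings": {
    "index.routing.allocation.require._name": "node-1",
    "index.blocks.write": true
  }
}

// Shrink into a new single-shard index
POST /my-index/_shrink/my-index-shrunk
{
  "settings": {
    "index.number_of_shards": 1
  }
}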
Fixes for 85% Heap Pressure
Immediate Fix 1: Clear All Caches
When at 85%, clear all caches:
POST /_cache/clear
Immediate Fix 2: Cancel Resource-Intensive Tasks
Identify and cancel expensive operations:
GET /_tasks?detailed=true&group_by=parents
// Cancel specific task
POST /_tasks/{task_id}/_cancel
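The actions filter narrows the task list; for example, to inspect or bulk-cancel long-running searches:
GET /_tasks?actions=*search*&detailed=true

// Cancel every matching task at once
POST /_tasks/_cancel?actions=*search*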
Immediate Fix 3: Reduce Load
Temporarily reduce incoming traffic. The fastest lever is on the client side: pause or throttle bulk indexing and heavy reporting queries. Note that indices.memory.index_buffer_size is a static setting, so it cannot be changed through the cluster settings API; it belongs in elasticsearch.yml and takes effect after a restart:
# In elasticsearch.yml (static setting; requires restart)
indices.memory.index_buffer_size: 5%
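A lever you can pull at runtime is the refresh interval, which reduces segment churn while the cluster recovers (my-index is a placeholder; new documents take longer to become searchable):
PUT /my-index/_settings
{
  "index": {
    "refresh_interval": "30s"
  }
}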
Immediate Fix 4: Enable More Aggressive GC
If using G1GC (the default in modern versions), you can make the collector start concurrent marking earlier and keep more headroom in reserve; the values below match defaults Elasticsearch has shipped in its own jvm.options. JVM flags require a node restart:
# In jvm.options.d/gc.options (restart required)
-XX:G1ReservePercent=25
-XX:InitiatingHeapOccupancyPercent=30
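You can confirm which flags a node actually started with via the nodes info API:
GET /_nodes/jvm?filter_path=nodes.*.jvm.input_arguments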
Long-Term Solutions
Solution 1: Increase Heap Size
If your server has available RAM:
# In jvm.options.d/heap.options
# Set Xms and Xmx to the same value
-Xms16g
-Xmx16g
Important: Size the heap at roughly 50% of the machine's RAM, leaving the rest for the filesystem cache, and keep it below ~31 GB so the JVM can still use compressed object pointers.
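After restarting, verify both the heap size and that compressed object pointers are still in use (field names as reported by the nodes info API in recent versions):
GET /_nodes/jvm?filter_path=nodes.*.jvm.mem.heap_max_in_bytes,nodes.*.jvm.using_compressed_ordinary_object_pointers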
Solution 2: Add More Nodes
Distribute the load across more nodes:
# Scale horizontally by adding data nodes
# Each new node should have:
# - Dedicated heap allocation
# - Portion of shards redistributed to it
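Once the new nodes have joined, confirm that shards actually spread out to them:
GET /_cat/allocation?v&h=node,shards,disk.percent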
Solution 3: Reduce Shard Count
Consolidate shards:
// Shrink indices down to a single shard once they reach the warm phase
PUT _ilm/policy/reduce_shards
{
  "policy": {
    "phases": {
      "warm": {
        "actions": {
          "shrink": {
            "number_of_shards": 1
          }
        }
      }
    }
  }
}
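The policy only takes effect once indices reference it; a minimal sketch, assuming new indices match a hypothetical logs-* pattern:
PUT _index_template/reduce_shards_template
{
  "index_patterns": ["logs-*"],
  "template": {
    "settings": {
      "index.lifecycle.name": "reduce_shards"
    }
  }
}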
Solution 4: Optimize Field Data Usage
Fielddata is disabled by default on text fields; make sure no mapping has re-enabled it:
PUT /my-index/_mapping
{
  "properties": {
    "title": {
      "type": "text",
      "fielddata": false
    }
  }
}
Use keyword fields for sorting and aggregations.
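If a text field must also be sorted or aggregated on, the usual pattern is a keyword sub-field; a sketch reusing the title field from above (existing documents need a reindex or update before the sub-field is populated):
PUT /my-index/_mapping
{
  "properties": {
    "title": {
      "type": "text",
      "fields": {
        "raw": { "type": "keyword" }
      }
    }
  }
}
Aggregations would then target title.raw rather than title.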
Solution 5: Implement Circuit Breaker Tuning
Protect from memory spikes:
PUT /_cluster/settings
{
  "persistent": {
    "indices.breaker.total.limit": "70%",
    "indices.breaker.fielddata.limit": "40%",
    "indices.breaker.request.limit": "40%"
  }
}
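The breaker stats show whether these limits are actually being hit; a climbing tripped count means requests are being rejected before they can exhaust the heap:
GET /_nodes/stats/breaker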
Monitoring After Fixes
Set Up Heap Monitoring
Watch heap recovery:
GET /_cat/nodes?v&h=name,heap.percent,heap.current&s=heap.percent:desc
Verify GC Improvement
Monitor GC frequency:
GET /_nodes/stats/jvm?filter_path=nodes.*.jvm.gc.collectors.*.collection_count,nodes.*.jvm.gc.collectors.*.collection_time_in_millis
Create Alerts
Configure alerts at:
- 70%: Early warning
- 75%: Warning - start investigation
- 85%: Critical - immediate action required
Prevention Checklist
- Heap size set to 50% of RAM (max 32 GB)
- Shards sized between 10-50 GB
- No text fields with fielddata: true
- Aggregation sizes limited
- ILM configured for old data management
- Monitoring and alerting in place
- Regular review of slow queries
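For the last item, per-index search slow logs make the review routine; the thresholds below are illustrative:
PUT /my-index/_settings
{
  "index.search.slowlog.threshold.query.warn": "10s",
  "index.search.slowlog.threshold.query.info": "2s"
}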