Elasticsearch OutOfMemoryError Troubleshooting

An OutOfMemoryError (OOM) in Elasticsearch causes node crashes and cluster instability. This guide covers the different types of OOM errors, their causes, and how to resolve them.

Types of OutOfMemoryError

1. Java Heap Space

java.lang.OutOfMemoryError: Java heap space

The JVM cannot allocate an object because the heap is full and garbage collection cannot free enough memory.

2. GC Overhead Limit Exceeded

java.lang.OutOfMemoryError: GC overhead limit exceeded

The JVM is spending more than 98% of time doing garbage collection while recovering less than 2% of memory.

3. Unable to Create Native Thread

java.lang.OutOfMemoryError: unable to create native thread

The system cannot create more threads, often due to OS limits or memory exhaustion outside the heap.

4. Direct Buffer Memory

java.lang.OutOfMemoryError: Direct buffer memory

Off-heap direct memory allocation failed.

Immediate Actions When OOM Occurs

Step 1: Check Which Nodes Are Affected

GET /_cat/nodes?v&h=name,heap.percent,heap.current,heap.max

Step 2: Review Elasticsearch Logs

grep -i "OutOfMemoryError\|heap space\|GC overhead" /var/log/elasticsearch/*.log

Step 3: Reduce Cluster Load

// Disable allocation to prevent shard movements
PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.enable": "none"
  }
}

Step 4: Restart Affected Nodes

After addressing the cause, restart nodes one at a time.
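
A minimal restart sequence for a systemd-managed install might look like this (the service name and timeout are assumptions; re-enable allocation only after the restarted node has rejoined):

# Restart the node (systemd-based install assumed)
sudo systemctl restart elasticsearch

# Once the node has rejoined, re-enable shard allocation (null resets the transient setting)
PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.enable": null
  }
}

# Confirm the cluster recovers
GET /_cluster/health?wait_for_status=green&timeout=120s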

Troubleshooting Java Heap Space OOM

Diagnose the Cause

  1. Check heap configuration:
GET /_nodes/stats/jvm?filter_path=nodes.*.jvm.mem
  2. Analyze what's consuming memory:
GET /_nodes/stats/indices?filter_path=nodes.*.indices.fielddata,nodes.*.indices.query_cache,nodes.*.indices.request_cache,nodes.*.indices.segments
  3. Review active tasks:
GET /_tasks?detailed=true

Common Causes and Fixes

Cause: Heap size too small

# jvm.options.d/heap.options
# Heap should be no more than half of RAM and stay below ~31 GB (compressed oops threshold)
-Xms16g
-Xmx16g

Cause: Too many shards

GET /_cluster/stats?filter_path=indices.shards.total

Consolidate shards; a common target is 10-50 GB per shard. One option is the shrink API, as sketched below.
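
A shrink sketch for a hypothetical over-sharded index (names are illustrative; the index must be made read-only, fully allocated to one node, and the target shard count must be a factor of the original):

// Make the index read-only and require a copy of every shard on one node
PUT /my-index/_settings
{
  "index.blocks.write": true,
  "index.routing.allocation.require._name": "shrink-node-1"
}

// Shrink to a single primary shard and clear the temporary settings
POST /my-index/_shrink/my-index-shrunk
{
  "settings": {
    "index.number_of_shards": 1,
    "index.routing.allocation.require._name": null,
    "index.blocks.write": null
  }
}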

Cause: Fielddata on text fields

GET /_cat/fielddata?v&fields=*
// Disable fielddata (it is off by default on text fields; turn it back off if it was enabled)
PUT /my-index/_mapping
{
  "properties": {
    "my_field": {
      "type": "text",
      "fielddata": false
    }
  }
}
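
If the field still needs aggregations or sorting, a common alternative to fielddata is a keyword sub-field (a sketch; the sub-field name is illustrative, and existing documents must be reindexed or updated before it is populated):

PUT /my-index/_mapping
{
  "properties": {
    "my_field": {
      "type": "text",
      "fields": {
        "raw": { "type": "keyword" }
      }
    }
  }
}

// Aggregate on the keyword sub-field instead of the text field
GET /my-index/_search
{
  "size": 0,
  "aggs": {
    "by_field": {
      "terms": { "field": "my_field.raw" }
    }
  }
}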

Cause: Large aggregations

Reduce aggregation sizes:

{
  "aggs": {
    "my_terms": {
      "terms": {
        "field": "category",
        "size": 100  // Instead of 10000
      }
    }
  }
}

Troubleshooting GC Overhead OOM

Diagnose

GET /_nodes/stats/jvm?filter_path=nodes.*.jvm.gc

Look for:

  • Very high collection_count
  • Long collection_time_in_millis
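
To watch GC overhead live on the node itself (a sketch, assuming the JDK's jstat tool is available and <es_pid> is the Elasticsearch process ID):

# Sample GC utilization every 5 seconds; GCT is cumulative GC time in seconds since JVM start
jstat -gcutil <es_pid> 5000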

Fixes

  1. Increase heap (if under 31 GB)

  2. Reduce memory pressure:

    • Clear caches: POST /_cache/clear
    • Cancel expensive operations (see the sketch after this list)
    • Reduce concurrent operations
  3. Tune GC:

# jvm.options.d/gc.options
-XX:+UseG1GC
-XX:InitiatingHeapOccupancyPercent=30
-XX:G1ReservePercent=25
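
Clearing specific caches and cancelling a runaway task might look like this (a sketch; the task ID is a placeholder, real IDs come from GET /_tasks?detailed=true):

# Clear fielddata, query, and request caches across all indices
POST /_cache/clear?fielddata=true&query=true&request=true

# Cancel a specific task by ID
POST /_tasks/oTUltX4IQMOUUVeiohTt8A:12345/_cancel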

Troubleshooting Native Thread OOM

Diagnose

# Check current thread count
cat /proc/<es_pid>/status | grep Threads

# Check system limits
ulimit -u
cat /proc/sys/kernel/threads-max

Fixes

  1. Increase system limits:
# /etc/security/limits.conf
elasticsearch  -  nproc  4096
  2. Reduce Elasticsearch thread pools (verify with the sketch after this list):
# elasticsearch.yml
thread_pool.search.size: 13
thread_pool.write.size: 5
  3. Add more nodes to distribute workload
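
After changing pool sizes, usage and rejections can be checked with the cat thread pool API (a sketch showing a few useful columns):

GET /_cat/thread_pool/search,write?v&h=node_name,name,size,active,queue,rejected

A growing rejected count suggests the pool is still undersized for the workload or the node is overloaded.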

Troubleshooting Direct Buffer OOM

Diagnose

Direct memory is used for network I/O and some internal operations.

# Check direct memory settings
grep "MaxDirectMemorySize" /etc/elasticsearch/jvm.options.d/*

Fixes

  1. Set direct memory explicitly:
# jvm.options.d/memory.options
-XX:MaxDirectMemorySize=256m
  2. Ensure heap + direct memory doesn't exceed available RAM (an illustrative budget follows this list)
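
In recent Elasticsearch versions, if MaxDirectMemorySize is not set explicitly it defaults to roughly half the heap, so a budget on a 64 GB node might look like this (illustrative numbers, not a recommendation):

# Heap (-Xms/-Xmx):                       16 GB
# Direct memory (default ~half of heap):   8 GB
# JVM and OS overhead:                    ~2-4 GB
# Left for the filesystem cache:          ~36-38 GB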

Preventing Future OOM Errors

Configure Circuit Breakers

PUT /_cluster/settings
{
  "persistent": {
    "indices.breaker.total.limit": "70%",
    "indices.breaker.request.limit": "40%",
    "indices.breaker.fielddata.limit": "40%"
  }
}

Enable Heap Dumps

# jvm.options.d/heap_dump.options
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/var/log/elasticsearch/
-XX:+ExitOnOutOfMemoryError

Note: ExitOnOutOfMemoryError causes the JVM to exit on OOM, which is often better than running in a degraded state.

Set Up Monitoring

Monitor and alert on:

  • Heap usage > 85%
  • GC time > 10% of total time
  • Circuit breaker trips (see the sketch below)
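
Breaker usage and trip counts can be read from node stats (a sketch; breaker names vary slightly by version):

GET /_nodes/stats/breaker?filter_path=nodes.*.breakers

A non-zero tripped count means requests were rejected to protect the heap, which is usually preferable to an OOM.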

Memory Lock

Prevent the OS from swapping Elasticsearch memory:

# elasticsearch.yml
bootstrap.memory_lock: true
# /etc/security/limits.conf
elasticsearch  -  memlock  unlimited
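
Whether the lock took effect can be verified from the nodes info API (a sketch):

GET /_nodes?filter_path=**.mlockall

If mlockall is false after a restart, check the memlock limit for the user running Elasticsearch (or the systemd LimitMEMLOCK setting).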

Analyzing Heap Dumps

Generate Heap Dump

jmap -dump:format=b,file=/tmp/heap.hprof <elasticsearch_pid>

Analyze with Eclipse MAT

  1. Download Eclipse Memory Analyzer
  2. Open the heap dump
  3. Run "Leak Suspects Report"
  4. Look at "Dominator Tree" for largest objects

Common Findings

Finding                   Likely Cause
Large char[] arrays       Fielddata or large text fields
Many Segment objects      Too many shards
Query cache objects       Complex queries being cached
Aggregation buckets       Large aggregations

Recovery Checklist

After experiencing OOM:

  • Node restarted successfully
  • Root cause identified
  • Configuration updated to prevent recurrence
  • Circuit breakers adjusted
  • Monitoring alerts set up
  • Cluster allocation re-enabled
  • Cluster health returned to green