Elasticsearch Circuit Breaker Exceptions Fix

Circuit breakers in Elasticsearch prevent operations from consuming too much memory, which would cause OutOfMemoryError and node crashes. When a circuit breaker trips, it returns an error rather than allowing the operation to destabilize the cluster.

Understanding Circuit Breakers

Types of Circuit Breakers

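Circuit Breaker      Default Limit                                       Purpose
Parent               95% of heap (70% if real memory tracking is off)    Total memory across all breakers
Field data           40% of heap                                         Fielddata loaded for aggregations/sorting on text fields
Request              60% of heap                                         Per-request data structures (for example, aggregations)
In-flight requests   100% of heap                                        Incoming transport-level and HTTP requests
Accounting           100% of heap                                        Memory still held after a request completes (for example, Lucene segments)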

Circuit Breaker Error Message

CircuitBreakingException: [parent] Data too large, data for [<operation>]
would be [X/Xgb], which is larger than the limit of [Y/Ygb],
real usage: [Z/Zgb], new bytes reserved: [W/Wgb]
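
When these errors show up in logs, it can help to pull out the breaker name and byte counts programmatically. The following is a minimal sketch, assuming the message follows the pattern shown above; the function name `parse_breaker_message` and the sample message values are illustrative, not taken from a real cluster.

```python
import re

# Matches the message pattern shown above. In each bracket pair the raw
# byte count comes before the "/" and the human-readable size after it.
_BREAKER_RE = re.compile(
    r"\[(?P<breaker>\w+)\] Data too large, data for \[(?P<operation>.*?)\]\s+"
    r"would be \[(?P<would_be>\d+)/[^\]]+\],\s+"
    r"which is larger than the limit of \[(?P<limit>\d+)/[^\]]+\]"
)


def parse_breaker_message(message):
    """Return breaker name, operation and byte counts, or None if no match."""
    match = _BREAKER_RE.search(message)
    if match is None:
        return None
    would_be = int(match.group("would_be"))
    limit = int(match.group("limit"))
    return {
        "breaker": match.group("breaker"),
        "operation": match.group("operation"),
        "would_be_bytes": would_be,
        "limit_bytes": limit,
        "over_by_bytes": would_be - limit,
    }


# Illustrative sample message (values are made up for the example).
sample = (
    "[parent] Data too large, data for [<http_request>] would be "
    "[123848638/118.1mb], which is larger than the limit of "
    "[123273216/117.5mb], real usage: [120182112/114.6mb], "
    "new bytes reserved: [3666526/3.4mb]"
)
parsed = parse_breaker_message(sample)
print(parsed["breaker"], parsed["over_by_bytes"])
```

The `breaker` value tells you which of the sections below applies, and `over_by_bytes` gives a rough sense of how far over the limit the operation was.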

Diagnosing Circuit Breaker Issues

Check Current Breaker Status

GET /_nodes/stats/breaker

Key fields:

  • limit_size: Maximum allowed
  • estimated_size: Currently estimated usage
  • tripped: Number of times breaker has tripped
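
To scan the breaker stats for every node at once, the response can be reduced to the breakers that have tripped or are close to their limit. This is a sketch that assumes the response has already been loaded into a Python dict and that each breaker entry carries `limit_size_in_bytes` and `estimated_size_in_bytes` alongside the fields listed above; `summarize_breakers` and the 80% warning ratio are hypothetical choices for illustration.

```python
def summarize_breakers(stats, warn_ratio=0.8):
    """Return (node, breaker, usage_ratio, tripped) for breakers worth a look."""
    findings = []
    for node in stats.get("nodes", {}).values():
        for name, breaker in node.get("breakers", {}).items():
            limit = breaker.get("limit_size_in_bytes", 0)
            used = breaker.get("estimated_size_in_bytes", 0)
            ratio = used / limit if limit else 0.0
            tripped = breaker.get("tripped", 0)
            # Report breakers that have tripped or are near their limit.
            if tripped > 0 or ratio >= warn_ratio:
                findings.append((node.get("name"), name, round(ratio, 2), tripped))
    return findings


# Hand-written sample in the shape of a GET /_nodes/stats/breaker response.
sample_stats = {
    "nodes": {
        "abc123": {
            "name": "node-1",
            "breakers": {
                "request": {"limit_size_in_bytes": 1000, "estimated_size_in_bytes": 100, "tripped": 0},
                "fielddata": {"limit_size_in_bytes": 400, "estimated_size_in_bytes": 360, "tripped": 3},
                "parent": {"limit_size_in_bytes": 950, "estimated_size_in_bytes": 500, "tripped": 0},
            },
        }
    }
}
findings = summarize_breakers(sample_stats)
print(findings)
```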

Identify Which Breaker Is Tripping

GET /_cat/nodes?v&h=name,fielddata.memory_size,query_cache.memory_size,request_cache.memory_size

Review Recent Operations

GET /_tasks?detailed=true

Common Causes and Fixes

Cause 1: Field Data Breaker Trips

Symptom: Error mentions fielddata

Cause: Aggregating or sorting on text fields that have fielddata enabled, which loads the field's terms into heap memory.

Diagnosis:

GET /_cat/fielddata?v&fields=*

Fixes:

  1. Clear fielddata cache:
POST /_cache/clear?fielddata=true
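  2. Use keyword fields instead of text. The type of an existing field cannot be changed in place, so apply this mapping to a new field, or to a new index and reindex:
PUT /my-index/_mapping
{
  "properties": {
    "category": {
      "type": "keyword"  // Not text
    }
  }
}
  3. Keep fielddata disabled on text fields (false is the default, so this only matters if it was enabled earlier):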
PUT /my-index/_mapping
{
  "properties": {
    "description": {
      "type": "text",
      "fielddata": false
    }
  }
}

Cause 2: Request Breaker Trips

Symptom: Error mentions request

Cause: Large aggregations, big result sets, or complex queries.

Fixes:

  1. Reduce aggregation bucket sizes:
{
  "aggs": {
    "my_terms": {
      "terms": {
        "field": "category",
        "size": 100  // Reduce from 10000
      }
    }
  }
}
  2. Use composite aggregation for high-cardinality fields:
{
  "aggs": {
    "my_composite": {
      "composite": {
        "size": 100,
        "sources": [
          {"category": {"terms": {"field": "category"}}}
        ]
      }
    }
  }
}
  3. Limit result size:
{
  "size": 100,
  "query": { ... }
}
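
The composite aggregation above only keeps memory low if the client pages through the buckets rather than raising `size`. Each response includes an `after_key`, which is passed back as `after` in the next request. The helper below is a sketch of that loop step; `next_composite_page` is a hypothetical name, and the sample response is hand-written.

```python
import copy


def next_composite_page(request_body, response, agg_name="my_composite"):
    """Return the request body for the next page, or None when paging is done."""
    agg = response.get("aggregations", {}).get(agg_name, {})
    after_key = agg.get("after_key")
    # No buckets or no after_key means the last page has been reached.
    if not agg.get("buckets") or after_key is None:
        return None
    body = copy.deepcopy(request_body)  # leave the caller's body untouched
    body["aggs"][agg_name]["composite"]["after"] = after_key
    return body


first_request = {
    "size": 0,
    "aggs": {
        "my_composite": {
            "composite": {
                "size": 100,
                "sources": [{"category": {"terms": {"field": "category"}}}],
            }
        }
    },
}
# Hand-written sample response for illustration.
sample_response = {
    "aggregations": {
        "my_composite": {
            "after_key": {"category": "books"},
            "buckets": [{"key": {"category": "books"}, "doc_count": 12}],
        }
    }
}
second_request = next_composite_page(first_request, sample_response)
print(second_request["aggs"]["my_composite"]["composite"]["after"])
```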

Cause 3: Parent Breaker Trips

Symptom: Error mentions parent

Cause: Combined memory usage across all operations exceeds limit.

Fixes:

  1. Reduce concurrent operations
  2. Scale the cluster (add nodes)
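  3. Increase heap size (stay below about 32 GB)

Important: Set heap to no more than half of RAM, and keep it below the compressed object pointers threshold (just under 32 GB). Above that threshold the JVM switches to larger pointers and uses heap less efficiently.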
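
The sizing rule can be written as a one-line calculation. This is a rough sketch only: the 31 GB cap is an assumption chosen to stay safely under the 32 GB ceiling mentioned above, and `recommended_heap_gb` is a hypothetical helper name.

```python
def recommended_heap_gb(ram_gb, cap_gb=31):
    """Half of RAM, capped below the ~32 GB ceiling (cap_gb is an assumption)."""
    return min(ram_gb // 2, cap_gb)


for ram in (16, 64, 128):
    print(ram, "GB RAM ->", recommended_heap_gb(ram), "GB heap")
```

Once the cap is reached, adding RAM to a node no longer raises the heap, which is why adding nodes is listed as a separate fix.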

Cause 4: In-Flight Requests Breaker Trips

Symptom: Error mentions in_flight_requests

Cause: Too many concurrent requests or very large request payloads.

Fixes:

  1. Reduce bulk request sizes (5-15 MB per request is a common starting point; tune for your cluster)
  2. Reduce concurrent clients
  3. Implement request queuing on client side
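
One way to keep bulk requests within a size budget is to batch documents by serialized byte size on the client before sending them. The generator below is a minimal sketch, assuming plain index actions and a 10 MB default budget; `batch_bulk_payloads` is a hypothetical helper and does not send anything to the cluster.

```python
import json


def batch_bulk_payloads(docs, index, max_bytes=10 * 1024 * 1024):
    """Yield NDJSON bulk bodies that stay under max_bytes where possible.

    A single document larger than max_bytes still gets a batch of its own.
    """
    action = json.dumps({"index": {"_index": index}})
    batch, batch_bytes = [], 0
    for doc in docs:
        entry = action + "\n" + json.dumps(doc) + "\n"
        entry_bytes = len(entry.encode("utf-8"))
        # Flush the current batch before it would exceed the budget.
        if batch and batch_bytes + entry_bytes > max_bytes:
            yield "".join(batch)
            batch, batch_bytes = [], 0
        batch.append(entry)
        batch_bytes += entry_bytes
    if batch:
        yield "".join(batch)


# Tiny budget so the batching is visible with five small documents.
docs = [{"n": i} for i in range(5)]
batches = list(batch_bulk_payloads(docs, "my-index", max_bytes=100))
print(len(batches))
```

Each yielded string can be sent as the body of one `_bulk` request, ideally with a bounded number of requests in flight at a time.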

Adjusting Circuit Breaker Limits

When to Adjust

Only adjust limits if:

  • You understand the root cause
  • You have headroom in available memory
  • The default limits are too conservative for your workload

Configuration

PUT /_cluster/settings
{
  "persistent": {
    "indices.breaker.total.limit": "70%",
    "indices.breaker.fielddata.limit": "40%",
    "indices.breaker.request.limit": "60%"
  }
}
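
Because the limits are percentages of heap, it can be useful to see what they mean in bytes before changing them. The sketch below converts percentage settings like the ones above for a given heap size; `limits_in_bytes` is a hypothetical helper, and it only handles values written as whole percentages.

```python
def limits_in_bytes(heap_bytes, limits):
    """Convert percentage limits such as "70%" into byte counts."""
    return {
        name: heap_bytes * int(value.rstrip("%")) // 100
        for name, value in limits.items()
    }


settings = {
    "indices.breaker.total.limit": "70%",
    "indices.breaker.fielddata.limit": "40%",
    "indices.breaker.request.limit": "60%",
}
heap = 16 * 1024 ** 3  # 16 GiB heap, an example value
for name, size in limits_in_bytes(heap, settings).items():
    print(f"{name}: {size / 1024 ** 3:.1f} GiB")
```

Note that the fielddata and request limits add up to more than the total limit, so the parent breaker can trip before either child breaker reaches its own limit.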

Conservative Settings for Stability

PUT /_cluster/settings
{
  "persistent": {
    "indices.breaker.total.limit": "65%",
    "indices.breaker.fielddata.limit": "30%",
    "indices.breaker.request.limit": "50%"
  }
}

Aggressive Settings (Use with Caution)

PUT /_cluster/settings
{
  "persistent": {
    "indices.breaker.total.limit": "85%",
    "indices.breaker.fielddata.limit": "50%",
    "indices.breaker.request.limit": "70%"
  }
}

Warning: Aggressive settings increase OOM risk.

Real Memory vs Estimated Memory

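Enable Real Memory Tracking

# elasticsearch.yml (static setting, requires a node restart)
indices.breaker.total.use_real_memory: true

With this setting the parent breaker uses actual JVM heap usage instead of summed estimates, providing more accurate protection. It is a static setting, so it cannot be changed through the cluster settings API. It defaults to true in Elasticsearch 7.0 and later.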

Prevention Strategies

1. Query Governance

Implement query validation:

  • Reject queries without size limits
  • Limit aggregation bucket sizes
  • Set default timeouts
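
These rules can be enforced in a thin layer in front of the cluster. The validator below is a sketch of that idea, assuming search bodies arrive as Python dicts; `validate_search_body` and the default thresholds of 1000 are hypothetical choices, not Elasticsearch settings.

```python
def validate_search_body(body, max_size=1000, max_buckets=1000):
    """Return a list of policy violations for a search request body."""
    problems = []
    size = body.get("size")
    if size is None:
        problems.append("missing top-level size")
    elif size > max_size:
        problems.append(f"size {size} exceeds {max_size}")
    if "timeout" not in body:
        problems.append("missing timeout")

    def walk(aggs, path):
        # Recursively check bucket sizes in nested aggregations.
        for name, spec in aggs.items():
            for agg_type in ("terms", "composite"):
                bucket_size = spec.get(agg_type, {}).get("size", 0)
                if bucket_size > max_buckets:
                    problems.append(
                        f"{path}{name}: {agg_type} size {bucket_size} exceeds {max_buckets}"
                    )
            nested = spec.get("aggs") or spec.get("aggregations") or {}
            walk(nested, f"{path}{name}.")

    walk(body.get("aggs") or body.get("aggregations") or {}, "")
    return problems


risky = {
    "size": 100,
    "timeout": "10s",
    "aggs": {"by_cat": {"terms": {"field": "category", "size": 10000}}},
}
print(validate_search_body(risky))
```

A request with an empty list of problems passes; anything else can be rejected or rewritten before it reaches the cluster.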

2. Index Design

  • Use keyword type for fields that need aggregation
  • Avoid enabling fielddata on text fields
  • Use doc_values: false only when you don't need sorting/aggregations

3. Monitoring

Set up alerts for:

  • Breaker tripped count increasing
  • Heap usage > 75%
  • Fielddata cache size growing
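
The first two alert conditions can be checked by comparing two snapshots of node stats. This is a minimal sketch, assuming the tripped counts have already been extracted per breaker and that heap usage comes from the node's `jvm.mem.heap_used_percent` value; `breaker_alerts` is a hypothetical helper name.

```python
def breaker_alerts(previous, current, heap_used_percent, heap_threshold=75):
    """Compare two {breaker_name: tripped_count} snapshots for one node."""
    alerts = []
    for name, tripped in current.items():
        delta = tripped - previous.get(name, 0)
        if delta > 0:
            alerts.append(f"{name} breaker tripped {delta} time(s) since last check")
    if heap_used_percent > heap_threshold:
        alerts.append(f"heap usage {heap_used_percent}% exceeds {heap_threshold}%")
    return alerts


alerts = breaker_alerts(
    previous={"parent": 2, "request": 0},
    current={"parent": 5, "request": 0},
    heap_used_percent=80,
)
print(alerts)
```

Alerting on the change in the tripped count, rather than its absolute value, avoids repeated alerts for trips that happened long ago, since the counter only resets when the node restarts.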

4. Client-Side Handling

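# Python example: handle circuit breaker errors (written for the elasticsearch-py 7.x client)
from elasticsearch import Elasticsearch, TransportError

es = Elasticsearch("http://localhost:9200")  # adjust to your cluster

query = {"size": 100, "query": {"match_all": {}}}  # example query


def simplify_query(q):
    # Placeholder: replace with your own logic for producing a cheaper query
    return {**q, "size": 10}


try:
    result = es.search(index="my-index", body=query)
except TransportError as e:
    if "circuit_breaking_exception" not in str(e):
        raise  # not a breaker trip, so do not hide the failure
    # Reduce query complexity and retry once
    result = es.search(index="my-index", body=simplify_query(query))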

Troubleshooting Workflow

Circuit Breaker Exception
         │
         ▼
    Check which breaker tripped
    GET /_nodes/stats/breaker
         │
         ├── fielddata ──► Check aggregations/sorting on text fields
         │                 Fix: Use keyword fields, clear cache
         │
         ├── request ────► Check query complexity
         │                 Fix: Reduce size, simplify aggregations
         │
         └── parent ─────► Overall memory pressure
                           Fix: Scale cluster, increase heap