Circuit breakers in Elasticsearch prevent operations from consuming too much memory, which would cause OutOfMemoryError and node crashes. When a circuit breaker trips, it returns an error rather than allowing the operation to destabilize the cluster.
Understanding Circuit Breakers
Types of Circuit Breakers
| Circuit Breaker | Default Limit | Purpose |
|---|---|---|
| Parent | 95% of heap (70% if real-memory tracking is disabled) | Total memory across all breakers |
| Field data | 40% of heap | Text field aggregations/sorting |
| Request | 60% of heap | Per-request data structures |
| In-flight requests | 100% of heap | Transport-level requests |
| Accounting | 100% of heap | Memory held after a request completes, such as Lucene segment memory |
Circuit Breaker Error Message
CircuitBreakingException: [parent] Data too large, data for [<operation>]
would be [X/Xgb], which is larger than the limit of [Y/Ygb],
real usage: [Z/Zgb], new bytes reserved: [W/Wgb]
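When these errors show up in client logs, pulling the breaker name out of the message is usually the first triage step. The sketch below assumes the message follows the layout shown above, which may vary between Elasticsearch versions. `parse_breaker_error` and the sample message are illustrative and not part of any Elasticsearch client.

```python
import re

# Pattern based on the message layout shown above (an assumption; wording
# may differ between Elasticsearch versions).
BREAKER_RE = re.compile(
    r"\[(?P<breaker>\w+)\] Data too large, data for \[(?P<operation>[^\]]*)\]"
)


def parse_breaker_error(message):
    """Return (breaker, operation) from a breaker error message, or None."""
    match = BREAKER_RE.search(message)
    if match is None:
        return None
    return match.group("breaker"), match.group("operation")


# Illustrative message with placeholder sizes, not captured from a real cluster.
sample = (
    "CircuitBreakingException: [parent] Data too large, data for [<http_request>] "
    "would be [X/Xgb], which is larger than the limit of [Y/Ygb]"
)
print(parse_breaker_error(sample))
```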
Diagnosing Circuit Breaker Issues
Check Current Breaker Status
GET /_nodes/stats/breaker
Key fields:
- limit_size: Maximum allowed
- estimated_size: Currently estimated usage
- tripped: Number of times breaker has tripped
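To compare breakers at a glance, the byte-valued counterparts of these fields (`limit_size_in_bytes`, `estimated_size_in_bytes`) can be turned into a usage fraction. This is a minimal sketch that works on an already-fetched response dictionary. The helper name `breaker_usage` and the sample values are made up for illustration.

```python
def breaker_usage(stats):
    """Yield (node_name, breaker, used_fraction, tripped) for every breaker."""
    for node in stats["nodes"].values():
        for name, breaker in node["breakers"].items():
            limit = breaker["limit_size_in_bytes"]
            # Guard against a zero limit to avoid dividing by zero.
            used = breaker["estimated_size_in_bytes"] / limit if limit > 0 else 0.0
            yield node["name"], name, used, breaker["tripped"]


# Hand-written sample in the shape of GET /_nodes/stats/breaker (values invented).
sample_stats = {
    "nodes": {
        "abc123": {
            "name": "node-1",
            "breakers": {
                "request": {
                    "limit_size_in_bytes": 1024,
                    "estimated_size_in_bytes": 256,
                    "tripped": 0,
                },
                "fielddata": {
                    "limit_size_in_bytes": 1024,
                    "estimated_size_in_bytes": 512,
                    "tripped": 3,
                },
            },
        }
    }
}

for row in breaker_usage(sample_stats):
    print(row)
```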
Identify Which Breaker Is Tripping
GET /_cat/nodes?v&h=name,fielddata.memory_size,query_cache.memory_size,request_cache.memory_size
Review Recent Operations
GET /_tasks?detailed=true
Common Causes and Fixes
Cause 1: Field Data Breaker Trips
Symptom: Error mentions fielddata
Cause: Aggregating or sorting on text fields, which loads field data into memory.
Diagnosis:
GET /_cat/fielddata?v&fields=*
Fixes:
- Clear fielddata cache:
POST /_cache/clear?fielddata=true
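- Use keyword fields instead of text. The type of an existing field cannot be changed in place, so apply this to a new field or a new index and reindex:
PUT /my-index/_mapping
{
  "properties": {
    "category": {
      "type": "keyword" // Not text
    }
  }
}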
- Disable fielddata on text fields:
PUT /my-index/_mapping
{
  "properties": {
    "description": {
      "type": "text",
      "fielddata": false
    }
  }
}
Cause 2: Request Breaker Trips
Symptom: Error mentions request
Cause: Large aggregations, big result sets, or complex queries.
Fixes:
- Reduce aggregation bucket sizes:
{
  "aggs": {
    "my_terms": {
      "terms": {
        "field": "category",
        "size": 100 // Reduce from 10000
      }
    }
  }
}
- Use composite aggregation for high-cardinality fields:
{
  "aggs": {
    "my_composite": {
      "composite": {
        "size": 100,
        "sources": [
          {"category": {"terms": {"field": "category"}}}
        ]
      }
    }
  }
}
- Limit result size:
{
  "size": 100,
  "query": { ... }
}
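A composite aggregation keeps memory bounded because it returns buckets one page at a time: each response carries an `after_key`, which the next request passes back as `after`. The sketch below builds the follow-up request body from a previous response. The helper name `next_composite_page` and the sample response are assumptions for illustration, and the code does not contact a cluster.

```python
import copy


def next_composite_page(body, response, agg_name="my_composite"):
    """Return a copy of `body` that asks for the next page, or None when done."""
    agg = response["aggregations"][agg_name]
    after_key = agg.get("after_key")
    # An empty page or a missing after_key means there is nothing left to fetch.
    if not agg["buckets"] or after_key is None:
        return None
    next_body = copy.deepcopy(body)
    next_body["aggs"][agg_name]["composite"]["after"] = after_key
    return next_body


first_body = {
    "size": 0,
    "aggs": {
        "my_composite": {
            "composite": {
                "size": 100,
                "sources": [{"category": {"terms": {"field": "category"}}}],
            }
        }
    },
}

# Trimmed, invented response in the shape returned by a composite aggregation.
sample_response = {
    "aggregations": {
        "my_composite": {
            "after_key": {"category": "books"},
            "buckets": [{"key": {"category": "books"}, "doc_count": 12}],
        }
    }
}

print(next_composite_page(first_body, sample_response))
```

In a real loop, the caller would keep sending the returned body until the helper returns None.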
Cause 3: Parent Breaker Trips
Symptom: Error mentions parent
Cause: Combined memory usage across all operations exceeds the parent breaker's limit.
Fixes:
- Reduce concurrent operations
- Scale the cluster (add nodes)
- Increase heap size (up to 32 GB max)
Important: Set heap to no more than about half of RAM, and never above 32 GB, so the JVM can keep using compressed object pointers.
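The sizing rule above is simple arithmetic. As a rough sketch, the helper below takes half of RAM and caps it at a ceiling. The 31 GB default is an assumed safety margin below the 32 GB limit, not a value from this guide, and `recommended_heap_gb` is a hypothetical name.

```python
def recommended_heap_gb(ram_gb, ceiling_gb=31):
    """Half of RAM, capped at ceiling_gb (assumed margin below 32 GB)."""
    return min(ram_gb // 2, ceiling_gb)


for ram in (16, 64, 128):
    heap = recommended_heap_gb(ram)
    # -Xms and -Xmx are the standard JVM flags for initial and maximum heap.
    print(f"{ram} GB RAM: -Xms{heap}g -Xmx{heap}g")
```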
Cause 4: In-Flight Requests Breaker Trips
Symptom: Error mentions in_flight_requests
Cause: Too many concurrent requests or very large request payloads.
Fixes:
- Reduce bulk request sizes (5-15 MB optimal)
- Reduce concurrent clients
- Implement request queuing on client side
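One way to keep bulk payloads within a size budget is to batch documents by their serialized size before sending them. The sketch below is client-agnostic and only groups documents. The 10 MB default is one point inside the 5-15 MB range above, `chunk_by_bytes` is a hypothetical helper, and the size estimate ignores the bulk action metadata lines.

```python
import json


def chunk_by_bytes(docs, max_bytes=10 * 1024 * 1024):
    """Group documents into batches whose serialized size stays under max_bytes."""
    batch, batch_bytes = [], 0
    for doc in docs:
        # +1 accounts for the newline that terminates each NDJSON line.
        size = len(json.dumps(doc).encode("utf-8")) + 1
        if batch and batch_bytes + size > max_bytes:
            yield batch
            batch, batch_bytes = [], 0
        # A single document larger than max_bytes still goes out on its own.
        batch.append(doc)
        batch_bytes += size
    if batch:
        yield batch


docs = [{"id": i, "text": "x" * 100} for i in range(10)]
for batch in chunk_by_bytes(docs, max_bytes=300):
    print(len(batch), "documents in batch")
```

Sending the batches one at a time, rather than all at once, also acts as a simple client-side queue.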
Adjusting Circuit Breaker Limits
When to Adjust
Only adjust limits if:
- You understand the root cause
- You have headroom in available memory
- The default limits are too conservative for your workload
Configuration
PUT /_cluster/settings
{
  "persistent": {
    "indices.breaker.total.limit": "70%",
    "indices.breaker.fielddata.limit": "40%",
    "indices.breaker.request.limit": "60%"
  }
}
Conservative Settings for Stability
PUT /_cluster/settings
{
  "persistent": {
    "indices.breaker.total.limit": "65%",
    "indices.breaker.fielddata.limit": "30%",
    "indices.breaker.request.limit": "50%"
  }
}
Note: indices.breaker.total.use_real_memory is a static setting, so it cannot be changed through this API. Set it in elasticsearch.yml on each node.
Aggressive Settings (Use with Caution)
PUT /_cluster/settings
{
  "persistent": {
    "indices.breaker.total.limit": "85%",
    "indices.breaker.fielddata.limit": "50%",
    "indices.breaker.request.limit": "70%"
  }
}
Warning: Aggressive settings increase OOM risk.
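To judge how much headroom a profile leaves, it can help to translate the percentages into bytes for a given heap size. The sketch below does only that arithmetic. `limits_in_bytes` is a hypothetical helper, the 8 GiB heap is an example value, and the settings dictionary mirrors the aggressive profile above.

```python
def limits_in_bytes(heap_bytes, settings):
    """Convert percentage limits such as "85%" into absolute byte counts."""
    return {
        name: heap_bytes * int(value.rstrip("%")) // 100
        for name, value in settings.items()
    }


aggressive = {
    "indices.breaker.total.limit": "85%",
    "indices.breaker.fielddata.limit": "50%",
    "indices.breaker.request.limit": "70%",
}

heap = 8 * 1024**3  # example: 8 GiB heap
for name, limit in limits_in_bytes(heap, aggressive).items():
    print(f"{name}: {limit} bytes, {heap - limit} bytes of heap left above the limit")
```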
Real Memory vs Estimated Memory
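Enable Real Memory Tracking
indices.breaker.total.use_real_memory is a static setting that defaults to true. It cannot be changed through the cluster settings API; set it in elasticsearch.yml on each node and restart the node:
indices.breaker.total.use_real_memory: true
With this setting enabled, the parent breaker uses actual JVM heap usage instead of the sum of the child breakers' estimates, providing more accurate protection.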
Prevention Strategies
1. Query Governance
Implement query validation:
- Reject queries without size limits
- Limit aggregation bucket sizes
- Set default timeouts
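The three rules above could be enforced in a thin layer that every search body passes through before it reaches the cluster. The following is a minimal sketch under assumed limits. `govern`, `MAX_BUCKETS`, and `DEFAULT_TIMEOUT` are hypothetical names and values, and only `terms` and `composite` aggregations are capped.

```python
import copy

MAX_BUCKETS = 1000       # assumed cap on aggregation bucket size
DEFAULT_TIMEOUT = "30s"  # assumed default search timeout


def _cap_bucket_sizes(aggs):
    """Recursively cap the size of terms and composite aggregations in place."""
    for agg in aggs.values():
        for agg_type in ("terms", "composite"):
            params = agg.get(agg_type)
            if isinstance(params, dict) and params.get("size", 0) > MAX_BUCKETS:
                params["size"] = MAX_BUCKETS
        for key in ("aggs", "aggregations"):
            if key in agg:
                _cap_bucket_sizes(agg[key])


def govern(body):
    """Return a validated copy of a search body, or raise ValueError."""
    if "size" not in body:
        raise ValueError("search body must set an explicit size")
    checked = copy.deepcopy(body)
    _cap_bucket_sizes(checked.get("aggs", checked.get("aggregations", {})))
    checked.setdefault("timeout", DEFAULT_TIMEOUT)
    return checked


request = {
    "size": 10,
    "aggs": {"by_category": {"terms": {"field": "category", "size": 10000}}},
}
print(govern(request))
```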
2. Index Design
- Use keyword type for fields that need aggregation
- Avoid enabling fielddata on text fields
- Use doc_values: false only when you don't need sorting/aggregations
3. Monitoring
Set up alerts for:
- Breaker tripped count increasing
- Heap usage > 75%
- Fielddata cache size growing
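An alert on a rising tripped count needs two snapshots of the breaker stats, because the counter is cumulative. The sketch below compares two already-fetched `GET /_nodes/stats/breaker` responses and reports the increases. `new_trips` is a hypothetical helper and the snapshots are invented.

```python
def new_trips(previous, current):
    """Return {(node_id, breaker): increase} for breakers whose tripped count grew."""
    increases = {}
    for node_id, node in current["nodes"].items():
        old_breakers = previous["nodes"].get(node_id, {}).get("breakers", {})
        for name, breaker in node["breakers"].items():
            delta = breaker["tripped"] - old_breakers.get(name, {}).get("tripped", 0)
            # A negative delta (for example after a node restart) is ignored.
            if delta > 0:
                increases[(node_id, name)] = delta
    return increases


earlier = {"nodes": {"n1": {"breakers": {"parent": {"tripped": 2}, "request": {"tripped": 0}}}}}
later = {"nodes": {"n1": {"breakers": {"parent": {"tripped": 5}, "request": {"tripped": 0}}}}}
print(new_trips(earlier, later))
```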
4. Client-Side Handling
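# Python example: Handle circuit breaker errors
# Assumes the elasticsearch-py 7.x client API; `query` is a search body defined elsewhere.
from elasticsearch import Elasticsearch, TransportError

es = Elasticsearch()

try:
    result = es.search(index="my-index", body=query)
except TransportError as e:
    if "circuit_breaking_exception" in str(e):
        # Reduce query complexity and retry.
        # simplify_query is a placeholder for your own logic, not a client function.
        simplified_query = simplify_query(query)
        result = es.search(index="my-index", body=simplified_query)
    else:
        # Re-raise anything that is not a circuit breaker error.
        raise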
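Besides simplifying the query, a client can also wait and retry, since memory pressure is often transient and Elasticsearch reports breaker errors with HTTP status 429. The sketch below is a generic exponential-backoff wrapper, a different technique from the query simplification above. `call_with_backoff` and its parameters are hypothetical, and it takes plain callables so it does not depend on a particular client version.

```python
import time


def call_with_backoff(fn, is_breaker_error, retries=3, base_delay=0.5, sleep=time.sleep):
    """Call fn(); on breaker errors, retry with delays of base_delay * 2**attempt."""
    for attempt in range(retries + 1):
        try:
            return fn()
        except Exception as exc:
            # Give up on the last attempt or on errors that are not breaker trips.
            if attempt == retries or not is_breaker_error(exc):
                raise
            sleep(base_delay * 2 ** attempt)


def looks_like_breaker_error(exc):
    # Same string check as the example above.
    return "circuit_breaking_exception" in str(exc)


# Stand-in for a search call that fails twice before succeeding.
attempts = {"count": 0}


def flaky_search():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("circuit_breaking_exception: [parent] Data too large")
    return "ok"


print(call_with_backoff(flaky_search, looks_like_breaker_error, base_delay=0.01))
```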
Troubleshooting Workflow
Circuit Breaker Exception
        │
        ▼
Check which breaker tripped
GET /_nodes/stats/breaker
        │
        ├── fielddata ──► Check aggregations/sorting on text fields
        │                 Fix: Use keyword fields, clear cache
        │
        ├── request ────► Check query complexity
        │                 Fix: Reduce size, simplify aggregations
        │
        └── parent ─────► Overall memory pressure
                          Fix: Scale cluster, increase heap
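The workflow can also be expressed as a lookup from breaker name to suggested action. This is a rough sketch that reads an already-fetched `GET /_nodes/stats/breaker` response. `triage` and the `ADVICE` table are hypothetical, with the advice text taken from the causes described in this guide.

```python
ADVICE = {
    "fielddata": "Check aggregations/sorting on text fields; use keyword fields, clear cache",
    "request": "Check query complexity; reduce size, simplify aggregations",
    "parent": "Overall memory pressure; scale cluster, increase heap",
    "in_flight_requests": "Reduce bulk request sizes and concurrent clients",
}


def triage(stats):
    """Return sorted (node_name, breaker, advice) for every breaker that has tripped."""
    findings = []
    for node in stats["nodes"].values():
        for name, breaker in node["breakers"].items():
            if breaker["tripped"] > 0:
                advice = ADVICE.get(name, "No specific advice; inspect the breaker stats")
                findings.append((node["name"], name, advice))
    return sorted(findings)


# Invented sample in the shape of the API response.
sample = {
    "nodes": {
        "id1": {
            "name": "node-1",
            "breakers": {"parent": {"tripped": 4}, "request": {"tripped": 0}},
        }
    }
}
for finding in triage(sample):
    print(finding)
```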