The error CircuitBreakingException: [fielddata] Data too large, data for [<field>] would be [X/...gb], which is larger than the limit of [Y/...gb] appears when Elasticsearch refuses to load more fielddata for a text field into the JVM heap because doing so would exceed the field data circuit breaker. The request that triggered the load (usually a sort, aggregation, or scripted access on a text field) fails, but the cluster keeps running and serving other queries. The default limit is 40% of the JVM heap, controlled by indices.breaker.fielddata.limit.
What This Error Means
Field data is the in-memory, uninverted form of a text field's terms. It is built lazily from the Lucene inverted index the first time a sort, aggregation, or script needs it. The field data circuit breaker is one of several circuit breakers Elasticsearch uses to protect the JVM heap from runaway memory growth. The others include the parent, request, in-flight requests, and accounting breakers. Tripping it means a query asked Elasticsearch to load enough additional fielddata to push estimated usage past indices.breaker.fielddata.limit (default 40% of heap).
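The trip decision above reduces to simple arithmetic. What follows is a simplified model, not Elasticsearch's source code. It assumes the limit is the configured fraction of max heap. It also assumes the running total (fielddata already loaded plus the new load) is multiplied by indices.breaker.fielddata.overhead, which defaults to 1.03, before the comparison. The function names are hypothetical.

```python
GIB = 1024 ** 3


def fielddata_limit_bytes(heap_max_bytes: int, limit_fraction: float = 0.40) -> int:
    """Byte value of indices.breaker.fielddata.limit for a given max heap (assumed model)."""
    return int(heap_max_bytes * limit_fraction)


def would_trip(used_bytes: int, new_bytes: int, limit_bytes: int,
               overhead: float = 1.03) -> bool:
    """Simplified model: trip if (already loaded + new load) * overhead exceeds the limit."""
    estimated = int((used_bytes + new_bytes) * overhead)
    return estimated > limit_bytes


# Example: 10 GiB heap, 3 GiB of fielddata already loaded, a query wants 1 GiB more.
limit = fielddata_limit_bytes(10 * GIB)
print(would_trip(3 * GIB, 1 * GIB, limit))  # True
```

In this model, the same 4 GiB total with overhead=1.0 would sit exactly at the limit and not trip. The multiplier is why a load that looks like it fits can still fail.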
Hitting this breaker is almost always a mapping problem: the field should be a keyword (which uses on-disk doc values, not heap fielddata), or aggregations should run against a different field. Raising the limit treats the symptom; the durable fix is to stop loading fielddata for that field at all.
Common Causes
- Aggregation or sort on a text field with fielddata: true. How to confirm: run GET <index>/_mapping/field/<field> and look for "fielddata": true on a text type.
- High-cardinality terms aggregation on an analyzed field. How to confirm: find the failing query in the slow log, then count distinct values with a cardinality aggregation on the matching .keyword subfield.
- Painless script reading a text field via doc['field'].value. How to confirm: check the script source for doc access to text fields. If fielddata is enabled on the field, that access loads it into heap without any warning (if it is disabled, the script fails instead). Migrate to params._source or the .keyword subfield.
- JVM heap too small for the working set. How to confirm: run GET _nodes/stats/jvm and compare mem.heap_used_in_bytes against mem.heap_max_in_bytes (the configured heap).
- indices.breaker.fielddata.limit set lower than the typical fielddata working set. How to confirm: run GET _nodes/stats/breaker and compare the fielddata breaker's limit_size_in_bytes against estimated_size_in_bytes.
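The first cause can be checked across a whole index instead of one field at a time. This minimal sketch assumes you have already fetched GET <index>/_mapping and parsed it as JSON; the HTTP call is left out, and the function name and sample mapping are hypothetical. It walks object, nested, and multi-field properties. It reports every text field that has fielddata: true and whether a keyword subfield already exists to aggregate on.

```python
from typing import Any, Dict, List, Tuple


def find_fielddata_text_fields(mapping_response: Dict[str, Any]) -> List[Tuple[str, str, bool]]:
    """Return (index, field_path, has_keyword_subfield) for text fields with fielddata: true."""
    hits: List[Tuple[str, str, bool]] = []

    def walk(index: str, props: Dict[str, Any], prefix: str) -> None:
        for name, spec in props.items():
            path = f"{prefix}{name}"
            if spec.get("type") == "text" and spec.get("fielddata") is True:
                subfields = spec.get("fields", {})
                has_kw = any(s.get("type") == "keyword" for s in subfields.values())
                hits.append((index, path, has_kw))
            # Object and nested fields keep their children under "properties".
            if "properties" in spec:
                walk(index, spec["properties"], path + ".")
            # Multi-fields (e.g. title.raw) could themselves be text with fielddata.
            if "fields" in spec:
                walk(index, spec["fields"], path + ".")

    for index, body in mapping_response.items():
        walk(index, body.get("mappings", {}).get("properties", {}), "")
    return hits


# Hypothetical mapping response for an index named "logs".
sample = {"logs": {"mappings": {"properties": {
    "title": {"type": "text", "fielddata": True,
              "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}},
    "tag": {"type": "keyword"},
    "user": {"properties": {"bio": {"type": "text", "fielddata": True}}},
    "body": {"type": "text"},
}}}}
print(find_fielddata_text_fields(sample))
# [('logs', 'title', True), ('logs', 'user.bio', False)]
```

A field reported with True can switch its aggregations to the existing .keyword subfield right away. A field reported with False needs the mapping change and reindex described below.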
How to Fix Field Data Circuit Breaker Tripped
Inspect current breaker state:

```
GET /_nodes/stats/breaker
```

The fielddata block shows estimated_size_in_bytes, limit_size_in_bytes, and tripped.

Identify which field is loaded:
```
GET /_cat/fielddata?v&fields=*
```

Switch the field to keyword with doc values (the durable fix). Reindex into a new mapping:

```
PUT /my-index-v2
{
  "mappings": {
    "properties": {
      "tag": { "type": "keyword" }
    }
  }
}
```

Then POST _reindex from the old index and swap aliases.

For existing text fields that need both search and aggregation, use a multi-field instead of fielddata: true:

```
"properties": {
  "title": {
    "type": "text",
    "fields": {
      "keyword": { "type": "keyword", "ignore_above": 256 }
    }
  }
}
```

Then aggregate on title.keyword instead of title.

Clear cached fielddata as a short-term release valve:
```
POST /<index>/_cache/clear?fielddata=true
```

This frees memory immediately, but the next query that needs the fielddata will load it again.
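Before and after a cache clear, it helps to rank fields by how much fielddata they hold across all nodes, which is the same question the "identify which field is loaded" step asks. This minimal sketch assumes the rows come from GET /_cat/fielddata?format=json&bytes=b, which returns JSON with sizes as plain byte counts. The HTTP call is omitted, and the function name and sample rows are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def top_fielddata_fields(rows: Iterable[Dict[str, str]], limit: int = 5) -> List[Tuple[str, int]]:
    """Sum fielddata bytes per field across nodes and return the largest fields first."""
    totals: Dict[str, int] = defaultdict(int)
    for row in rows:
        # With bytes=b the size column is a plain integer string such as "524288".
        totals[row["field"]] += int(row["size"])
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:limit]


rows = [
    {"node": "es-1", "field": "title", "size": "300000"},
    {"node": "es-2", "field": "title", "size": "200000"},
    {"node": "es-1", "field": "tag", "size": "100000"},
]
print(top_fielddata_fields(rows))  # [('title', 500000), ('tag', 100000)]
```

If the same field climbs back to the top shortly after a clear, that confirms the clear only bought time and the mapping fix is still needed.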
Raise the breaker only if the heap is genuinely sized for it (use sparingly; the 40% default exists to prevent OutOfMemoryError):

```
PUT /_cluster/settings
{
  "persistent": {
    "indices.breaker.fielddata.limit": "50%"
  }
}
```
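Before raising the limit, check how close each node actually runs to both the fielddata breaker and the parent breaker. This minimal sketch assumes the parsed JSON body of GET /_nodes/stats/breaker. The HTTP call is omitted, and the function names, report keys, and sample numbers are made up for illustration. If the parent breaker is already near its own limit, a higher fielddata limit mostly moves the trip to the parent breaker instead of fixing it.

```python
from typing import Any, Dict, List


def _pct(breaker: Dict[str, Any]) -> float:
    """Estimated usage as a percentage of the breaker's limit."""
    limit = breaker["limit_size_in_bytes"]
    # Guard against a zero or negative limit to avoid a bogus division.
    return round(100.0 * breaker["estimated_size_in_bytes"] / limit, 1) if limit > 0 else 0.0


def fielddata_breaker_report(stats: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Per-node fielddata and parent breaker utilization, busiest nodes first."""
    report = []
    for node_id, node in stats.get("nodes", {}).items():
        breakers = node["breakers"]
        report.append({
            "node": node.get("name", node_id),
            "fielddata_pct": _pct(breakers["fielddata"]),
            "parent_pct": _pct(breakers["parent"]),
            "fielddata_tripped": breakers["fielddata"]["tripped"],
        })
    report.sort(key=lambda r: r["fielddata_pct"], reverse=True)
    return report


# Made-up numbers in the shape of a _nodes/stats/breaker response.
sample_stats = {"nodes": {
    "abc": {"name": "es-1", "breakers": {
        "fielddata": {"limit_size_in_bytes": 1000, "estimated_size_in_bytes": 900, "tripped": 3},
        "parent": {"limit_size_in_bytes": 2000, "estimated_size_in_bytes": 1900, "tripped": 0}}},
    "def": {"name": "es-2", "breakers": {
        "fielddata": {"limit_size_in_bytes": 1000, "estimated_size_in_bytes": 100, "tripped": 0},
        "parent": {"limit_size_in_bytes": 2000, "estimated_size_in_bytes": 500, "tripped": 0}}},
}}
for row in fielddata_breaker_report(sample_stats):
    print(row)
```

In this sample, es-1 is at 90% of its fielddata limit and 95% of its parent limit, so raising the fielddata limit there would gain little.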
Resolve Field Data Circuit Breaker Trips Automatically with Pulse
Pulse is an AI DBA for Elasticsearch and OpenSearch. When CircuitBreakingException: [fielddata] Data too large fires, Pulse:
- Correlates the trip with _nodes/stats/breaker (the fielddata block's estimated_size_in_bytes vs limit_size_in_bytes), _cat/fielddata?v per-field heap usage, the failing query from the slow log, and the current mapping for the offending field
- Identifies which of the five causes applies: sort or aggregation against a text field with fielddata: true, high-cardinality terms agg on an analyzed field, Painless reading doc['text_field'].value, undersized heap, or indices.breaker.fielddata.limit set below the working set
- Generates the durable fix: the reindex into a keyword mapping (or a multi-field with fields.keyword), the rewritten aggregation against .keyword, the POST /<index>/_cache/clear?fielddata=true release-valve call, or the indices.breaker.fielddata.limit change when heap is actually sized for it
- Applies dynamic changes (breaker limit, cache clears) automatically with operator approval; leaves mapping migrations and reindex plans as a one-click PR
Pulse continuously tracks breaker.fielddata.tripped counter increases and alerts before user-visible failures, which is the difference between catching a high-cardinality terms agg in staging and finding it through customer complaints.
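The underlying signal can be watched with any scheduler. Poll GET /_nodes/stats/breaker periodically and alert when a node's fielddata tripped counter goes up. What follows is a generic poll-and-diff sketch, not Pulse's implementation. The snapshot handling and function name are assumptions, and the HTTP polling loop is left out.

```python
from typing import Any, Dict


def tripped_increases(before: Dict[str, Any], after: Dict[str, Any],
                      breaker: str = "fielddata") -> Dict[str, int]:
    """Per-node increase of a breaker's tripped counter between two stats snapshots."""
    increases: Dict[str, int] = {}
    prev_nodes = before.get("nodes", {})
    for node_id, node in after.get("nodes", {}).items():
        now = node["breakers"][breaker]["tripped"]
        prev_node = prev_nodes.get(node_id)
        prev = prev_node["breakers"][breaker]["tripped"] if prev_node else 0
        # Counters start from zero again after a node restart, so a drop is
        # treated as "everything counted since the restart is new".
        delta = now - prev if now >= prev else now
        if delta > 0:
            increases[node_id] = delta
    return increases
```

Any non-empty result is worth an alert. It means some query tripped the breaker on that node since the previous poll.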
Frequently Asked Questions
Q: What is the default fielddata circuit breaker limit in Elasticsearch?
A: The fielddata circuit breaker defaults to 40% of the JVM heap, controlled by indices.breaker.fielddata.limit. For comparison, the parent breaker defaults to 95% (70% if indices.breaker.total.use_real_memory is false), the request breaker to 60%, and the accounting breaker to 100%.
Q: Can I disable the field data circuit breaker?
A: Setting indices.breaker.fielddata.limit to 100% effectively disables the per-breaker check, but the parent breaker (95% of heap by default) will still trip, and you risk a JVM OutOfMemoryError. Removing the protection without addressing the underlying fielddata usage is unsafe.
Q: How is fielddata different from doc values?
A: Doc values are an on-disk columnar format used by default for keyword, numeric, date, and geo_point fields, paged in via the OS page cache. Fielddata is an in-heap structure built at query time for text fields with fielddata: true. Doc values scale far better.
Q: Why does my aggregation work fine for small indices but trip the breaker on larger ones?
A: Fielddata size grows with the number of unique terms in the field, and it is built for every document in each segment, not just for the documents a query matches. A small index has few unique tokens, so heap usage stays under the limit. As the index grows, total fielddata can exceed the 40% limit even though each individual query looks modest.
Q: Will clearing the fielddata cache cause data loss?
A: No. POST /<index>/_cache/clear?fielddata=true only drops the in-memory uninverted index; the underlying Lucene segments are untouched. The next sort or aggregation on the field rebuilds it.
Q: Should I increase heap size to fix this?
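A: Sometimes, but a heap larger than roughly 26-32 GB (the exact cutoff depends on the JVM and platform) loses compressed object pointers. Every object reference then takes more space, so the extra heap buys less than expected and can make things worse, not better. Switch to keyword/doc values first; resize the heap only if measurements show the working set genuinely needs more.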
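Measuring first means checking what each node actually runs with. This minimal sketch assumes the parsed JSON body of GET /_nodes/jvm, where each node's jvm section reports mem.heap_max_in_bytes and using_compressed_ordinary_object_pointers. The flag is accepted as either a string or a boolean because its exact JSON type is an assumption here. The function name and sample data are hypothetical.

```python
from typing import Any, Dict, List

GIB = 1024 ** 3


def heap_report(nodes_info: Dict[str, Any], fielddata_fraction: float = 0.40) -> List[Dict[str, Any]]:
    """Summarize max heap, compressed-oops status, and the resulting fielddata limit per node."""
    report = []
    for node_id, node in nodes_info.get("nodes", {}).items():
        jvm = node.get("jvm", {})
        heap = jvm.get("mem", {}).get("heap_max_in_bytes", 0)
        # Accept either "true"/"false" strings or JSON booleans.
        oops = str(jvm.get("using_compressed_ordinary_object_pointers", "")).lower() == "true"
        report.append({
            "node": node.get("name", node_id),
            "heap_gib": round(heap / GIB, 1),
            "compressed_oops": oops,
            "fielddata_limit_gib": round(heap * fielddata_fraction / GIB, 1),
        })
    return report


# Hypothetical two-node cluster: one node just under the cutoff, one above it.
sample_info = {"nodes": {
    "n1": {"name": "es-1", "jvm": {"mem": {"heap_max_in_bytes": 31 * GIB},
                                   "using_compressed_ordinary_object_pointers": "true"}},
    "n2": {"name": "es-2", "jvm": {"mem": {"heap_max_in_bytes": 40 * GIB},
                                   "using_compressed_ordinary_object_pointers": "false"}},
}}
for row in heap_report(sample_info):
    print(row)
```

A node that reports compressed_oops as False has a larger fielddata limit on paper, but every object reference on its heap is bigger. That is the trade-off the answer above warns about.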
Q: What's the fastest way to diagnose a fielddata circuit breaker trip in production?
A: Pulse, the AI DBA for Elasticsearch and OpenSearch, ties the CircuitBreakingException: [fielddata] line to the offending query, the field consuming heap, and the current mapping in a single view, then proposes either the .keyword rewrite, the multi-field mapping change, or a measured indices.breaker.fielddata.limit increase. It applies the dynamic setting change with approval and leaves the reindex plan as a PR.