The Elasticsearch terms aggregation is a multi-bucket aggregation that groups documents by the unique values of a keyword, numeric, ip, or boolean field and returns the top N buckets ordered by document count. Each bucket holds the term, its doc_count, and any sub-aggregations you nest inside. It is the standard way to answer "group by" questions like top countries, top hosts, or top error codes.
Syntax
GET /my-index/_search
{
  "size": 0,
  "aggs": {
    "popular_colors": {
      "terms": {
        "field": "color.keyword",
        "size": 10,
        "shard_size": 25,
        "min_doc_count": 1,
        "order": { "_count": "desc" }
      }
    }
  }
}
The aggregation runs against keyword, numeric, ip, or boolean fields. To use it on a text field you must enable fielddata, which is rarely worth the heap cost - keep a .keyword sub-field instead.
Parameters
| Parameter | Default | Description |
|---|---|---|
| field | required | Field to bucket on. Must be aggregatable (keyword, numeric, ip, boolean). |
| size | 10 | Number of buckets to return from the coordinating node. |
| shard_size | size * 1.5 + 10 | Number of buckets each shard returns to the coordinator. Larger = more accurate, more memory. |
| min_doc_count | 1 | Buckets with fewer matching docs are dropped. Set to 0 to include empty buckets when include is used. |
| shard_min_doc_count | 0 | Per-shard threshold applied before results are sent to the coordinator. |
| missing | - | Bucket value used for documents missing the field. |
| include / exclude | - | Regex or exact-value list to filter terms. |
| order | { "_count": "desc" } | Sort by _count, _key, or a sub-aggregation. |
| execution_hint | global_ordinals | global_ordinals (keyword default) or map. map tends to win only when very few documents match the query. |
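As a quick check on the shard_size default in the table, the formula can be evaluated directly. This is a minimal sketch: the function name default_shard_size is hypothetical, and truncating the result to an integer is an assumption about how the fractional part is handled.

```python
def default_shard_size(size: int) -> int:
    """Default shard_size per the table above: size * 1.5 + 10.

    Truncation to an integer is an assumption made for this sketch.
    """
    return int(size * 1.5 + 10)


print(default_shard_size(10))   # 25
print(default_shard_size(100))  # 160
```

With the default size of 10 this gives 25, which is the explicit shard_size used in the Syntax example.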
Examples
Top 5 colors, with average price per color:
GET /products/_search
{
  "size": 0,
  "aggs": {
    "top_colors": {
      "terms": { "field": "color.keyword", "size": 5 },
      "aggs": {
        "avg_price": { "avg": { "field": "price" } }
      }
    }
  }
}
Include a "missing" bucket so documents with no region field are counted:
"terms": { "field": "region.keyword", "missing": "N/A" }
Order buckets by a sub-aggregation (highest total revenue first):
"terms": {
  "field": "product_id",
  "order": { "revenue": "desc" }
},
"aggs": { "revenue": { "sum": { "field": "amount" } } }
Performance and Accuracy Notes
The terms aggregation is approximate when size is smaller than the total cardinality. Each shard returns its local top shard_size terms, and the coordinator merges them. A term that belongs in the global top 10 can be undercounted, or missed entirely, if it falls outside the local top shard_size on the shards that hold it. The response includes doc_count_error_upper_bound, the worst-case count error for any term, and sum_other_doc_count, the number of documents that fell into buckets not returned. If doc_count_error_upper_bound is non-zero and you need exact counts, raise shard_size or use the composite aggregation to paginate through every bucket.
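The effect of the per-shard cut-off can be reproduced without a cluster. The sketch below is a simplified model, not Elasticsearch code: the shard contents and the function names are made up for illustration, and each shard is represented as a plain dictionary of term counts.

```python
from collections import Counter


def merged_top_terms(shards, size, shard_size):
    """Model the coordinator: sum each shard's local top `shard_size`
    terms, then keep the overall top `size`."""
    merged = Counter()
    for shard_counts in shards:
        for term, count in Counter(shard_counts).most_common(shard_size):
            merged[term] += count
    return merged.most_common(size)


def exact_top_terms(shards, size):
    """Ground truth: sum every term on every shard before ranking."""
    total = Counter()
    for shard_counts in shards:
        total.update(shard_counts)
    return total.most_common(size)


# Hypothetical data: "green" is third on every shard but first overall.
shards = [
    {"red": 10, "blue": 9, "green": 8},
    {"red": 10, "yellow": 9, "green": 8},
    {"blue": 10, "yellow": 9, "green": 8},
]

print(exact_top_terms(shards, 2))      # [('green', 24), ('red', 20)]
print(merged_top_terms(shards, 2, 2))  # [('red', 20), ('blue', 19)]
print(merged_top_terms(shards, 2, 3))  # [('green', 24), ('red', 20)]
```

With shard_size of 2 the most frequent term never reaches the coordinator. Raising shard_size to 3 lets every shard report it, and the merged result matches the exact one.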
Memory cost scales with cardinality of the field, not just size. The default global_ordinals execution loads field ordinals across all shard segments, which is fast for repeated queries but holds memory. High-cardinality terms aggregations (millions of unique values) are a common cause of circuit breaker trips and heap pressure. The manual loop - reading circuit breaker stats, matching them to aggregation queries in the slow log, deciding which fields need eager_global_ordinals or a composite aggregation rewrite - is exactly what Pulse automates.
Common Mistakes
- Running terms on a text field without a .keyword sub-field, triggering fielddata heap usage.
- Trusting top-N counts when doc_count_error_upper_bound > 0 - the result is approximate.
- Setting size to a very large value (e.g. 100000) instead of using composite aggregation for full enumeration.
- Ordering by a sub-aggregation while keeping size small - the same accuracy caveat applies, multiplied.
- Forgetting "size": 0 on the outer search, which forces a full hits fetch you do not need.
Optimize High-Cardinality Terms Aggregations with Pulse
Pulse is an AI DBA for Elasticsearch and OpenSearch that continuously profiles production query traffic. For terms aggregations specifically, Pulse:
- Identifies terms aggregations running on high-cardinality keyword fields (millions of unique values) that are loading global ordinals on every query and pushing nodes toward the request and fielddata circuit breakers
- Flags terms aggregations on analyzed text fields that have triggered fielddata heap loading, plus aggregations with doc_count_error_upper_bound > 0 where users are treating approximate top-N results as exact
- Spots size values pushed up to enumerate everything (e.g. 100000) where a composite aggregation would page through the data with bounded memory
- Traces each slow terms aggregation back to the calling service via slow-log and APM correlation
- Recommends concrete fixes: enable eager_global_ordinals on the field, switch to execution_hint: map for low-cardinality matches, move enumeration to a composite aggregation, raise shard_size to shrink top-N error, or replace text fields with a .keyword sub-field
- Tracks heap, circuit breaker, and latency impact after the change ships
This converts the manual heap and circuit-breaker triage loop into a continuous optimization workflow.
Frequently Asked Questions
Q: How do I get an exact count of all unique terms?
A: The terms aggregation cannot guarantee exact counts when total cardinality exceeds size. Use the composite aggregation to paginate through every bucket, or the cardinality aggregation for an approximate distinct count.
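Paging through a composite aggregation amounts to feeding each response's after_key back into the next request. The sketch below is one way to write that loop, under stated assumptions: search is a hypothetical callable that takes a request body and returns the response as a dictionary, all_terms and term are arbitrary names chosen here, and make_fake_search is an in-memory stand-in so the loop can run without a cluster.

```python
def composite_body(field, page_size, after_key=None):
    """Build one composite aggregation request over a single terms source."""
    composite = {
        "size": page_size,
        "sources": [{"term": {"terms": {"field": field}}}],
    }
    if after_key is not None:
        composite["after"] = after_key  # resume after the previous page
    return {"size": 0, "aggs": {"all_terms": {"composite": composite}}}


def iter_all_buckets(search, field, page_size=1000):
    """Yield (term, doc_count) for every bucket, one page at a time."""
    after_key = None
    while True:
        agg = search(composite_body(field, page_size, after_key))["aggregations"]["all_terms"]
        for bucket in agg["buckets"]:
            yield bucket["key"]["term"], bucket["doc_count"]
        after_key = agg.get("after_key")
        if after_key is None or not agg["buckets"]:
            break


def make_fake_search(term_counts):
    """Hypothetical stand-in for a real client call; pages in key order."""
    ordered = sorted(term_counts.items())

    def search(body):
        composite = body["aggs"]["all_terms"]["composite"]
        after = composite.get("after")
        remaining = [(t, c) for t, c in ordered if after is None or t > after["term"]]
        page = remaining[: composite["size"]]
        agg = {"buckets": [{"key": {"term": t}, "doc_count": c} for t, c in page]}
        if page:
            agg["after_key"] = {"term": page[-1][0]}
        return {"aggregations": {"all_terms": agg}}

    return search


fake = make_fake_search({"blue": 1, "green": 2, "red": 3})
print(list(iter_all_buckets(fake, "color.keyword", page_size=2)))
# [('blue', 1), ('green', 2), ('red', 3)]
```

In real use, search would wrap whichever client call you already make against the _search endpoint. Memory stays bounded by page_size because only one page of buckets is held at a time.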
Q: What is the maximum size for a terms aggregation?
A: The hard limit is governed by search.max_buckets (default 65,536 across the whole response, cluster-level). The index-level setting index.max_terms_count is unrelated: it limits the number of terms in a terms query, not the number of buckets in an aggregation.
Q: Why is doc_count_error_upper_bound non-zero?
A: The terms aggregation merges per-shard top-N lists, so a term missing from one shard's top list contributes uncertainty. Raise shard_size to shrink the error or switch to a composite aggregation for an exact enumeration.
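One way to picture where the number comes from: each shard that filled its top list could be hiding terms with up to the count of the last term it returned. The sketch below models the bound as the sum of those last counts. It is a simplified model for illustration, not Elasticsearch's implementation, and the shard data and function name are hypothetical.

```python
from collections import Counter


def error_upper_bound(shards, shard_size):
    """Sum, over shards, the count of the last term each shard returned.

    A shard that returns fewer than `shard_size` terms has nothing hidden,
    so it contributes zero in this model.
    """
    bound = 0
    for shard_counts in shards:
        returned = Counter(shard_counts).most_common(shard_size)
        if len(returned) == shard_size:
            bound += returned[-1][1]
    return bound


# Hypothetical per-shard term counts.
shards = [
    {"red": 10, "blue": 9, "green": 8},
    {"red": 10, "yellow": 9, "green": 8},
    {"blue": 10, "yellow": 9, "green": 8},
]

print(error_upper_bound(shards, 2))  # 27
print(error_upper_bound(shards, 4))  # 0
```

Here "green" has a true total of 24 but is cut from every shard's top 2, which is within the bound of 27. Once shard_size exceeds the number of terms on each shard, the bound drops to zero.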
Q: How do I make a terms aggregation case-insensitive?
A: Index the field with a lowercase normalizer on the keyword mapping. Doing it at query time via scripts works but is slower and breaks global ordinals optimization.
Q: When should I use multi_terms instead of terms?
A: Use the multi-terms aggregation when you need unique combinations of two or more fields in a single bucket. Nesting two terms aggregations gives a hierarchy, not unique pairs.
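The difference in output shape can be shown with plain counters. This is an illustrative model only: the documents, the field names genre and product, and both function names are hypothetical, and no Elasticsearch API is called.

```python
from collections import Counter, defaultdict

# Hypothetical documents with two fields.
docs = [
    {"genre": "rock", "product": "A"},
    {"genre": "rock", "product": "A"},
    {"genre": "rock", "product": "B"},
    {"genre": "jazz", "product": "A"},
    {"genre": "jazz", "product": "C"},
    {"genre": "jazz", "product": "C"},
    {"genre": "jazz", "product": "C"},
]


def pair_buckets(docs, fields, size):
    """Flat buckets keyed by the combination of values (multi_terms-like)."""
    return Counter(tuple(d[f] for f in fields) for d in docs).most_common(size)


def nested_buckets(docs, outer, inner, size):
    """Top outer values, each with its own top inner values (nested terms-like)."""
    by_outer = defaultdict(list)
    for d in docs:
        by_outer[d[outer]].append(d)
    top_outer = Counter(d[outer] for d in docs).most_common(size)
    return [
        (value, count, Counter(d[inner] for d in by_outer[value]).most_common(size))
        for value, count in top_outer
    ]


print(pair_buckets(docs, ["genre", "product"], 2))
# [(('jazz', 'C'), 3), (('rock', 'A'), 2)]
print(nested_buckets(docs, "genre", "product", 1))
# [('jazz', 4, [('C', 3)])]
```

The flat version ranks pairs against each other across all genres. The nested version ranks genres first, so the pair ("rock", "A") only appears if "rock" itself makes the outer cut.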
Q: What does execution_hint: map change?
A: It switches from global ordinals to an in-memory hashmap keyed by the raw term values. It is faster only when the matched document set is very small and the field has high global cardinality.
Q: How do I monitor terms aggregations for heap pressure and circuit breaker trips?
A: Pulse tracks fielddata and request circuit breakers on Elasticsearch and OpenSearch nodes, attributes each trip to the aggregation pattern responsible, and recommends eager_global_ordinals, .keyword sub-field migration, or a composite aggregation rewrite when terms is being misused for full enumeration.
Related Reading
- Composite Aggregation: paginate through every bucket exactly.
- Multi Terms Aggregation: bucket on combinations of fields.
- Rare Terms Aggregation: the inverse - find infrequent values.
- Significant Terms Aggregation: terms over-represented in a subset.
- Cardinality Aggregation: count distinct values approximately.
- Elasticsearch Query Language: the query DSL terms aggregations run inside.