Elasticsearch Terms Aggregation - Syntax, Parameters, and Examples

The Elasticsearch terms aggregation is a multi-bucket aggregation that groups documents by the unique values of a keyword, numeric, ip, or boolean field and returns the top N buckets, ordered by document count by default. Each bucket holds the term, its doc_count, and any sub-aggregations you nest inside. It is the standard way to answer "group by" questions like top countries, top hosts, or top error codes.

Syntax

GET /my-index/_search
{
  "size": 0,
  "aggs": {
    "popular_colors": {
      "terms": {
        "field": "color.keyword",
        "size": 10,
        "shard_size": 25,
        "min_doc_count": 1,
        "order": { "_count": "desc" }
      }
    }
  }
}

The aggregation runs against keyword, numeric, ip, or boolean fields. To use it on a text field you must enable fielddata, which is rarely worth the heap cost - keep a .keyword sub-field instead.
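
As a rough sketch of the .keyword sub-field approach, the Python snippet below builds a mapping body in which a text field carries a keyword sub-field. The field name color and the helper name text_with_keyword are illustrative assumptions; the printed JSON is the kind of body you would send with a put-mapping request such as PUT /my-index/_mapping.

```python
import json

def text_with_keyword(field: str) -> dict:
    """Mapping body: full-text `field` plus an aggregatable `field.keyword`."""
    return {
        "properties": {
            field: {
                "type": "text",  # analyzed, used for full-text search
                "fields": {
                    # not analyzed, so it can be used in a terms aggregation
                    "keyword": {"type": "keyword"}
                },
            }
        }
    }

mapping = text_with_keyword("color")  # "color" is a hypothetical field name
print(json.dumps(mapping, indent=2))
```

The aggregation then targets color.keyword, as in the Syntax example, and the text field never needs fielddata.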

Parameters

Parameter | Default | Description
field | required | Field to bucket on. Must be aggregatable (keyword, numeric, ip, boolean).
size | 10 | Number of buckets to return from the coordinating node.
shard_size | size * 1.5 + 10 | Number of buckets each shard returns to the coordinator. Larger = more accurate, more memory.
min_doc_count | 1 | Buckets with fewer matching docs are dropped. Set to 0 to also return terms that match no documents in the query (combine with include to limit which ones).
shard_min_doc_count | 0 | Per-shard threshold applied before results are sent to the coordinator.
missing | - | Bucket value used for documents missing the field.
include / exclude | - | Regex or exact-value list to filter terms.
order | { "_count": "desc" } | Sort by _count, _key, or a sub-aggregation.
execution_hint | global_ordinals | global_ordinals (keyword default) or map. map tends to win only when very few documents match the query.
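
To make the shard_size default concrete, here is a small Python sketch of the size * 1.5 + 10 formula from the parameter list. The function name is made up for illustration, and truncating to an integer is an assumption; the sample sizes are chosen so that rounding does not matter.

```python
def default_shard_size(size: int) -> int:
    """Default shard_size for a given size: size * 1.5 + 10 (truncation assumed)."""
    return int(size * 1.5 + 10)

for size in (10, 100, 1000):
    print(size, "->", default_shard_size(size))
# 10 -> 25
# 100 -> 160
# 1000 -> 1510
```

Note that the Syntax example's explicit "shard_size": 25 is the same value the default would give for "size": 10.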

Examples

Top 5 colors, with average price per color:

GET /products/_search
{
  "size": 0,
  "aggs": {
    "top_colors": {
      "terms": { "field": "color.keyword", "size": 5 },
      "aggs": {
        "avg_price": { "avg": { "field": "price" } }
      }
    }
  }
}

Include a "missing" bucket so documents with no region field are counted:

"terms": { "field": "region.keyword", "missing": "N/A" }

Order buckets by a sub-aggregation (highest total revenue first):

"terms": {
  "field": "product_id",
  "order": { "revenue": "desc" }
},
"aggs": { "revenue": { "sum": { "field": "amount" } } }

Performance and Accuracy Notes

The terms aggregation can be approximate when the index spans more than one shard and the field has more unique values than shard_size. Each shard returns its local top shard_size terms, and the coordinator merges them. A term that belongs in the global top size can be undercounted, or missed entirely, if it falls outside the local top shard_size on the shards that hold it. The response includes doc_count_error_upper_bound, which bounds the error on the returned counts, and sum_other_doc_count, which is the number of documents in buckets that were not returned. If doc_count_error_upper_bound is non-zero and you need exact counts, raise shard_size or use the composite aggregation to paginate through every bucket.
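
The shard-merge behaviour described above can be imitated in a few lines of Python. This is a simplified local model, not Elasticsearch code: the per-shard counts are invented, and the function only mimics "take each shard's local top shard_size, merge, keep the top size".

```python
from collections import Counter

# Hypothetical per-shard term counts; "c" is third on every shard but first overall.
shards = [
    Counter({"a": 10, "b": 9, "c": 8}),
    Counter({"a": 10, "d": 9, "c": 8}),
    Counter({"b": 10, "d": 9, "c": 8}),
]

def terms_agg(shards, size, shard_size):
    """Merge each shard's local top `shard_size` terms, then keep the top `size`."""
    merged = Counter()
    for shard in shards:
        merged.update(dict(shard.most_common(shard_size)))
    return merged.most_common(size)

exact = sum(shards, Counter()).most_common(2)
print("exact:       ", exact)                    # [('c', 24), ('a', 20)]
print("shard_size=2:", terms_agg(shards, 2, 2))  # [('a', 20), ('b', 19)]
print("shard_size=3:", terms_agg(shards, 2, 3))  # [('c', 24), ('a', 20)]
```

With shard_size=2 the globally largest term never leaves any shard, so it is missing from the merged result; raising shard_size to 3 recovers the exact answer in this toy data set.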

Memory cost scales with cardinality of the field, not just size. The default global_ordinals execution loads field ordinals across all shard segments, which is fast for repeated queries but holds memory. High-cardinality terms aggregations (millions of unique values) are a common cause of circuit breaker trips and heap pressure. The manual loop - reading circuit breaker stats, matching them to aggregation queries in the slow log, deciding which fields need eager_global_ordinals or a composite aggregation rewrite - is exactly what Pulse automates.
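
For reference, a minimal Python sketch of the eager_global_ordinals setting mentioned above, written as a put-mapping body. The field name host and the helper name are assumptions. The setting moves the cost of building global ordinals from the first query to refresh time; it does not reduce the memory they hold.

```python
import json

def eager_ordinals_mapping(field: str) -> dict:
    """Put-mapping body that builds global ordinals for `field` at refresh time."""
    return {
        "properties": {
            field: {"type": "keyword", "eager_global_ordinals": True}
        }
    }

# "host" is a hypothetical keyword field that is aggregated on frequently.
print(json.dumps(eager_ordinals_mapping("host"), indent=2))
```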

Common Mistakes

  1. Running terms on a text field without a .keyword sub-field, triggering fielddata heap usage.
  2. Trusting top-N counts when doc_count_error_upper_bound > 0 - the result is approximate.
  3. Setting size to a very large value (e.g. 100000) instead of using composite aggregation for full enumeration.
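  4. Ordering by a sub-aggregation while keeping size small - each shard ranks terms by its local sub-aggregation value, so the error is harder to bound than with the default count ordering.
  5. Forgetting "size": 0 on the outer search, which makes Elasticsearch fetch and return search hits (10 by default) you do not need.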

Optimize High-Cardinality Terms Aggregations with Pulse

Pulse is an AI DBA for Elasticsearch and OpenSearch that continuously profiles production query traffic. For terms aggregations specifically, Pulse:

  • Identifies terms aggregations running on high-cardinality keyword fields (millions of unique values) that are loading global ordinals on every query and pushing nodes toward the request and fielddata circuit breakers
  • Flags terms aggregations on analyzed text fields that have triggered fielddata heap loading, plus aggregations with doc_count_error_upper_bound > 0 where users are treating approximate top-N results as exact
  • Spots size values pushed up to enumerate everything (e.g. 100000) where a composite aggregation would page through the data with bounded memory
  • Traces each slow terms aggregation back to the calling service via slow-log and APM correlation
  • Recommends concrete fixes: enable eager_global_ordinals on the field, switch to execution_hint: map when only a few documents match, move enumeration to a composite aggregation, raise shard_size to shrink top-N error, or replace text fields with a .keyword sub-field
  • Tracks heap, circuit breaker, and latency impact after the change ships

This converts the manual heap and circuit-breaker triage loop into a continuous optimization workflow.

Try Pulse on your cluster.

Frequently Asked Questions

Q: How do I get an exact count of all unique terms?
A: The terms aggregation cannot guarantee exact counts when total cardinality exceeds size. Use the composite aggregation to paginate through every bucket, or the cardinality aggregation for an approximate distinct count.
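
A possible shape for the composite pagination loop, sketched in Python. The search argument is a hypothetical callable that takes a request body and returns the parsed JSON of a _search response; the aggregation name all_values and source name value are also illustrative. The loop follows after_key until a page comes back without one.

```python
from typing import Callable, Iterator

def iter_composite_buckets(search: Callable[[dict], dict], field: str,
                           page_size: int = 1000) -> Iterator[dict]:
    """Yield every bucket for `field` by following the composite `after_key`.

    `search` is a hypothetical callable: request body in, parsed response out.
    """
    composite = {
        "size": page_size,
        "sources": [{"value": {"terms": {"field": field}}}],
    }
    body = {"size": 0, "aggs": {"all_values": {"composite": composite}}}
    while True:
        result = search(body)["aggregations"]["all_values"]
        yield from result["buckets"]
        after_key = result.get("after_key")
        if after_key is None:  # no further pages
            break
        composite["after"] = after_key  # resume after the last returned key
```

Each yielded bucket has the form {"key": {"value": ...}, "doc_count": ...}, and memory stays bounded by page_size rather than by the field's cardinality.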

Q: What is the maximum size for a terms aggregation?
A: The hard limit is governed by search.max_buckets (default 65,536 across the whole response, cluster-level). The per-field guard index.max_terms_count is unrelated and applies to terms queries, not aggregations.

Q: Why is doc_count_error_upper_bound non-zero?
A: The terms aggregation merges per-shard top-N lists, so a term missing from one shard's top list contributes uncertainty. Raise shard_size to shrink the error or switch to a composite aggregation for an exact enumeration.
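
One way to check these fields in application code, sketched in Python. The sample dict is hand-written to mirror the shape of a terms aggregation result (the object under aggregations.<name> in the response); its numbers are made up, and the helper name is an assumption.

```python
def accuracy_report(agg: dict) -> str:
    """Summarize the accuracy fields of one terms aggregation result."""
    error = agg["doc_count_error_upper_bound"]
    other = agg["sum_other_doc_count"]
    if error == 0:
        status = "no document count error reported"
    else:
        status = f"document count error is bounded by {error}"
    return f"{status}; {other} documents are in buckets that were not returned"

# Hand-written sample shaped like response["aggregations"]["popular_colors"].
sample = {
    "doc_count_error_upper_bound": 46,
    "sum_other_doc_count": 79,
    "buckets": [{"key": "red", "doc_count": 100}],
}
print(accuracy_report(sample))
```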

Q: How do I make a terms aggregation case-insensitive?
A: Index the field with a lowercase normalizer on the keyword mapping. Doing it at query time via scripts works but is slower and breaks global ordinals optimization.
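
A sketch of the normalizer setup in Python, as the body of a create-index request. The normalizer name lowercase_normalizer and the field name color are assumptions. The last lines only simulate the effect locally: values that differ by case end up in one bucket, and the bucket key is the lowercased form.

```python
import json
from collections import Counter

index_body = {
    "settings": {
        "analysis": {
            "normalizer": {
                # custom normalizer that lowercases keyword values at index time
                "lowercase_normalizer": {"type": "custom", "filter": ["lowercase"]}
            }
        }
    },
    "mappings": {
        "properties": {
            "color": {"type": "keyword", "normalizer": "lowercase_normalizer"}
        }
    },
}
print(json.dumps(index_body, indent=2))

# Local simulation of the bucketing effect, not an Elasticsearch call.
values = ["Red", "red", "RED", "Blue"]
merged = Counter(v.lower() for v in values).most_common()
print(merged)  # [('red', 3), ('blue', 1)]
```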

Q: When should I use multi_terms instead of terms?
A: Use the multi-terms aggregation when you need unique combinations of two or more fields in a single bucket. Nesting two terms aggregations gives a hierarchy, not unique pairs.
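
The difference is easier to see with data. The Python sketch below counts unique (region, status) pairs locally, which is the grouping multi_terms produces, and then shows a request body for the same grouping. The documents, field names, and aggregation name are all invented for illustration.

```python
from collections import Counter

# Hypothetical documents with two keyword fields.
docs = [
    {"region": "eu", "status": "ok"},
    {"region": "eu", "status": "error"},
    {"region": "us", "status": "ok"},
    {"region": "eu", "status": "ok"},
    {"region": "us", "status": "ok"},
    {"region": "eu", "status": "ok"},
]

# One bucket per unique (region, status) combination.
pairs = Counter((d["region"], d["status"]) for d in docs)
print(pairs.most_common())
# [(('eu', 'ok'), 3), (('us', 'ok'), 2), (('eu', 'error'), 1)]

# Request body sketch for the same grouping with multi_terms.
body = {
    "size": 0,
    "aggs": {
        "region_status": {
            "multi_terms": {"terms": [{"field": "region"}, {"field": "status"}]}
        }
    },
}
```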

Q: What does execution_hint: map change?
A: It switches from global ordinals to an in-memory hashmap keyed by the raw term values. It is faster only when the matched document set is very small and the field has high global cardinality.

Q: How do I monitor terms aggregations for heap pressure and circuit breaker trips?
A: Pulse tracks fielddata and request circuit breakers on Elasticsearch and OpenSearch nodes, attributes each trip to the aggregation pattern responsible, and recommends eager_global_ordinals, .keyword sub-field migration, or a composite aggregation rewrite when terms is being misused for full enumeration.
