
Elasticsearch Multi Terms Aggregation - Bucketing by Combined Fields - Syntax, Example, and Tips

The Elasticsearch multi_terms aggregation is a multi-bucket aggregation that groups documents by the unique combinations of values from two or more fields. Each bucket key is the tuple of source values, and doc_count reflects how many documents share that exact combination. Use it when you need "group by (category, country)" semantics as a single flat set of buckets. A nested terms aggregation, by contrast, produces a hierarchy of outer and inner buckets rather than unique pairs.
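The difference between the two bucket shapes is easiest to see on a handful of documents. Below is a minimal in-memory sketch in Python. It assumes single-valued fields with no missing values, and the short field names are simplified stand-ins for the keyword fields used in the syntax example. It models the bucketing logic only and is not a client API.

```python
from collections import Counter, defaultdict

# Toy documents standing in for the sales index (simplified field names).
docs = [
    {"category": "books", "country": "DE"},
    {"category": "books", "country": "US"},
    {"category": "books", "country": "US"},
    {"category": "toys", "country": "US"},
]

def multi_terms_buckets(docs, fields):
    """One flat bucket per unique tuple of field values (multi_terms shape)."""
    return Counter(tuple(doc[f] for f in fields) for doc in docs)

def nested_terms_buckets(docs, outer, inner):
    """Outer buckets that each hold their own inner buckets (nested terms shape)."""
    tree = defaultdict(Counter)
    for doc in docs:
        tree[doc[outer]][doc[inner]] += 1
    return {key: dict(inner_counts) for key, inner_counts in tree.items()}

flat = multi_terms_buckets(docs, ["category", "country"])
tree = nested_terms_buckets(docs, "category", "country")
print(flat.most_common())
print(tree)
```

The flat Counter can be ranked directly across all pairs, which is what multi_terms does. The nested dict has to be walked per outer key before pairs can be compared.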

Syntax

GET /sales/_search
{
  "size": 0,
  "aggs": {
    "by_cat_country": {
      "multi_terms": {
        "terms": [
          { "field": "product.category.keyword" },
          { "field": "customer.country.keyword" }
        ],
        "size": 25,
        "shard_size": 100
      },
      "aggs": {
        "total_sales": { "sum": { "field": "amount" } }
      }
    }
  }
}

Each bucket's key is a JSON array whose elements follow the order of the terms entries, and key_as_string joins the same values with a | separator.
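Because the key array is positional, client code can unpack it as a tuple. The sketch below parses a hand-written response in the shape of a multi_terms result. The bucket values are illustrative and were not captured from a real cluster. The helper name rows is hypothetical, and the aggregation and metric names come from the syntax example.

```python
import json

# Hand-written response fragment in the shape of a multi_terms result
# (illustrative values only, not real cluster output).
raw = """
{
  "aggregations": {
    "by_cat_country": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {"key": ["books", "US"], "key_as_string": "books|US",
         "doc_count": 2, "total_sales": {"value": 31.5}},
        {"key": ["toys", "DE"], "key_as_string": "toys|DE",
         "doc_count": 1, "total_sales": {"value": 12.0}}
      ]
    }
  }
}
"""

def rows(response: dict, agg_name: str, metric: str):
    """Flatten multi_terms buckets into (key_tuple, doc_count, metric_value) rows.

    The key list keeps the order of the terms entries, so the tuple can be
    unpacked positionally (here: category, country).
    """
    buckets = response["aggregations"][agg_name]["buckets"]
    return [(tuple(b["key"]), b["doc_count"], b[metric]["value"]) for b in buckets]

parsed = rows(json.loads(raw), "by_cat_country", "total_sales")
for (category, country), count, total in parsed:
    print(category, country, count, total)
```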

Parameters

Parameter | Default | Description
--- | --- | ---
terms | required | Ordered list of { "field": ... } or { "script": ... } source specs. Two or more entries.
size | 10 | Top-N buckets returned to the client.
shard_size | size * 1.5 + 10 | Buckets each shard returns to the coordinator. Larger values are more accurate but use more memory.
min_doc_count | 1 | Drop buckets below this document count.
shard_min_doc_count | 0 | Per-shard threshold applied before the coordinator merge.
order | { "_count": "desc" } | Sort by _count, _key, or a sub-aggregation.
missing (per term) | - | Value substituted when a document has no value for that term's field. Set inside each terms entry.
collect_mode | - | breadth_first or depth_first, for nested sub-aggregations.
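The shard_size default is plain arithmetic. A quick check shows what it yields for a few sizes. Rounding down is an assumption about how the fractional part is handled, and Elasticsearch may apply special cases this sketch does not model.

```python
import math

def default_shard_size(size: int) -> int:
    """Documented default for shard_size: size * 1.5 + 10.

    Assumption: the fractional part is dropped, as an integer cast would do.
    """
    return math.floor(size * 1.5 + 10)

for size in (10, 20, 25):
    print(size, default_shard_size(size))
```

For the syntax example's size of 25, this formula gives 47. The explicit shard_size of 100 in that example asks every shard for roughly twice as many candidate buckets, which buys accuracy at the cost of memory.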

multi_terms inherits the same approximate top-N semantics as the terms aggregation - the response includes doc_count_error_upper_bound and sum_other_doc_count.
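Where the approximation comes from is easier to see in a toy model of the shard-then-coordinator merge. This is a simplified sketch, not Elasticsearch's actual reduce code. Each shard keeps only its top shard_size buckets. A shard that truncated could be hiding a combination as large as its smallest returned bucket, so that count is added to the error bound. min_doc_count, sub-aggregation ordering, and other real-world cases are ignored.

```python
from collections import Counter

def ranked(counts):
    """Default ordering: doc count descending, then key ascending."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

def simulate(shards, size, shard_size):
    """Simplified model of top-N merging; returns (buckets, error bound, sum_other)."""
    merged, error_bound = Counter(), 0
    for counts in shards:
        top = ranked(counts)[:shard_size]
        merged.update(dict(top))  # add this shard's returned counts
        if len(counts) > shard_size:
            # The shard truncated, so an unseen bucket could be this large.
            error_bound += top[-1][1]
    buckets = ranked(merged)[:size]
    total = sum(sum(c.values()) for c in shards)
    sum_other = total - sum(count for _, count in buckets)
    return buckets, error_bound, sum_other

shard_a = Counter({("books", "DE"): 50, ("books", "US"): 40, ("toys", "US"): 10, ("toys", "DE"): 5})
shard_b = Counter({("toys", "US"): 45, ("books", "US"): 30, ("books", "DE"): 8, ("toys", "DE"): 2})

print(simulate([shard_a, shard_b], size=2, shard_size=2))
print(simulate([shard_a, shard_b], size=2, shard_size=4))
```

With shard_size=2, the (books, DE) bucket is reported as 50 even though its true count is 58. Shard B's 8 documents for that pair fell below its cut-off. Raising shard_size to 4 lets every shard return everything, the error bound drops to 0, and the counts become exact.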

Examples

Top 20 (status, host) pairs by event count:

"multi_terms": {
  "terms": [
    { "field": "status.keyword" },
    { "field": "host.keyword" }
  ],
  "size": 20
}

Substitute missing values per source field:

"multi_terms": {
  "terms": [
    { "field": "region.keyword", "missing": "unknown" },
    { "field": "channel.keyword" }
  ]
}
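The per-term rule can be modeled in a few lines. This in-memory sketch assumes single-valued fields and uses None to mean "no value". With missing set only on region, a document without a region lands in an "unknown" bucket. A document without a channel is skipped entirely, because no substitute was configured for that term. The documents and the key_for helper are hypothetical.

```python
from collections import Counter

# Hypothetical documents; None marks a field with no value.
docs = [
    {"region": "eu", "channel": "web"},
    {"region": None, "channel": "web"},  # region absent, substituted with "unknown"
    {"region": "us", "channel": None},   # channel absent with no substitute, so dropped
]

# (field, missing) pairs mirroring the example: missing is set only on region.
terms = [("region", "unknown"), ("channel", None)]

def key_for(doc, terms):
    """Build a bucket key, or return None if an unsubstituted field is absent."""
    key = []
    for field, missing in terms:
        value = doc.get(field)
        if value is None:
            if missing is None:
                return None  # the whole document is left out of the aggregation
            value = missing
        key.append(value)
    return tuple(key)

buckets = Counter(k for k in (key_for(d, terms) for d in docs) if k is not None)
print(buckets)
```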

Sort by a sub-aggregation:

"multi_terms": {
  "terms": [
    { "field": "department.keyword" },
    { "field": "team.keyword" }
  ],
  "order": { "spend": "desc" }
},
"aggs": {
  "spend": { "sum": { "field": "cost" } }
}
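Ordering by a sub-aggregation ranks buckets by the metric value rather than by doc_count. The in-memory sketch below reproduces that ranking for the spend example on a few hypothetical documents. It models a single shard, so it says nothing about cross-shard accuracy.

```python
from collections import defaultdict

# Hypothetical documents: (department, team, cost).
docs = [
    ("eng", "search", 120.0),
    ("eng", "infra", 300.0),
    ("eng", "search", 90.0),
    ("sales", "emea", 150.0),
]

# Each bucket carries its doc_count plus the "spend" sum sub-aggregation.
buckets = defaultdict(lambda: {"doc_count": 0, "spend": 0.0})
for dept, team, cost in docs:
    bucket = buckets[(dept, team)]
    bucket["doc_count"] += 1
    bucket["spend"] += cost

# "order": { "spend": "desc" } sorts on the metric, not on doc_count.
by_spend = sorted(buckets.items(), key=lambda kv: kv[1]["spend"], reverse=True)
for key, bucket in by_spend:
    print(key, bucket["doc_count"], bucket["spend"])
```

(eng, search) has the most documents but ranks second, because (eng, infra) has the larger summed cost.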

Performance and Cardinality Notes

multi_terms builds a hash map keyed by the tuple of source values, so memory grows with the number of distinct combinations that actually occur in the matching documents. That number is bounded by the Cartesian product of the source fields' cardinalities, not their sum: two fields with 10,000 unique values each can produce up to 100 million distinct combinations. Use it on fields whose combined cardinality is bounded, or narrow the query first so fewer documents reach the aggregation.
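The bound is quick to compute before running a query. This sketch assumes single-valued fields, so each matching document contributes exactly one tuple, and the bucket count can exceed neither the Cartesian product of the cardinalities nor the number of matching documents. It gives an upper bound only. The real bucket count depends on which combinations co-occur.

```python
from math import prod

def bucket_bounds(cardinalities, matching_docs):
    """Rough bucket-count bounds for multi_terms over single-valued fields."""
    product = prod(cardinalities)
    return {
        "sum_of_cardinalities": sum(cardinalities),  # what people often assume
        "cartesian_product": product,                # the real ceiling on tuples
        "upper_bound": min(product, matching_docs),  # one tuple per document
    }

print(bucket_bounds([10_000, 10_000], matching_docs=500_000_000))
print(bucket_bounds([10_000, 10_000], matching_docs=2_000_000))
```

Narrowing the query lowers matching_docs, which is why scoping the input set caps the worst case even when the field cardinalities stay the same.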

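multi_terms is usually slower and uses more memory than a single terms aggregation. If the same set of fields is queried constantly, indexing a combined key as its own field and running a plain terms aggregation on it is more efficient. multi_terms earns its cost when you need to rank combinations by doc count or by a metric and keep only the top N. If you do not need that sorting and want every combination, a nested terms aggregation or a composite aggregation with multiple terms sources is typically faster. The composite aggregation is the right tool for full enumeration, because it paginates instead of holding everything in memory.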
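Enumeration with the composite aggregation is a loop. Each request sends the previous response's after_key back as after, until a page comes back empty. Below is a minimal sketch of that loop. The search callable is a hypothetical stand-in for whatever client call sends a body to /sales/_search and returns parsed JSON, and fake_search is an in-memory substitute so the loop runs without a cluster. The aggregation name combos is also hypothetical.

```python
from typing import Callable, Iterator, Optional

def composite_body(after: Optional[dict] = None, page_size: int = 1000) -> dict:
    """Search body for a composite aggregation over the syntax example's fields."""
    composite = {
        "size": page_size,
        "sources": [
            {"category": {"terms": {"field": "product.category.keyword"}}},
            {"country": {"terms": {"field": "customer.country.keyword"}}},
        ],
    }
    if after is not None:
        composite["after"] = after  # resume after the previous page's last key
    return {"size": 0, "aggs": {"combos": {"composite": composite}}}

def iter_combinations(search: Callable[[dict], dict], page_size: int = 1000) -> Iterator[dict]:
    """Yield every composite bucket, requesting one page at a time."""
    after = None
    while True:
        agg = search(composite_body(after, page_size))["aggregations"]["combos"]
        yield from agg["buckets"]
        after = agg.get("after_key")
        if not agg["buckets"] or after is None:
            return

# In-memory stand-in for the cluster: keys sorted ascending, as composite pages are.
COMBOS = sorted([("books", "DE"), ("toys", "US"), ("books", "US")])

def fake_search(body: dict) -> dict:
    comp = body["aggs"]["combos"]["composite"]
    start = 0
    if "after" in comp:
        after = comp["after"]
        start = COMBOS.index((after["category"], after["country"])) + 1
    page = COMBOS[start:start + comp["size"]]
    buckets = [{"key": {"category": c, "country": k}, "doc_count": 1} for c, k in page]
    result = {"buckets": buckets}
    if buckets:
        result["after_key"] = buckets[-1]["key"]
    return {"aggregations": {"combos": result}}

keys = [(b["key"]["category"], b["key"]["country"])
        for b in iter_combinations(fake_search, page_size=2)]
print(keys)
```

Memory per request is bounded by page_size rather than by the total number of combinations. The trade-off is that composite pages come back in key order, not ranked by doc count.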

multi_terms on high-cardinality fields is a common cause of circuit breaker trips and slow searches. Finding by hand which queries are building tuple hash maps large enough to trip the request circuit breaker is exactly the loop Pulse runs continuously.

Common Mistakes

  1. Confusing multi_terms with nested terms aggregations - nested terms produces a hierarchy, multi_terms produces unique pairs.
  2. Running multi_terms on near-unique fields, producing as many buckets as documents.
  3. Trusting top-N when doc_count_error_upper_bound is non-zero - the result is approximate, just like terms.
  4. Forgetting that missing must be set per source field, not once for the whole aggregation.
  5. Using multi_terms when you need exhaustive enumeration. Switch to a composite aggregation for that.
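Some of these mistakes can be caught mechanically. The sketch below shows hypothetical helpers that check an aggregation body for fewer than two terms entries and for a top-level missing (mistake 4). It also checks a response for a non-zero doc_count_error_upper_bound (mistake 3). It inspects structure only. It does not know field cardinalities, so it cannot catch near-unique fields.

```python
from typing import List

def lint_multi_terms(agg: dict) -> List[str]:
    """Structural checks on a multi_terms aggregation body (hypothetical helper)."""
    warnings = []
    spec = agg.get("multi_terms", {})
    terms = spec.get("terms", [])
    if len(terms) < 2:
        warnings.append("multi_terms needs two or more terms entries")
    if "missing" in spec:
        # Mistake 4: missing is a per-term setting.
        warnings.append("move missing into the individual terms entries")
    for i, term in enumerate(terms):
        if "field" not in term and "script" not in term:
            warnings.append(f"terms[{i}] has neither field nor script")
    return warnings

def lint_result(result: dict) -> List[str]:
    """Mistake 3: flag an approximate top-N that might be read as exact."""
    if result.get("doc_count_error_upper_bound", 0) > 0:
        return ["top-N is approximate: doc_count_error_upper_bound > 0"]
    return []

bad = {"multi_terms": {"terms": [{"field": "region.keyword"}], "missing": "unknown"}}
print(lint_multi_terms(bad))
print(lint_result({"doc_count_error_upper_bound": 70, "buckets": []}))
```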

Optimize multi_terms Aggregations for Memory with Pulse

Pulse is an AI DBA for Elasticsearch and OpenSearch that continuously profiles production query traffic. For multi_terms aggregations specifically, Pulse:

  • Identifies multi_terms aggregations whose combined source cardinality approaches the Cartesian product of per-field cardinality, where the in-memory tuple hash map is pushing nodes toward request circuit breakers
  • Flags multi_terms running on near-unique fields (IDs, session UUIDs), where the bucket count approaches the document count and the response is meaningless
  • Spots scripted sources that disable global ordinals and force per-doc evaluation
  • Detects doc_count_error_upper_bound > 0 cases where users are treating the approximate top-N as exact, plus aggregations that should have been a composite aggregation for full enumeration
  • Traces each slow multi_terms back to the calling service via slow-log and APM correlation
  • Recommends concrete fixes: scope the query before the aggregation to shrink the input set, pre-join values at index time when only a few combinations matter, switch to a composite aggregation for exhaustive enumeration, or use a nested terms aggregation when a hierarchy is actually wanted
  • Tracks heap, circuit breaker, and latency impact after the change ships

This converts the manual aggregation-design loop into a continuous optimization workflow.

Try Pulse on your cluster.

Frequently Asked Questions

Q: How does multi_terms differ from a nested terms aggregation?
A: The multi_terms aggregation produces one bucket per unique combination across all listed fields, returning an array key. Nested terms aggregations produce a tree of buckets where the inner aggregation runs only inside each outer bucket - useful for hierarchical roll-ups, but not for unique-pair counts.

Q: How does multi_terms differ from a composite aggregation?
A: multi_terms returns top-N approximate results in one round trip. The composite aggregation paginates through every unique combination exactly. Use multi_terms for ranking, composite for enumeration.

Q: Can multi_terms use scripted sources?
A: Yes, each terms entry can take a script. Scripted sources disable global ordinals and are noticeably slower.

Q: How many fields can I combine?
A: No hard limit, but cost is multiplicative. Three fields are common, four or more usually call for a different design (pre-joined values at index time, or a composite aggregation).

Q: Does multi_terms handle missing values?
A: Per source. Set missing inside each terms entry. Without it, documents missing any of the source fields are excluded from the aggregation.

Q: Is the count returned by multi_terms exact?
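A: Not always. Each shard returns only its top shard_size buckets, so a combination that falls below that cut-off on some shard contributes nothing from that shard. Both the top-N set and an individual bucket's doc_count can therefore be underestimated. This is the same per-shard truncation that affects the terms aggregation. doc_count_error_upper_bound reports the worst-case error, and counts are exact on a single-shard index or when no shard has to truncate its list.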

Q: How do I monitor multi_terms aggregations for memory pressure?
A: Pulse tracks heap and request circuit breakers on Elasticsearch and OpenSearch, identifies multi_terms aggregations whose combined cardinality is producing Cartesian-product hash maps, attributes each to the calling service, and recommends scoping the input set, pre-joining values at index time, or switching to a composite aggregation for enumeration.
