Elasticsearch Composite Aggregation - Pagination Through All Buckets - Syntax, Example, and Tips

The Elasticsearch composite aggregation is a multi-bucket aggregation that produces composite buckets from one or more sources (terms, histogram, date_histogram, geotile_grid) and lets you paginate through every bucket using an after_key cursor. Unlike the terms aggregation, which returns approximate top-N results, composite is designed to enumerate the full result set exactly, page by page. It is the right tool when you need to stream all groups without hitting search.max_buckets.

Syntax

GET /sales/_search
{
  "size": 0,
  "aggs": {
    "sales_by_day_and_product": {
      "composite": {
        "size": 1000,
        "sources": [
          { "day":     { "date_histogram": { "field": "date", "calendar_interval": "1d" } } },
          { "product": { "terms":          { "field": "product.keyword" } } }
        ]
      }
    }
  }
}

The response includes an after_key object. Pass it as the after parameter of the composite aggregation in the next request, keeping sources unchanged, and repeat until the response returns no buckets.
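
The pagination loop described above can be sketched in Python. This is a minimal sketch rather than an official client: the aggregation name pages and the helpers composite_body, iter_buckets, and make_fake_search are hypothetical, and search stands for any callable that sends a request body to _search and returns the parsed JSON response. The fake search only imitates paging over a single terms source named product, so the loop can be run without a cluster.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional


def composite_body(sources: List[dict], size: int, after: Optional[dict] = None) -> dict:
    """Build a search body with one composite aggregation (hypothetically named 'pages')."""
    composite: Dict[str, Any] = {"size": size, "sources": sources}
    if after is not None:
        composite["after"] = after  # cursor taken from the previous page's after_key
    return {"size": 0, "aggs": {"pages": {"composite": composite}}}


def iter_buckets(search: Callable[[dict], dict], sources: List[dict], size: int = 1000) -> Iterator[dict]:
    """Yield every composite bucket, requesting one page at a time."""
    after = None
    while True:
        agg = search(composite_body(sources, size, after))["aggregations"]["pages"]
        buckets = agg["buckets"]
        if not buckets:
            return  # an empty page means the enumeration is complete
        yield from buckets
        after = agg.get("after_key")
        if after is None:
            return


def make_fake_search(keys: List[str]) -> Callable[[dict], dict]:
    """Stand-in for a real client call: serves sorted keys page by page (assumption for the demo)."""
    ordered = sorted(keys)

    def search(body: dict) -> dict:
        comp = body["aggs"]["pages"]["composite"]
        after = comp.get("after")
        start = 0 if after is None else ordered.index(after["product"]) + 1
        page = ordered[start:start + comp["size"]]
        agg: Dict[str, Any] = {"buckets": [{"key": {"product": k}, "doc_count": 1} for k in page]}
        if page:
            agg["after_key"] = {"product": page[-1]}
        return {"aggregations": {"pages": agg}}

    return search
```

With a real client, search would wrap the HTTP call; the loop itself only relies on the buckets and after_key fields of the response.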

Parameters

Parameter                   Default    Description
sources                     required   Ordered array of named sources. Bucket key is the tuple of source values.
size                        10         Buckets returned per page. Max bound by search.max_buckets.
after                       -          Cursor (after_key from previous page) for pagination.
Source types                -          terms, histogram, date_histogram, geotile_grid.
Per-source order            asc        asc or desc - same direction must be used across all pages.
Per-source missing_bucket   false      Include a bucket for docs missing the source field.

Composite aggregations cannot be sorted by _count or by a sub-aggregation - results are always ordered by the source keys. Scoring is also unavailable inside composite.
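
Because ordering by _count is unavailable, one possible workaround is to rank buckets on the client while streaming pages. The following is a minimal sketch, assuming each bucket has the usual key and doc_count fields; top_n_by_count is a hypothetical helper name. It keeps only n buckets in memory, but it still requires walking every page.

```python
import heapq
from typing import Iterable, List


def top_n_by_count(buckets: Iterable[dict], n: int) -> List[dict]:
    """Return the n buckets with the highest doc_count, largest first.

    Accepts any iterable (for example a generator that yields buckets
    page by page), so the full bucket list never has to be held in memory.
    """
    return heapq.nlargest(n, buckets, key=lambda bucket: bucket["doc_count"])
```

If only the top few groups are needed and approximate results are acceptable, a terms aggregation answers the same question in one request.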

Examples

Stream all (user_id, country) pairs in pages of 5000:

"composite": {
  "size": 5000,
  "sources": [
    { "user":    { "terms": { "field": "user_id" } } },
    { "country": { "terms": { "field": "country.keyword" } } }
  ]
}

Continue from a previous page:

"composite": {
  "size": 5000,
  "sources": [
    { "user":    { "terms": { "field": "user_id" } } },
    { "country": { "terms": { "field": "country.keyword" } } }
  ],
  "after": { "user": 1042, "country": "DE" }
}

Include documents with missing values in their own bucket:

{ "region": { "terms": { "field": "region.keyword", "missing_bucket": true } } }

Performance Notes

Composite is designed to scale to millions of unique combinations without exhausting the request circuit breaker. It does this by walking sorted ordinals on each shard and yielding only size results per page, so memory is bounded by size, not by total cardinality.

Order sources from lowest to highest cardinality where possible - it does not change correctness but typically reduces per-shard work. Each page is independent: caching happens in the per-shard request cache, and the cursor is stateless, so a paginating client can be killed and later resumed from the last after_key it received.
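
Since the cursor is just the last after_key, resuming a killed job amounts to persisting that object between pages. A minimal sketch, assuming a local JSON file is an acceptable checkpoint store; save_cursor and load_cursor are hypothetical helper names, not part of any Elasticsearch client.

```python
import json
import os
from typing import Optional


def save_cursor(path: str, after_key: dict) -> None:
    """Persist the latest after_key, replacing the old checkpoint atomically."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as handle:
        json.dump(after_key, handle)
    os.replace(tmp_path, path)  # avoids leaving a half-written checkpoint behind


def load_cursor(path: str) -> Optional[dict]:
    """Return the saved after_key, or None when no checkpoint exists yet."""
    try:
        with open(path, encoding="utf-8") as handle:
            return json.load(handle)
    except FileNotFoundError:
        return None
```

A client would call load_cursor once at startup, pass the result as after in its first request, and call save_cursor after each page has been fully processed.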

The most common operational pitfall is using composite for exports of hundreds of millions of buckets - while correct, it can saturate cluster resources for hours. Manually tracking which composite pagination jobs are saturating the search thread pool, and which ones could be parallelized by partitioning upstream, is exactly the loop Pulse runs continuously.

Common Mistakes

  1. Expecting _count ordering. Composite is always ordered by source keys, not by document count.
  2. Trying to nest a top_hits aggregation and expecting it to sort across all buckets - sub-aggregations apply per bucket only.
  3. Changing the sources order or order direction between pages, which invalidates after_key and skips or duplicates buckets.
  4. Setting size too small (e.g. 10), making millions of round trips. Pages of 1000-10000 are typical.
  5. Using composite when you actually want top-N - prefer the terms aggregation for that, it is far cheaper.
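
The cost of an undersized page (mistake 4) is simple arithmetic: the number of requests is the bucket count divided by size, rounded up. A small sketch with a hypothetical helper name, pages_needed, and an illustrative total of five million buckets:

```python
def pages_needed(total_buckets: int, size: int) -> int:
    """Number of non-empty pages required to enumerate total_buckets buckets."""
    if size <= 0:
        raise ValueError("size must be positive")
    return (total_buckets + size - 1) // size  # integer ceiling division


# Illustrative figures only: five million buckets at two page sizes.
requests_at_10 = pages_needed(5_000_000, 10)      # 500000 requests
requests_at_5000 = pages_needed(5_000_000, 5000)  # 1000 requests
```

A loop that stops on an empty page issues one additional request beyond this count.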

Monitor Long-Running Composite Aggregations with Pulse

Pulse is an AI DBA for Elasticsearch and OpenSearch that continuously profiles production query traffic. For composite aggregations specifically, Pulse:

  • Identifies composite pagination jobs whose total walltime is saturating the search thread pool, and those whose size per page is so small (e.g. 10) that millions of round trips are dominating cost
  • Flags composite aggregations being used for top-N where a cheaper terms aggregation would have answered in one request
  • Spots composite queries hitting too_many_buckets_exception because size plus nested sub-aggregations exceeded search.max_buckets (default 65,536)
  • Detects scripted sources that disable the global-ordinals optimization and slow paging dramatically on high-cardinality fields
  • Traces each slow composite paging job back to the calling service via slow-log and APM correlation
  • Recommends concrete fixes: raise per-page size into the 1000-10000 sweet spot, order sources from lowest to highest cardinality, partition the date range upstream to parallelize pagination, replace scripted sources with indexed fields, and switch to a terms aggregation when only top-N is needed
  • Tracks page-throughput and thread-pool impact after the change ships

This converts the manual long-running aggregation triage loop into a continuous optimization workflow.

Try Pulse on your cluster.

Frequently Asked Questions

Q: How does the composite aggregation differ from a regular terms aggregation?
A: The terms aggregation returns approximate top-N results in one round trip, while the composite aggregation enumerates every bucket exactly using after_key pagination. Composite is for exhaustive iteration, terms is for top-N.

Q: Can I paginate a composite aggregation in parallel?
A: No. Pagination is strictly sequential because each page depends on the previous after_key. To parallelize, partition the data range upstream (e.g. by date or shard) and run independent composite queries against each partition.
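
Partitioning by date can be sketched as follows. This is an illustration under assumptions: the helper names date_partitions and partition_body and the aggregation name pages are hypothetical, and the date field name defaults to date as in the syntax example. Each body restricts the query to a half-open date window, so the windows do not overlap and each can be paginated by a separate worker.

```python
from datetime import date, timedelta
from typing import Iterator, List, Tuple


def date_partitions(start: date, end: date, days: int) -> Iterator[Tuple[date, date]]:
    """Yield half-open [lo, hi) windows that together cover [start, end)."""
    step = timedelta(days=days)
    lo = start
    while lo < end:
        hi = min(lo + step, end)
        yield lo, hi
        lo = hi


def partition_body(lo: date, hi: date, sources: List[dict], size: int = 1000, field: str = "date") -> dict:
    """Build the first-page request body for one partition."""
    return {
        "size": 0,
        # gte/lt keeps the window half-open, so a document falls in exactly one partition
        "query": {"range": {field: {"gte": lo.isoformat(), "lt": hi.isoformat()}}},
        "aggs": {"pages": {"composite": {"size": size, "sources": sources}}},
    }
```

Partitions produce disjoint bucket keys only when the partitioning field is itself one of the sources (for example a daily date_histogram whose bucket boundaries align with the windows). Otherwise the same key can appear in several partitions and the results need to be merged on the client.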

Q: Which source types does composite support?
A: terms, histogram, date_histogram, and geotile_grid. Significant-terms, range, and IP-range sources are not supported.

Q: Can composite aggregations sort by a metric?
A: No. The composite aggregation orders only by the source keys. If you need top-N by metric, use a terms aggregation with order set to a sub-aggregation, accepting its approximate semantics.

Q: How big should size be?
A: 1000-10000 per page is a typical operational sweet spot. Smaller pages multiply round trips; larger pages risk pushing past search.max_buckets (default 65,536) when combined with sub-aggregations.

Q: Does the composite aggregation work with scripted sources?
A: Each source can take a script instead of a field, but scripted sources prevent the global-ordinals optimization and are much slower at high cardinality.

Q: Why am I getting too_many_buckets_exception?
A: The sum of buckets across all aggregations in one response exceeded search.max_buckets. Reduce size, reduce sub-aggregations, or raise the cluster setting if the workload truly needs it.

Q: How do I find composite aggregations that are saturating my cluster's search thread pool?
A: Pulse profiles long-running aggregations on Elasticsearch and OpenSearch, identifies composite paging jobs with undersized size, scripted sources, or wrong source ordering, attributes each to the calling service, and recommends concrete page-size, partitioning, and source-ordering changes.
