The Elasticsearch composite aggregation is a multi-bucket aggregation that produces composite buckets from one or more sources (terms, histogram, date_histogram, geotile_grid) and lets you paginate through every bucket using an after_key cursor. Unlike the terms aggregation, which returns approximate top-N results, composite is designed to enumerate the full result set exactly, page by page. It is the right tool when you need to stream all groups without hitting search.max_buckets.
Syntax
GET /sales/_search
{
  "size": 0,
  "aggs": {
    "sales_by_day_and_product": {
      "composite": {
        "size": 1000,
        "sources": [
          { "day": { "date_histogram": { "field": "date", "calendar_interval": "1d" } } },
          { "product": { "terms": { "field": "product.keyword" } } }
        ]
      }
    }
  }
}
The response includes an after_key object. Pass it unchanged as composite.after in the next request, keeping sources and size the same, to fetch the next page. Repeat until buckets comes back empty.
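The paging loop described above can be sketched in Python. This is a minimal sketch under stated assumptions, not an official client helper: search is a hypothetical callable that takes a request body and returns the parsed JSON response (for instance a thin wrapper around whichever Elasticsearch client you use), and the aggregation name "paged" is an arbitrary choice.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

# Hypothetical transport: request body in, parsed JSON response out.
SearchFn = Callable[[Dict[str, Any]], Dict[str, Any]]


def iter_composite_buckets(
    search: SearchFn,
    sources: List[Dict[str, Any]],
    size: int = 1000,
    agg_name: str = "paged",  # arbitrary name chosen for this sketch
) -> Iterator[Dict[str, Any]]:
    """Yield every composite bucket, one page at a time."""
    after: Optional[Dict[str, Any]] = None
    while True:
        composite: Dict[str, Any] = {"size": size, "sources": sources}
        if after is not None:
            # Cursor from the previous page, passed back unchanged.
            composite["after"] = after
        body = {"size": 0, "aggs": {agg_name: {"composite": composite}}}
        result = search(body)["aggregations"][agg_name]
        buckets = result.get("buckets", [])
        if not buckets:
            return  # an empty page means the enumeration is complete
        yield from buckets
        after = result.get("after_key")
        if after is None:
            return  # no cursor returned, so there is nothing left to fetch
```

Because the function is a generator, the caller only ever holds one page of buckets in memory, which matches the bounded-memory intent of the aggregation itself.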
Parameters
| Parameter | Default | Description |
|---|---|---|
| sources | required | Ordered array of named sources. Bucket key is the tuple of source values. |
| size | 10 | Buckets returned per page. Max bound by search.max_buckets. |
| after | - | Cursor (after_key from previous page) for pagination. |
| Source types | - | terms, histogram, date_histogram, geotile_grid. |
| Per-source order | asc | asc or desc - same direction must be used across all pages. |
| Per-source missing_bucket | false | Include a bucket for docs missing the source field. |
Composite aggregations cannot be sorted by _count or by a sub-aggregation - results are always ordered by the source keys. Scoring is also unavailable inside composite.
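Since the server only orders by source keys, any ranking by document count has to happen on the client after the buckets have been enumerated. The following is a rough sketch of that idea, assuming buckets is any iterable of composite buckets in the usual response shape (a key object plus doc_count); the function name is illustrative.

```python
import heapq
from typing import Any, Dict, Iterable, List


def top_buckets_by_count(
    buckets: Iterable[Dict[str, Any]], n: int
) -> List[Dict[str, Any]]:
    """Return the n buckets with the highest doc_count, largest first.

    heapq.nlargest keeps at most n candidates in memory, so this can be
    fed from a streaming iterator over all pages.
    """
    return heapq.nlargest(n, buckets, key=lambda b: b["doc_count"])
```

This still reads every bucket once, so it only makes sense when the full enumeration is needed anyway.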
Examples
Stream all (user_id, country) pairs in pages of 5000:
"composite": {
  "size": 5000,
  "sources": [
    { "user": { "terms": { "field": "user_id" } } },
    { "country": { "terms": { "field": "country.keyword" } } }
  ]
}
Continue from a previous page:
"composite": {
  "size": 5000,
  "sources": [
    { "user": { "terms": { "field": "user_id" } } },
    { "country": { "terms": { "field": "country.keyword" } } }
  ],
  "after": { "user": 1042, "country": "DE" }
}
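The after object carries one value per named source, so a cursor saved from a differently shaped request should not be reused. A small client-side guard along these lines can catch that before the request is sent; this is a sketch only, and with_after is a hypothetical helper name, not part of any client library.

```python
from typing import Any, Dict


def with_after(composite: Dict[str, Any], after_key: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the composite body with the cursor attached.

    Raises ValueError when the cursor's keys do not match the source names.
    """
    # Each source is a one-entry object whose single key is the source name.
    names = [next(iter(source)) for source in composite["sources"]]
    if set(after_key) != set(names):
        raise ValueError(
            f"after_key keys {sorted(after_key)} do not match sources {sorted(names)}"
        )
    return {**composite, "after": dict(after_key)}
```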
Include documents with missing values in their own bucket:
{ "region": { "terms": { "field": "region.keyword", "missing_bucket": true } } }
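With missing_bucket enabled, the bucket for documents lacking the field comes back with a null key value, which a Python client sees as None. The sketch below flattens buckets into rows and substitutes a label for that case; the function name and the "(missing)" label are illustrative choices, and it assumes no source is itself named doc_count.

```python
from typing import Any, Dict, Iterable, List


def bucket_rows(
    buckets: Iterable[Dict[str, Any]], missing_label: str = "(missing)"
) -> List[Dict[str, Any]]:
    """Flatten composite buckets into rows, labelling null key values."""
    rows = []
    for bucket in buckets:
        row = {
            name: (missing_label if value is None else value)
            for name, value in bucket["key"].items()
        }
        row["doc_count"] = bucket["doc_count"]
        rows.append(row)
    return rows
```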
Performance Notes
Composite is built to scale to millions of unique combinations without exhausting the request circuit breaker. It does this by walking sorted ordinals on each shard and yielding only size results per page, so memory is bounded by size, not by total cardinality.
Order sources from lowest to highest cardinality where possible - it does not change correctness but typically reduces per-shard work. Each page is independent: caching is per-shard request cache, and the cursor is stateless, so a paginating client can be killed and resumed.
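Because the cursor is just a small JSON object, resuming after a crash comes down to persisting the last after_key somewhere durable. The following is a minimal sketch that uses a local file as the checkpoint store; the file location and both function names are assumptions made for illustration.

```python
import json
import os
from pathlib import Path
from typing import Any, Dict, Optional, Union

PathLike = Union[str, Path]


def load_cursor(path: PathLike) -> Optional[Dict[str, Any]]:
    """Return the saved after_key, or None when no checkpoint exists yet."""
    p = Path(path)
    return json.loads(p.read_text()) if p.exists() else None


def save_cursor(path: PathLike, after_key: Dict[str, Any]) -> None:
    """Persist the cursor by writing a temp file and renaming it into place,
    so an interrupted write cannot leave a half-written checkpoint."""
    tmp = Path(str(path) + ".tmp")
    tmp.write_text(json.dumps(after_key))
    os.replace(tmp, path)
```

A job would call save_cursor after it has finished processing each page, and on restart pass the result of load_cursor as composite.after. Saving only after processing means a crash can replay one page, so downstream writes should tolerate duplicates.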
The most common operational pitfall is using composite for exports of hundreds of millions of buckets - while correct, it can saturate cluster resources for hours. Manually tracking which composite pagination jobs are saturating the search thread pool, and which ones could be parallelized by partitioning upstream, is exactly the loop Pulse runs continuously.
Common Mistakes
- Expecting _count ordering. Composite is always ordered by source keys, not by document count.
- Trying to nest a top_hits aggregation and expecting it to sort across all buckets - sub-aggregations apply per bucket only.
- Changing the sources order or order direction between pages, which invalidates after_key and skips or duplicates buckets.
- Setting size too small (e.g. 10), making millions of round trips. Pages of 1000-10000 are typical.
- Using composite when you actually want top-N - prefer the terms aggregation for that, it is far cheaper.
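The cost of an undersized page is easy to quantify: the number of requests is the bucket total divided by size, rounded up. A quick sketch of that arithmetic, using an illustrative total of ten million buckets:

```python
def pages_needed(total_buckets: int, size: int) -> int:
    """Number of non-empty pages needed to enumerate total_buckets."""
    if size <= 0:
        raise ValueError("size must be positive")
    return -(-total_buckets // size)  # ceiling division on integers


# Illustrative comparison for 10,000,000 buckets.
for page_size in (10, 1000, 5000):
    print(page_size, pages_needed(10_000_000, page_size))
```

At the default size of 10 this works out to a million requests, against two thousand at 5000 per page.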
Monitor Long-Running Composite Aggregations with Pulse
Pulse is an AI DBA for Elasticsearch and OpenSearch that continuously profiles production query traffic. For composite aggregations specifically, Pulse:
- Identifies composite pagination jobs whose total walltime is saturating the search thread pool, and those whose size per page is so small (e.g. 10) that millions of round trips are dominating cost
- Flags composite aggregations being used for top-N where a cheaper terms aggregation would have answered in one request
- Spots composite queries hitting too_many_buckets_exception because size plus nested sub-aggregations exceeded search.max_buckets (default 65,536)
- Detects scripted sources that disable the global-ordinals optimization and slow paging dramatically on high-cardinality fields
- Traces each slow composite paging job back to the calling service via slow-log and APM correlation
- Recommends concrete fixes: raise per-page size into the 1000-10000 sweet spot, order sources from lowest to highest cardinality, partition the date range upstream to parallelize pagination, replace scripted sources with indexed fields, and switch to a terms aggregation when only top-N is needed
- Tracks page-throughput and thread-pool impact after the change ships
This converts the manual long-running aggregation triage loop into a continuous optimization workflow.
Frequently Asked Questions
Q: How does the composite aggregation differ from a regular terms aggregation?
A: The terms aggregation returns approximate top-N results in one round trip, while the composite aggregation enumerates every bucket exactly using after_key pagination. Composite is for exhaustive iteration, terms is for top-N.
Q: Can I paginate a composite aggregation in parallel?
A: No. Pagination is strictly sequential because each page depends on the previous after_key. To parallelize, partition the data range upstream (e.g. by date or shard) and run independent composite queries against each partition.
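One way to sketch the upstream partitioning is to split a date range into half-open slices and wrap the same composite body in a range query per slice, so each slice can be paged by its own worker with its own after_key. The helper below is an assumption-laden illustration: the function name, the "paged" aggregation name, and the equal-width split are all choices made for the example.

```python
from datetime import datetime
from typing import Any, Dict, List


def partition_bodies(
    start: datetime,
    end: datetime,
    parts: int,
    date_field: str,
    composite: Dict[str, Any],
) -> List[Dict[str, Any]]:
    """Build one request body per [lo, hi) slice of the date range."""
    step = (end - start) / parts
    bodies = []
    for i in range(parts):
        lo = start + step * i
        # Pin the last slice to `end` so rounding never leaves a gap.
        hi = end if i == parts - 1 else start + step * (i + 1)
        bodies.append({
            "size": 0,
            # gte/lt keeps slices disjoint: no document lands in two of them.
            "query": {"range": {date_field: {"gte": lo.isoformat(), "lt": hi.isoformat()}}},
            "aggs": {"paged": {"composite": dict(composite)}},
        })
    return bodies
```

If the composite sources include a date_histogram, slice boundaries should line up with the histogram interval; otherwise the same time bucket can appear in two partitions with partial counts that need merging afterwards.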
Q: Which source types does composite support?
A: terms, histogram, date_histogram, and geotile_grid. Significant-terms, range, and IP-range sources are not supported.
Q: Can composite aggregations sort by a metric?
A: No. The composite aggregation orders only by the source keys. If you need top-N by metric, use a terms aggregation with order set to a sub-aggregation, accepting its approximate semantics.
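For comparison, a top-N-by-metric request with the terms aggregation might be built like this. It is a sketch only: the function name, the aggregation names "top" and "metric_sum", and the choice of a sum metric are illustrative.

```python
from typing import Any, Dict


def top_n_by_metric(field: str, metric_field: str, n: int) -> Dict[str, Any]:
    """Request body for the top n terms of `field`, ranked by a summed metric."""
    return {
        "size": 0,
        "aggs": {
            "top": {
                "terms": {
                    "field": field,
                    "size": n,
                    # Order buckets by the sub-aggregation defined below.
                    "order": {"metric_sum": "desc"},
                },
                "aggs": {"metric_sum": {"sum": {"field": metric_field}}},
            }
        },
    }
```

This returns in a single round trip, at the cost of the approximate semantics noted above.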
Q: How big should size be?
A: 1000-10000 per page is a typical operational sweet spot. Smaller pages multiply round trips; larger pages risk pushing past search.max_buckets (default 65,536) when combined with sub-aggregations.
Q: Does the composite aggregation work with scripted sources?
A: Each source can take a script instead of a field, but scripted sources prevent the global-ordinals optimization and are much slower at high cardinality.
Q: Why am I getting too_many_buckets_exception?
A: The sum of buckets across all aggregations in one response exceeded search.max_buckets. Reduce size, reduce sub-aggregations, or raise the cluster setting if the workload truly needs it.
Q: How do I find composite aggregations that are saturating my cluster's search thread pool?
A: Pulse profiles long-running aggregations on Elasticsearch and OpenSearch, identifies composite paging jobs with undersized size, scripted sources, or wrong source ordering, attributes each to the calling service, and recommends concrete page-size, partitioning, and source-ordering changes.
Related Reading
- Terms Aggregation: top-N alternative with approximate semantics.
- Multi Terms Aggregation: buckets keyed on combinations of fields, without pagination.
- Date Histogram Aggregation: common composite source for time series.
- Top Hits Aggregation: retrieve sample documents per composite bucket.
- Elasticsearch Query Language: the DSL composite aggregations run inside.