The Elasticsearch sum_bucket aggregation is a sibling pipeline aggregation that takes the output of another aggregation - typically a metric inside a bucket aggregation - and returns the total across all those buckets. It runs after the primary aggregations have produced their buckets, so it does not touch documents directly. Use it to get a grand total across a date histogram, terms aggregation, or any other multi-bucket structure without issuing a separate query.
Syntax
GET /sales/_search
{
  "size": 0,
  "aggs": {
    "sales_per_month": {
      "date_histogram": { "field": "date", "calendar_interval": "month" },
      "aggs": {
        "monthly_sales": { "sum": { "field": "amount" } }
      }
    },
    "total_sales": {
      "sum_bucket": {
        "buckets_path": "sales_per_month>monthly_sales",
        "gap_policy": "skip"
      }
    }
  }
}
sum_bucket must be defined as a sibling of the bucket aggregation it references - at the same level in the aggs tree, not nested inside it.
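The sibling placement can also be sketched from client code. The Python below assembles the same request body as a plain dict. The helper name `build_total_request` and its parameters are hypothetical, and no Elasticsearch client is involved: it only produces the JSON that would be sent.

```python
import json


def build_total_request(histogram_name, metric_name, total_name,
                        date_field="date", value_field="amount"):
    """Build a search body with a sum_bucket placed as a sibling.

    Hypothetical helper for illustration: it only assembles the JSON body.
    """
    return {
        "size": 0,
        "aggs": {
            # Multi-bucket aggregation with one metric per bucket.
            histogram_name: {
                "date_histogram": {"field": date_field,
                                   "calendar_interval": "month"},
                "aggs": {metric_name: {"sum": {"field": value_field}}},
            },
            # Sibling of the histogram: same level of "aggs", not nested in it.
            total_name: {
                "sum_bucket": {
                    "buckets_path": f"{histogram_name}>{metric_name}"
                }
            },
        },
    }


body = build_total_request("sales_per_month", "monthly_sales", "total_sales")
print(json.dumps(body, indent=2))
```

Deriving `buckets_path` from the same two names used to define the aggregations keeps the path and the aggregation tree from drifting apart.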
Parameters
| Parameter | Default | Description |
|---|---|---|
| buckets_path | required | Path to the sibling metric, using > between aggregation levels and . for stats fields. |
| gap_policy | skip | skip ignores empty buckets; insert_zeros treats missing values as 0. |
| format | - | Numeric format applied to value_as_string. |
The buckets_path syntax follows the pipeline aggregation grammar. For a sum of the count from a stats agg, use my_agg>my_stats.count.
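As a rough illustration of how such a path is read, the sketch below resolves a `buckets_path` against a dict shaped like the `aggregations` section of a response and adds up the per-bucket values. It is a client-side imitation, not Elasticsearch code: the function names are hypothetical, the sample values are made up, and only a single `>` level is handled.

```python
def resolve_bucket_value(bucket, metric_path):
    """Resolve 'metric' or 'metric.field' inside a single bucket."""
    name, _, field = metric_path.partition(".")
    metric = bucket[name]
    # Single-value metrics expose "value"; stats-style metrics expose named fields.
    return metric[field] if field else metric["value"]


def sum_bucket(aggregations, buckets_path):
    """Client-side imitation of sum_bucket over a response-shaped dict."""
    agg_name, _, metric_path = buckets_path.partition(">")
    buckets = aggregations[agg_name]["buckets"]
    return sum(resolve_bucket_value(b, metric_path) for b in buckets)


# Made-up sample data in the shape of an "aggregations" response section.
aggregations = {
    "by_day": {
        "buckets": [
            {"key_as_string": "2024-01-01", "doc_count": 2,
             "size_stats": {"count": 2, "min": 100.0, "max": 300.0,
                            "avg": 200.0, "sum": 400.0}},
            {"key_as_string": "2024-01-02", "doc_count": 1,
             "size_stats": {"count": 1, "min": 50.0, "max": 50.0,
                            "avg": 50.0, "sum": 50.0}},
        ]
    }
}

total_bytes = sum_bucket(aggregations, "by_day>size_stats.sum")
total_docs = sum_bucket(aggregations, "by_day>size_stats.count")
print(total_bytes, total_docs)
```

The part before `>` selects the sibling bucket aggregation, and the part after it selects the metric, with an optional `.field` for multi-value metrics such as stats.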
Examples
Grand total events across a terms aggregation:
"aggs": {
  "by_host": {
    "terms": { "field": "host.keyword", "size": 100 },
    "aggs": {
      "events": { "value_count": { "field": "event.id" } }
    }
  },
  "events_total": {
    "sum_bucket": { "buckets_path": "by_host>events" }
  }
}
Sum bucket on a stats sub-aggregation field:
"aggs": {
  "by_day": {
    "date_histogram": { "field": "@timestamp", "calendar_interval": "day" },
    "aggs": {
      "size_stats": { "stats": { "field": "bytes" } }
    }
  },
  "total_bytes": {
    "sum_bucket": { "buckets_path": "by_day>size_stats.sum" }
  }
}
Treat empty days as zero:
"sum_bucket": {
  "buckets_path": "by_day>size_stats.sum",
  "gap_policy": "insert_zeros"
}
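To see what the two policies do with an empty bucket, here is a simplified Python model. It assumes that "empty" means `doc_count` is 0. The helper name, the `daily_sales` metric, and the sample values are all made up for illustration, and the model is not the Elasticsearch implementation.

```python
def bucket_values(buckets, metric, gap_policy="skip"):
    """Yield per-bucket values the way a gap policy would present them."""
    for bucket in buckets:
        empty = bucket["doc_count"] == 0
        if empty and gap_policy == "skip":
            continue            # bucket is treated as if it did not exist
        if empty and gap_policy == "insert_zeros":
            yield 0.0           # missing value replaced by zero
            continue
        yield bucket[metric]["value"]


# Made-up daily buckets; the middle day has no documents.
buckets = [
    {"doc_count": 3, "daily_sales": {"value": 120.0}},
    {"doc_count": 0, "daily_sales": {"value": 0.0}},
    {"doc_count": 1, "daily_sales": {"value": 30.0}},
]

skip_values = list(bucket_values(buckets, "daily_sales", "skip"))
zero_values = list(bucket_values(buckets, "daily_sales", "insert_zeros"))

# Sums under each policy, then averages under each policy.
print(sum(skip_values), sum(zero_values))
print(sum(skip_values) / len(skip_values),
      sum(zero_values) / len(zero_values))
```

In this model the sum is identical under both policies. What differs is how many buckets contribute, which is what shifts an average or a minimum.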
Performance Notes
Pipeline aggregations like sum_bucket run after the source aggregations on the coordinating node, on the already-reduced bucket list. They do not re-scan documents and are essentially free in cost compared to the underlying bucket and metric aggregations. The expensive work is the date histogram or terms aggregation that produces the input buckets.
The most common operational issue is misnaming in buckets_path - a typo silently returns null or fails the request depending on context. Validate the path against the structure of the source aggregation. Pulse helps surface aggregation queries that consistently fail or return null on Elasticsearch and OpenSearch clusters, including bad pipeline-path references.
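One way to validate a path before sending the request is a small client-side check against the request body itself. The sketch below is a hypothetical helper, not part of any Elasticsearch client. It compares names only, handles a single `>` level, and its list of multi-bucket aggregation types is a deliberately incomplete assumption.

```python
# Assumed, incomplete list of multi-bucket aggregation types.
BUCKET_AGG_TYPES = {"date_histogram", "histogram", "terms", "range", "filters"}


def validate_buckets_path(aggs, buckets_path):
    """Return a list of problems found in a sibling pipeline buckets_path.

    `aggs` is the "aggs" dict at the level where the pipeline is defined.
    """
    problems = []
    agg_name, sep, metric_path = buckets_path.partition(">")
    if agg_name not in aggs:
        return [f"no sibling aggregation named '{agg_name}'"]
    if not sep or not metric_path:
        return [f"path stops at '{agg_name}'; it should point at a metric inside it"]
    source = aggs[agg_name]
    if BUCKET_AGG_TYPES.isdisjoint(source):
        problems.append(f"'{agg_name}' is not a known multi-bucket aggregation")
    metric_name = metric_path.split(".", 1)[0]
    # "_count" is the special path for a bucket's document count.
    if metric_name != "_count" and metric_name not in source.get("aggs", {}):
        problems.append(f"'{agg_name}' has no sub-aggregation named '{metric_name}'")
    return problems


aggs = {
    "by_host": {
        "terms": {"field": "host.keyword", "size": 100},
        "aggs": {"events": {"value_count": {"field": "event.id"}}},
    },
    "events_total": {"sum_bucket": {"buckets_path": "by_host>events"}},
}

print(validate_buckets_path(aggs, "by_host>events"))   # valid path
print(validate_buckets_path(aggs, "by_host>event"))    # typo in metric name
```

A check like this catches misspelled names early, but it does not confirm that the referenced metric is numeric. That still depends on the aggregation type and the field mapping.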
Common Mistakes
- Nesting sum_bucket inside the bucket aggregation it references. It must be a sibling.
- Pointing buckets_path at a bucket aggregation instead of a metric inside it - sum_bucket needs a numeric value per bucket.
- Expecting gap_policy to change the total. With skip (the default) empty buckets are left out, and with insert_zeros they count as 0, so the sum is the same either way. The policy matters for sibling aggregations such as avg_bucket and min_bucket, where an inserted zero shifts the result.
- Using sum_bucket on a non-numeric source path.
- Expecting sum_bucket to filter or sort - it returns one scalar; for ranking buckets by total, use order on the parent bucket aggregation.
Frequently Asked Questions
Q: How does sum_bucket differ from a regular sum aggregation?
A: The sum aggregation sums raw field values across documents. The sum_bucket aggregation is a pipeline aggregation that sums a metric across the buckets of a sibling bucket aggregation. Different inputs, different stages of the pipeline.
Q: Can I use sum_bucket on non-numeric paths?
A: No. The referenced path must yield a numeric value per bucket - either a metric aggregation or a numeric field of a stats/extended_stats output.
Q: What does gap_policy do?
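A: skip ignores buckets with missing values; insert_zeros treats them as zero. For a sum the two give the same total, because a skipped bucket and a zero both add nothing. The choice matters for sibling aggregations such as avg_bucket and min_bucket, where an inserted zero changes the result, so choose deliberately based on what "no data" means in your dataset.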
Q: Can sum_bucket be chained with other pipeline aggregations?
A: Yes. The output of sum_bucket is a single numeric value and can feed pipeline aggregations like bucket_script. Pipeline chaining respects the same sibling/parent rules.
Q: Does sum_bucket support format for currency display?
A: Yes, the format parameter applies a Java DecimalFormat pattern to value_as_string, while value remains a raw double.
Q: Is sum_bucket aware of search.max_buckets?
A: Yes - it operates on the buckets produced by the source aggregation, so the source aggregation must fit within search.max_buckets (default 65,536). If the source hits the limit, the request fails before sum_bucket runs.
Related Reading
- Sum Aggregation: raw document sum, the metric sum_bucket typically references.
- Date Histogram Aggregation: common parent for time-series sum_bucket.
- Terms Aggregation: common parent for category-based totals.
- Composite Aggregation: paginate buckets when totals are huge.
- Elasticsearch Query Language: the DSL pipeline aggregations run inside.