The Bucket Selector Aggregation is a pipeline aggregation in Elasticsearch that allows you to filter buckets based on specified criteria. It evaluates a script for each bucket and keeps only those buckets for which the script returns true.
Syntax
{
  "bucket_selector": {
    "buckets_path": {
      "my_var1": "metric1",
      "my_var2": "metric2"
    },
    "script": "params.my_var1 > params.my_var2"
  }
}
For more details, refer to the official Elasticsearch documentation.
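When requests are assembled in application code, the same clause can be built as a plain dictionary. The following is a minimal sketch; the helper name bucket_selector and its optional gap_policy argument are illustrative choices, not part of any client library.

```python
import json


def bucket_selector(buckets_path, script, gap_policy=None):
    """Return a bucket_selector clause as a plain dict (hypothetical helper)."""
    clause = {"buckets_path": dict(buckets_path), "script": script}
    if gap_policy is not None:
        # Elasticsearch accepts "skip" (the default) or "insert_zeros" here.
        clause["gap_policy"] = gap_policy
    return {"bucket_selector": clause}


# Rebuild the syntax example shown above.
clause = bucket_selector(
    {"my_var1": "metric1", "my_var2": "metric2"},
    "params.my_var1 > params.my_var2",
)
print(json.dumps(clause, indent=2))
```

The printed JSON can be placed under any name inside the "aggs" of a multi-bucket parent aggregation.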
Example Usage
Here's an example that selects buckets where the average price is greater than 100:
{
  "aggs": {
    "sales_per_month": {
      "date_histogram": {
        "field": "date",
        "calendar_interval": "month"
      },
      "aggs": {
        "avg_price": {
          "avg": {
            "field": "price"
          }
        },
        "high_value_months": {
          "bucket_selector": {
            "buckets_path": {
              "avg_price": "avg_price"
            },
            "script": "params.avg_price > 100"
          }
        }
      }
    }
  }
}
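To make the filtering step concrete, here is a small local emulation of what the selector does to the buckets of the example above. It is a sketch only: it does not call Elasticsearch, the function select_buckets is hypothetical, and the monthly values are invented for illustration.

```python
def select_buckets(buckets, buckets_path, predicate):
    """Keep only the buckets for which predicate(params) is true.

    Local emulation for illustration; not Elasticsearch's implementation.
    """
    kept = []
    for bucket in buckets:
        # Resolve each script variable from the metric it points to.
        params = {var: bucket[path]["value"] for var, path in buckets_path.items()}
        if predicate(params):
            kept.append(bucket)
    return kept


# Invented monthly buckets, shaped like a date_histogram response.
monthly = [
    {"key_as_string": "2024-01", "avg_price": {"value": 80.0}},
    {"key_as_string": "2024-02", "avg_price": {"value": 150.0}},
    {"key_as_string": "2024-03", "avg_price": {"value": 100.0}},
]

high_value = select_buckets(
    monthly,
    {"avg_price": "avg_price"},
    lambda p: p["avg_price"] > 100,  # mirrors "params.avg_price > 100"
)
kept_months = [b["key_as_string"] for b in high_value]
print(kept_months)  # ['2024-02']
```

The March bucket is dropped because the comparison is strict: an average of exactly 100 does not satisfy "> 100".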
Common Issues
- Script errors: Ensure that the script syntax is correct and that every variable the script reads through params is defined in buckets_path.
- Missing buckets: If no buckets meet the criteria, the parent aggregation returns an empty bucket list, so client code should handle that case. A bucket_sort aggregation orders or truncates buckets; it does not restore buckets that were filtered out.
- Performance impact: Complex scripts can slow down aggregation performance, especially on large datasets.
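A quick client-side check can catch the most common script error, a variable that the script reads but buckets_path never defines. The sketch below is a hypothetical helper, not an Elasticsearch feature, and it only recognizes the params.name form (not params['name']).

```python
import re


def undefined_script_params(script, buckets_path, extra_params=()):
    """List params.<name> references that buckets_path does not define.

    Hypothetical pre-flight check; extra_params covers names supplied
    through the script's own "params" object, if any.
    """
    referenced = set(re.findall(r"\bparams\.([A-Za-z_]\w*)", script))
    return sorted(referenced - set(buckets_path) - set(extra_params))


script = "params.avg_price > 100 && params.total_sales > 1000"
missing = undefined_script_params(script, {"avg_price": "avg_price"})
print(missing)  # ['total_sales']
```

An empty list means every referenced variable has a mapping; it says nothing about whether the mapped metric paths themselves exist.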
Best Practices
- Keep scripts simple and efficient to minimize performance impact.
- Use bucket_sort in conjunction with bucket_selector for more advanced filtering and sorting.
- Consider using min_doc_count in parent aggregations to exclude buckets with insufficient data before applying the selector.
- Test your bucket selector aggregations on a small dataset before running them on large production data.
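The practices above can be combined in one request. The following sketch builds such a request body as a Python dictionary, reusing the field names from the earlier example; the aggregation name top_months, the min_doc_count of 1, and the size of 3 are illustrative assumptions.

```python
import json

body = {
    "size": 0,  # return aggregation results only, no search hits
    "aggs": {
        "sales_per_month": {
            "date_histogram": {
                "field": "date",
                "calendar_interval": "month",
                "min_doc_count": 1,  # drop months without documents
            },
            "aggs": {
                "avg_price": {"avg": {"field": "price"}},
                "high_value_months": {
                    "bucket_selector": {
                        "buckets_path": {"avg_price": "avg_price"},
                        "script": "params.avg_price > 100",
                    }
                },
                # Assumed name; sorts the buckets and keeps the top three.
                "top_months": {
                    "bucket_sort": {
                        "sort": [{"avg_price": {"order": "desc"}}],
                        "size": 3,
                    }
                },
            },
        }
    },
}
print(json.dumps(body, indent=2))
```

Because the filter and the sort both use avg_price, the three highest months above the threshold are the same whichever of the two pipeline steps is applied first.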
Frequently Asked Questions
Q: Can I use multiple conditions in a bucket selector script?
A: Yes, you can use multiple conditions in the script, provided every variable is mapped in buckets_path. For example: params.avg_price > 100 && params.total_sales > 1000.
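A two-condition selector needs both variables mapped in buckets_path. The sketch below shows one possible set of sub-aggregations for the monthly histogram from the earlier example; defining total_sales as the sum of the price field is an assumption made for illustration.

```python
import json

sub_aggs = {
    "avg_price": {"avg": {"field": "price"}},
    # Assumption: total sales is the sum of the same price field.
    "total_sales": {"sum": {"field": "price"}},
    "high_value_months": {
        "bucket_selector": {
            "buckets_path": {
                "avg_price": "avg_price",
                "total_sales": "total_sales",
            },
            "script": "params.avg_price > 100 && params.total_sales > 1000",
        }
    },
}
print(json.dumps(sub_aggs, indent=2))
```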
Q: How does bucket selector aggregation affect parent aggregations?
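A: Bucket selector removes from the parent multi-bucket aggregation's response any bucket that doesn't meet the specified criteria. Like other pipeline aggregations, it runs on the output of the regular aggregations after they have been computed, so it trims the returned buckets but does not reduce the work of computing them.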
Q: Can I use bucket selector with nested aggregations?
A: Yes, you can use bucket selector with nested aggregations. Just ensure that buckets_path correctly references the metrics from the nested aggregations, using > to step into a child aggregation (for example my_filter>avg_price).
Q: Is it possible to use bucket selector to filter based on the bucket key?
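A: Yes, for numeric keys. Script variables come from buckets_path, so map the special _key path to a variable there (for example "bucket_key": "_key") and compare it in the script, for example params.bucket_key > 5. Values resolved through buckets_path are expected to be numeric, so a string test such as params.bucket_key.startsWith('A') is unlikely to work for string keys; verify the behavior on your Elasticsearch version.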
Q: How can I debug a bucket selector aggregation if it's not producing expected results?
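A: Start by temporarily removing the bucket_selector and inspecting the metric values returned for each bucket, then compare them with the condition in your script. Also check gap_policy: it controls how buckets with missing metric values are handled ("skip", the default, or "insert_zeros") and can change which buckets pass. Note that the Explain API describes how a document scores against a query; it does not trace aggregation execution.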
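One way to see how the condition evaluates per bucket, without dropping anything, is to swap the selector for a bucket_script that reports 1 or 0. The sketch below is a hypothetical helper that performs that rewrite on a clause dictionary; it assumes the selector script is a single boolean expression.

```python
def selector_to_debug_script(selector_clause):
    """Turn a bucket_selector clause into a bucket_script that emits 1 or 0.

    Hypothetical debugging aid; assumes the script is one boolean expression.
    """
    inner = selector_clause["bucket_selector"]
    debug = {
        "buckets_path": dict(inner["buckets_path"]),
        # bucket_script must return a number, so map true/false to 1/0.
        "script": f"({inner['script']}) ? 1 : 0",
    }
    if "gap_policy" in inner:
        debug["gap_policy"] = inner["gap_policy"]
    return {"bucket_script": debug}


selector = {
    "bucket_selector": {
        "buckets_path": {"avg_price": "avg_price"},
        "script": "params.avg_price > 100",
    }
}
debug_clause = selector_to_debug_script(selector)
print(debug_clause["bucket_script"]["script"])  # (params.avg_price > 100) ? 1 : 0
```

With the debug clause in place of the selector, every bucket stays in the response alongside its metric values, and the added value shows which ones the original condition would have kept.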