Elasticsearch Bucket Selector Aggregation - Syntax, Example, and Tips

The Bucket Selector Aggregation is a parent pipeline aggregation in Elasticsearch that filters the buckets of the multi-bucket aggregation it is placed in. It runs a script against the metrics of each bucket and keeps only the buckets for which the script returns true.

Syntax

{
  "bucket_selector": {
    "buckets_path": {
      "my_var1": "metric1",
      "my_var2": "metric2"
    },
    "script": "params.my_var1 > params.my_var2"
  }
}
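
The same definition can also be written with an explicit script object and the optional gap_policy parameter. This is only a sketch: my_var1, my_var2, metric1, and metric2 are the placeholder names used above, and "skip" is spelled out here even though it is the default.

```json
{
  "bucket_selector": {
    "buckets_path": {
      "my_var1": "metric1",
      "my_var2": "metric2"
    },
    "script": {
      "lang": "painless",
      "source": "params.my_var1 > params.my_var2"
    },
    "gap_policy": "skip"
  }
}
```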

For more details, refer to the official Elasticsearch documentation.

Example Usage

Here's an example that keeps only the monthly buckets whose average price is greater than 100:

{
  "aggs": {
    "sales_per_month": {
      "date_histogram": {
        "field": "date",
        "calendar_interval": "month"
      },
      "aggs": {
        "avg_price": {
          "avg": {
            "field": "price"
          }
        },
        "high_value_months": {
          "bucket_selector": {
            "buckets_path": {
              "avg_price": "avg_price"
            },
            "script": "params.avg_price > 100"
          }
        }
      }
    }
  }
}
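
If the threshold changes between requests, it can be passed as a script parameter instead of being hard-coded. The variant below is a sketch that assumes user-supplied script params are exposed alongside the buckets_path variables; threshold is a hypothetical parameter name, and the date and price fields are the ones from the example above.

```json
{
  "aggs": {
    "sales_per_month": {
      "date_histogram": {
        "field": "date",
        "calendar_interval": "month"
      },
      "aggs": {
        "avg_price": {
          "avg": { "field": "price" }
        },
        "high_value_months": {
          "bucket_selector": {
            "buckets_path": { "avg_price": "avg_price" },
            "script": {
              "source": "params.avg_price > params.threshold",
              "params": { "threshold": 100 }
            }
          }
        }
      }
    }
  }
}
```

Keeping the script source constant and varying only params also lets Elasticsearch reuse the compiled script.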

Common Issues

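  1. Script errors: The script can only read the variables declared in buckets_path, exposed as params.<name>. Check that every name used in the script is declared there and that each path points to a single numeric value, such as a single-value metric or _count.
  2. Missing buckets: If no bucket satisfies the script, the parent aggregation returns an empty buckets array, so client code should handle that case. The optional gap_policy parameter (skip by default, or insert_zeros) controls how buckets with missing metric values are treated.
  3. Performance impact: The script runs once per bucket, after all sibling aggregations have been computed, so a selector does not reduce the cost of building the buckets. Complex scripts add to that cost, especially when the parent aggregation produces many buckets.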

Best Practices

  1. Keep scripts simple and efficient to minimize performance impact.
  2. Use bucket_sort alongside bucket_selector when the buckets that pass the filter also need to be sorted or truncated.
  3. Consider using min_doc_count in parent aggregations to exclude buckets with insufficient data before applying the selector.
  4. Test your bucket selector aggregations on a small dataset before running them on large production data.
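
Practices 2 and 3 can be combined in one request, sketched below under assumptions. It reuses the date and price fields from the earlier example; the aggregation names sorted_months and high_value_months are illustrative. The min_doc_count of 1 drops empty months before the selector runs, and bucket_sort only sorts here (no size), so the result does not depend on the order in which the two pipeline aggregations are applied.

```json
{
  "size": 0,
  "aggs": {
    "sales_per_month": {
      "date_histogram": {
        "field": "date",
        "calendar_interval": "month",
        "min_doc_count": 1
      },
      "aggs": {
        "avg_price": {
          "avg": { "field": "price" }
        },
        "high_value_months": {
          "bucket_selector": {
            "buckets_path": { "avg_price": "avg_price" },
            "script": "params.avg_price > 100"
          }
        },
        "sorted_months": {
          "bucket_sort": {
            "sort": [
              { "avg_price": { "order": "desc" } }
            ]
          }
        }
      }
    }
  }
}
```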

Frequently Asked Questions

Q: Can I use multiple conditions in a bucket selector script?
A: Yes. Combine conditions with the usual boolean operators, for example: params.avg_price > 100 && params.total_sales > 1000. Every variable used in the script must be declared in buckets_path.
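
A sketch of such a request follows. The total_sales sum aggregation and the selector name high_value_busy_months are assumptions added for illustration; the date and price fields come from the earlier example.

```json
{
  "aggs": {
    "sales_per_month": {
      "date_histogram": {
        "field": "date",
        "calendar_interval": "month"
      },
      "aggs": {
        "avg_price": {
          "avg": { "field": "price" }
        },
        "total_sales": {
          "sum": { "field": "price" }
        },
        "high_value_busy_months": {
          "bucket_selector": {
            "buckets_path": {
              "avg_price": "avg_price",
              "total_sales": "total_sales"
            },
            "script": "params.avg_price > 100 && params.total_sales > 1000"
          }
        }
      }
    }
  }
}
```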

Q: How does bucket selector aggregation affect parent aggregations?
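A: Bucket selector removes from its parent multi-bucket aggregation the buckets that fail the script, so they do not appear in the response. Like all pipeline aggregations, it runs after the regular aggregations have finished, so filtering buckets this way does not reduce the work of computing them.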

Q: Can I use bucket selector with nested aggregations?
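A: Yes. The buckets_path entries are resolved relative to the aggregation that contains the bucket selector, and the > separator reaches a metric inside a single-bucket sub-aggregation such as filter (for example, my_filter>my_metric, where both names are placeholders).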
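
As an illustration, the sketch below keeps only the months in which one product type sold more than 200 in total. The type field, the t-shirt value, and the aggregation names t_shirts, sales, and strong_tshirt_months are hypothetical; the t_shirts>sales path walks from the monthly bucket into the filter sub-aggregation and then to its sum metric.

```json
{
  "aggs": {
    "sales_per_month": {
      "date_histogram": {
        "field": "date",
        "calendar_interval": "month"
      },
      "aggs": {
        "t_shirts": {
          "filter": { "term": { "type": "t-shirt" } },
          "aggs": {
            "sales": {
              "sum": { "field": "price" }
            }
          }
        },
        "strong_tshirt_months": {
          "bucket_selector": {
            "buckets_path": { "tshirt_sales": "t_shirts>sales" },
            "script": "params.tshirt_sales > 200"
          }
        }
      }
    }
  }
}
```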

Q: Is it possible to use bucket selector to filter based on the bucket key?
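A: Only in limited cases. The script sees nothing but the variables declared in buckets_path, so params._key is not available on its own; the key has to be mapped explicitly (for example "key": "_key", then params.key > 5). Because buckets_path values must be numeric, this generally works only for numeric keys, and support may vary by Elasticsearch version, so verify it on your cluster. For string keys, the include and exclude options of the terms aggregation are usually the simpler tool.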
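
For string keys, an alternative that avoids scripting altogether is to restrict the keys in the terms aggregation itself. The sketch below assumes a hypothetical keyword field named category and keeps only terms that start with "A"; the include option takes a regular expression that must match the whole term.

```json
{
  "size": 0,
  "aggs": {
    "categories_starting_with_a": {
      "terms": {
        "field": "category",
        "include": "A.*"
      },
      "aggs": {
        "avg_price": {
          "avg": { "field": "price" }
        }
      }
    }
  }
}
```

Unlike a bucket selector, this filter is applied while the buckets are being built, so excluded terms never produce buckets in the first place.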

Q: How can I debug a bucket selector aggregation if it's not producing expected results?
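A: Temporarily remove the bucket selector and inspect the metric values returned for every bucket, and confirm that the names in buckets_path match the sibling aggregations. The gap_policy parameter does not report which buckets were removed; it only controls how missing values are handled (skip, the default, or insert_zeros). The Explain API describes how a document is scored against a query, so it does not help with aggregations.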
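
One way to see the values a selector would act on is to temporarily swap it for a bucket_script that reports the margin by which each bucket passes or fails. The sketch below assumes the avg_price example from earlier; price_margin is a hypothetical name. A positive value means the bucket would pass the params.avg_price > 100 test, and with the default gap_policy a bucket whose average is missing gets no value at all.

```json
{
  "aggs": {
    "sales_per_month": {
      "date_histogram": {
        "field": "date",
        "calendar_interval": "month"
      },
      "aggs": {
        "avg_price": {
          "avg": { "field": "price" }
        },
        "price_margin": {
          "bucket_script": {
            "buckets_path": { "avg_price": "avg_price" },
            "script": "params.avg_price - 100"
          }
        }
      }
    }
  }
}
```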
