Elasticsearch Percentiles Bucket Aggregation - Syntax, Example, and Tips

The Percentiles Bucket Aggregation is a sibling pipeline aggregation that calculates percentiles of a specified numeric metric across all buckets of a sibling multi-bucket aggregation. It shows how that metric is distributed across buckets (for example, how monthly sales totals are spread across the months), which is useful for advanced analysis and for spotting outlier buckets.

Syntax

{
  "percentiles_bucket": {
    "buckets_path": "string",
    "percents": [number],
    "format": "string",
    "keyed": boolean,
    "gap_policy": "string"
  }
}
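To make the placeholders concrete, here is a minimal Python sketch that builds the clause as a plain dict. The helper name `percentiles_bucket_clause` is hypothetical, and the checks it performs (non-empty path, percents between 0 and 100, a known gap policy) are assumptions about sensible input, not a reproduction of Elasticsearch's own validation.

```python
from typing import Optional, Sequence

# Gap policies listed for bucket pipeline aggregations (assumed to be the full set).
GAP_POLICIES = {"skip", "insert_zeros", "keep_values"}


def percentiles_bucket_clause(
    buckets_path: str,
    percents: Optional[Sequence[float]] = None,
    fmt: Optional[str] = None,
    keyed: Optional[bool] = None,
    gap_policy: Optional[str] = None,
) -> dict:
    """Build a percentiles_bucket clause; unset options are left to the server defaults."""
    if not buckets_path:
        raise ValueError("buckets_path is required")
    body: dict = {"buckets_path": buckets_path}
    if percents is not None:
        # Percentiles are expressed on a 0-100 scale.
        if not all(0 <= p <= 100 for p in percents):
            raise ValueError("percents must lie between 0 and 100")
        body["percents"] = list(percents)
    if fmt is not None:
        body["format"] = fmt
    if keyed is not None:
        body["keyed"] = keyed
    if gap_policy is not None:
        if gap_policy not in GAP_POLICIES:
            raise ValueError(f"unknown gap_policy: {gap_policy}")
        body["gap_policy"] = gap_policy
    return {"percentiles_bucket": body}


print(percentiles_bucket_clause("sales_per_month>sales", percents=[25, 50, 75]))
```

Leaving an option as None omits it from the clause, so Elasticsearch applies its own default for that parameter.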

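Only buckets_path is required. If omitted, percents defaults to [1, 5, 25, 50, 75, 95, 99], keyed defaults to true, and gap_policy defaults to skip. For detailed information, refer to the official Elasticsearch documentation.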

Example Usage

{
  "aggs": {
    "sales_per_month": {
      "date_histogram": {
        "field": "date",
        "calendar_interval": "month"
      },
      "aggs": {
        "sales": {
          "sum": {
            "field": "price"
          }
        }
      }
    },
    "percentiles_monthly_sales": {
      "percentiles_bucket": {
        "buckets_path": "sales_per_month>sales",
        "percents": [25, 50, 75]
      }
    }
  }
}

This example sums price per calendar month and then calculates the 25th, 50th, and 75th percentiles of those monthly totals across all monthly buckets.
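Once the monthly sums exist, the pipeline step is conceptually simple: collect one value per bucket, sort them, and pick a value for each requested percent. The Python sketch below mimics that locally on made-up monthly totals. The Elasticsearch documentation states that percentiles_bucket does not interpolate between data points; the exact index rule used here (round half up of p / 100 * (n - 1)) is an assumption, and the sample data is chosen so every index is a whole number.

```python
import math
from typing import Dict, Sequence


def percentiles_of_buckets(values: Sequence[float], percents: Sequence[float]) -> Dict[float, float]:
    """Nearest-rank percentiles over per-bucket values, with no interpolation (assumed rule)."""
    data = sorted(values)
    if not data:
        # No buckets with values: nothing to select from.
        return {p: math.nan for p in percents}
    result = {}
    for p in percents:
        # Round half up (like Java's Math.round) so the result is an existing data point.
        index = math.floor(p / 100.0 * (len(data) - 1) + 0.5)
        result[p] = data[index]
    return result


# Hypothetical sum(price) values, one per date_histogram bucket.
monthly_sales = [550.0, 60.0, 375.0, 420.0, 210.0]
print(percentiles_of_buckets(monthly_sales, [25, 50, 75]))  # {25: 210.0, 50: 375.0, 75: 420.0}
```

Each result is one of the monthly totals themselves, never an average of two neighbouring months.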

Common Issues

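  1. Incorrect buckets_path: Ensure the path leads from the sibling multi-bucket aggregation to the metric, using > between aggregation names (for example, sales_per_month>sales).
  2. Missing data: Buckets without a value for the metric (for example, months with no documents) are handled according to gap_policy, which defaults to skip, so by default the percentiles reflect only buckets that have a value.
  3. Performance impact: Cost grows with the number of buckets; very fine intervals or high-cardinality bucket aggregations produce many buckets, all of which must be built before the percentiles are computed.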
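For the first issue, one way to catch a wrong path before sending a request is to walk it through the request body. The helper `resolve_buckets_path` below is hypothetical: it only understands the simple `agg>agg>metric` form (with an optional `.metric` suffix for multi-value metrics), ignores bucket-key selectors and special paths such as `_count`, and resolves names starting from the `aggs` object that contains the pipeline aggregation, since sibling paths are relative to that level.

```python
from typing import Any, Dict, List


def resolve_buckets_path(sibling_aggs: Dict[str, Any], path: str) -> List[str]:
    """Return the aggregation names along `path`, or raise KeyError at the first missing hop.

    Hypothetical check: supports `a>b>c` plus an optional trailing `.metric`
    suffix; it does not understand `a['key']>b` or special paths like `_count`.
    """
    hops = path.split(">")
    # A trailing ".metric" selects one value of a multi-value metric (e.g. stats.avg).
    hops[-1] = hops[-1].split(".", 1)[0]
    level = sibling_aggs
    for i, name in enumerate(hops):
        if name not in level:
            raise KeyError(f"'{name}' not found at hop {i} of buckets_path '{path}'")
        # Descend into this aggregation's sub-aggregations (either key spelling).
        definition = level[name]
        level = definition.get("aggs", definition.get("aggregations", {}))
    return hops


# The "aggs" object from the example request above, as a Python dict.
request_aggs = {
    "sales_per_month": {
        "date_histogram": {"field": "date", "calendar_interval": "month"},
        "aggs": {"sales": {"sum": {"field": "price"}}},
    },
    "percentiles_monthly_sales": {
        "percentiles_bucket": {"buckets_path": "sales_per_month>sales", "percents": [25, 50, 75]}
    },
}
print(resolve_buckets_path(request_aggs, "sales_per_month>sales"))  # ['sales_per_month', 'sales']
```

A typo such as `sales_per_month>sale` raises a KeyError naming the hop that failed.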

Best Practices

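  1. Request only the percentiles you need. Because results are picked from the bucket values themselves, extreme percentiles such as 1 and 99 (included in the default list) are rarely informative when there are only a few buckets.
  2. Set keyed explicitly to match your client code: the default (true) returns an object keyed by percentile, while false returns an array of key/value pairs.
  3. Combine with other sibling pipeline aggregations, such as avg_bucket or stats_bucket, for a fuller picture of the bucket distribution.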
  4. Monitor performance when used on large datasets or with many buckets.
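Whichever keyed setting you choose, client code stays simpler if it normalizes the response. The sketch below assumes the two shapes the `values` field typically takes: an object mapping percent strings such as "25.0" to values (keyed: true, possibly with extra `_as_string` entries when format is set), or an array of objects with `key` and `value` fields (keyed: false). The function name and the sample responses are illustrative, not captured from a real cluster.

```python
from typing import Any, Dict


def percentile_values(agg_result: Dict[str, Any]) -> Dict[float, Any]:
    """Map percent -> value for either keyed or non-keyed percentiles_bucket output."""
    values = agg_result["values"]
    if isinstance(values, dict):
        # keyed: true, e.g. {"25.0": 210.0, "25.0_as_string": "210.00"}
        return {float(k): v for k, v in values.items() if not k.endswith("_as_string")}
    # keyed: false, e.g. [{"key": 25.0, "value": 210.0}, ...]
    return {float(item["key"]): item["value"] for item in values}


# Illustrative response fragments in both shapes.
keyed_result = {"values": {"25.0": 210.0, "50.0": 375.0, "75.0": 420.0}}
array_result = {
    "values": [
        {"key": 25.0, "value": 210.0},
        {"key": 50.0, "value": 375.0},
        {"key": 75.0, "value": 420.0},
    ]
}
print(percentile_values(keyed_result))
print(percentile_values(array_result))
```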

Frequently Asked Questions

Q: How does the Percentiles Bucket Aggregation differ from the regular Percentiles Aggregation?
A: The Percentiles Bucket Aggregation calculates percentiles over the metric values of the buckets in a sibling multi-bucket aggregation (one input value per bucket), while the regular Percentiles Aggregation computes percentiles over the field values of the individual documents within a bucket.
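The difference is easiest to see side by side. In this illustrative request body (the names `price_percentiles` and `monthly_sales_percentiles` are made up), the regular `percentiles` aggregation reads the `price` of every matching document, while `percentiles_bucket` reads one `sum` value per month. Building it as a Python dict is only a convenience; the same JSON can be sent with any client.

```python
import json

request_body = {
    # Return only aggregation results, no search hits.
    "size": 0,
    "aggs": {
        # Regular percentiles: computed over the raw price of every matching document.
        "price_percentiles": {"percentiles": {"field": "price", "percents": [25, 50, 75]}},
        # Multi-bucket aggregation producing one sum(price) per calendar month.
        "sales_per_month": {
            "date_histogram": {"field": "date", "calendar_interval": "month"},
            "aggs": {"sales": {"sum": {"field": "price"}}},
        },
        # Percentiles bucket: computed over the monthly sums, one input value per bucket.
        "monthly_sales_percentiles": {
            "percentiles_bucket": {"buckets_path": "sales_per_month>sales", "percents": [25, 50, 75]}
        },
    },
}

print(json.dumps(request_body, indent=2))
```

One further practical difference: the regular percentiles aggregation returns approximate values (computed with TDigest by default), whereas percentiles_bucket selects from the exact per-bucket values.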

Q: Can I use custom percentile values?
A: Yes, you can specify custom percentile values using the percents parameter. For example, "percents": [10, 30, 70, 90].

Q: How does the aggregation handle missing values?
A: By default, buckets with a gap (no documents, or a missing or null metric value) are skipped. The gap_policy parameter controls this: skip (the default) ignores those buckets, while insert_zeros treats them as 0.
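To see why the gap policy matters, the sketch below simulates two policies on hypothetical monthly totals where two months had no documents (represented as None). It models skip as dropping the empty buckets and insert_zeros as substituting 0, restates the nearest-rank assumption inline so the block stands alone, and does not model keep_values or other edge cases Elasticsearch handles.

```python
import math
from typing import Dict, List, Optional, Sequence


def apply_gap_policy(values: Sequence[Optional[float]], gap_policy: str = "skip") -> List[float]:
    """Simplified model of gap handling: None marks a bucket without a metric value."""
    if gap_policy == "skip":
        return [v for v in values if v is not None]
    if gap_policy == "insert_zeros":
        return [0.0 if v is None else v for v in values]
    raise ValueError(f"policy not modelled here: {gap_policy}")


def nearest_rank(values: Sequence[float], percents: Sequence[float]) -> Dict[float, float]:
    """Assumed nearest-rank selection (round half up), no interpolation."""
    data = sorted(values)
    return {p: data[math.floor(p / 100.0 * (len(data) - 1) + 0.5)] for p in percents}


# Hypothetical monthly totals; the second and fourth months had no sales documents.
monthly = [550.0, None, 60.0, None, 375.0]
print(nearest_rank(apply_gap_policy(monthly, "skip"), [50]))          # {50: 375.0}
print(nearest_rank(apply_gap_policy(monthly, "insert_zeros"), [50]))  # {50: 60.0}
```

With two empty months out of five, the median moves from 375.0 to 60.0 just by switching the policy, which is why it is worth setting gap_policy deliberately.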

Q: Is it possible to format the output of Percentiles Bucket Aggregation?
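A: Yes. The format parameter takes a DecimalFormat pattern, and the formatted strings are returned alongside the numeric values (as _as_string entries). For example, "format": "0.00%" renders 0.1234 as 12.34%, so it suits metrics that are already fractions.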

Q: Can Percentiles Bucket Aggregation be used with nested aggregations?
A: Yes. buckets_path is resolved relative to the position of the pipeline aggregation, so you can place percentiles_bucket inside an outer bucket aggregation and point it at a sibling multi-bucket aggregation at that same level.
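For example, the following illustrative request computes monthly-sales percentiles separately for each category. It assumes a hypothetical keyword field named `category`; the percentiles_bucket sits next to `sales_per_month` inside each category bucket, so its path is written from that level rather than from the top of the request.

```python
import json

per_category_request = {
    "size": 0,
    "aggs": {
        "by_category": {
            # Hypothetical keyword field; one bucket per category.
            "terms": {"field": "category"},
            "aggs": {
                "sales_per_month": {
                    "date_histogram": {"field": "date", "calendar_interval": "month"},
                    "aggs": {"sales": {"sum": {"field": "price"}}},
                },
                # Sibling of sales_per_month inside each category bucket, so the
                # path is relative to this level, not to the top-level "aggs".
                "percentiles_monthly_sales": {
                    "percentiles_bucket": {
                        "buckets_path": "sales_per_month>sales",
                        "percents": [25, 50, 75],
                    }
                },
            },
        }
    },
}

print(json.dumps(per_category_request, indent=2))
```

Each category bucket in the response then carries its own percentiles_monthly_sales result.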
