Elasticsearch Percentiles Aggregation - Syntax, Example, and Tips

The Percentiles Aggregation is a multi-value metrics aggregation that calculates one or more percentiles over numeric values extracted from the aggregated documents. It provides insight into the distribution of values in a dataset by returning the value found at each requested percentile. If percents is omitted, the default set [1, 5, 25, 50, 75, 95, 99] is used.

Syntax

{
  "aggs": {
    "NAME": {
      "percentiles": {
        "field": "FIELD_NAME",
        "percents": [1, 5, 25, 50, 75, 95, 99]
      }
    }
  }
}

For more details, refer to the official Elasticsearch documentation.

Example Usage

GET /sales/_search
{
  "size": 0,
  "aggs": {
    "load_time_outliers": {
      "percentiles": {
        "field": "load_time",
        "percents": [95, 99, 99.9]
      }
    }
  }
}

This example calculates the 95th, 99th, and 99.9th percentiles of the load_time field in the sales index. Setting "size": 0 omits the search hits, so only the aggregation results are returned.

Common Issues

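  1. Accuracy vs. Performance: Percentiles are approximated. The default algorithm (T-Digest) trades accuracy for memory efficiency and is more accurate at extreme percentiles (such as the 99th) than near the median. The hdr (High Dynamic Range Histogram) method can be faster and keeps the relative error within a configured number of significant digits, at the cost of a larger memory footprint. It supports only positive values.
  2. Missing Values: By default, documents without a value for the specified field are ignored. Use "missing": VALUE to treat such documents as if they had that value.
  3. Non-numeric Fields: Ensure the specified field is a numeric (or histogram) field; running percentiles on other field types, such as text or keyword, results in an error.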
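
As a rough sketch of item 2, the request below reuses the load_time field from the example above and counts documents without that field as if they had a value of 10. The value 10 is an arbitrary placeholder chosen for illustration, and the aggregation name is hypothetical.

```console
# Documents without load_time are treated as load_time = 10 (illustrative value)
GET /sales/_search
{
  "size": 0,
  "aggs": {
    "load_time_with_default": {
      "percentiles": {
        "field": "load_time",
        "missing": 10
      }
    }
  }
}
```

Whether a default is appropriate depends on the data: assigning a low or high value to missing documents shifts every percentile, so leaving them ignored is often the safer choice.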

Best Practices

  1. Choose percentiles that are relevant to your use case. Common choices include [25, 50, 75] for quartiles or [1, 5, 25, 50, 75, 95, 99] for a more comprehensive view.
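  2. Use the compression parameter to balance memory usage and accuracy when using the T-Digest algorithm. Larger values improve accuracy but use more memory and make the aggregation slower.
  3. For fields whose value range is known and positive, and where a bounded relative error matters, consider using the hdr method with an appropriate number_of_significant_value_digits. Avoid hdr when the range is unknown, because it can lead to high memory usage.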
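
The two tuning options in items 2 and 3 might look like the requests below. This is a minimal sketch: the compression value of 200 (the default is 100) and the 3 significant digits are illustrative settings, not recommendations, and the aggregation names are hypothetical.

```console
# T-Digest with a higher compression than the default of 100
GET /sales/_search
{
  "size": 0,
  "aggs": {
    "load_time_tdigest": {
      "percentiles": {
        "field": "load_time",
        "tdigest": {
          "compression": 200
        }
      }
    }
  }
}

# HDR Histogram with 3 significant value digits
GET /sales/_search
{
  "size": 0,
  "aggs": {
    "load_time_hdr": {
      "percentiles": {
        "field": "load_time",
        "percents": [95, 99, 99.9],
        "hdr": {
          "number_of_significant_value_digits": 3
        }
      }
    }
  }
}
```

Specify either tdigest or hdr in a given aggregation, not both.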

Frequently Asked Questions

Q: How does the Percentiles Aggregation differ from the Percentile Ranks Aggregation?
A: The Percentiles Aggregation calculates percentile values for given percentile ranks, while the Percentile Ranks Aggregation calculates percentile ranks for given values. They are inverse operations of each other.
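
To illustrate the inverse relationship, a percentile_ranks request takes values and returns the percentage of observations at or below each one. The thresholds 500 and 600 below are hypothetical, assuming load_time is stored in milliseconds.

```console
# What percentage of load_time values fall at or below 500 and 600?
GET /sales/_search
{
  "size": 0,
  "aggs": {
    "load_time_ranks": {
      "percentile_ranks": {
        "field": "load_time",
        "values": [500, 600]
      }
    }
  }
}
```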

Q: Can I use Percentiles Aggregation on nested fields?
A: Yes, you can use Percentiles Aggregation on nested fields by wrapping it in a Nested Aggregation.
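
A minimal sketch of that wrapping is shown below. The orders index, the nested path items, and the field items.price are all hypothetical names; the request assumes items is mapped with type nested.

```console
# Percentiles over a field inside nested objects (hypothetical mapping)
GET /orders/_search
{
  "size": 0,
  "aggs": {
    "items_nested": {
      "nested": {
        "path": "items"
      },
      "aggs": {
        "price_percentiles": {
          "percentiles": {
            "field": "items.price",
            "percents": [50, 95]
          }
        }
      }
    }
  }
}
```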

Q: How can I improve the accuracy of Percentiles Aggregation results?
A: You can improve accuracy by increasing the compression parameter for the T-Digest algorithm or by using the hdr method with an appropriate number_of_significant_value_digits for known value ranges.

Q: Is it possible to get exact percentile values instead of approximations?
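A: Not in general. Elasticsearch's Percentiles Aggregation computes approximations for performance reasons, although results on small sets of values are highly accurate and can be exact. For guaranteed exact values on large datasets, you would need to retrieve all the values and perform the calculation client-side.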

Q: Can Percentiles Aggregation be used in combination with other aggregations?
A: Yes, Percentiles Aggregation can be combined with other aggregations, such as bucket aggregations, to provide percentile calculations for specific subsets of your data.
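
For example, nesting percentiles under a terms aggregation yields one set of percentiles per bucket. The sketch below assumes a hypothetical keyword field named region in the sales index.

```console
# One set of load_time percentiles per region bucket (region is a hypothetical field)
GET /sales/_search
{
  "size": 0,
  "aggs": {
    "by_region": {
      "terms": {
        "field": "region"
      },
      "aggs": {
        "load_time_percentiles": {
          "percentiles": {
            "field": "load_time",
            "percents": [50, 95, 99]
          }
        }
      }
    }
  }
}
```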
