Elasticsearch Percentiles Aggregation - Syntax, Example, and Tips

The Percentiles Aggregation is a multi-value metrics aggregation that calculates one or more percentiles over numeric values extracted from the aggregated documents. It describes how values are distributed in a dataset: for each requested percentile, it returns the value at or below which that percentage of the observed values falls.

Syntax

{
  "aggs": {
    "NAME": {
      "percentiles": {
        "field": "FIELD_NAME",
        "percents": [1, 5, 25, 50, 75, 95, 99]
      }
    }
  }
}

For more details, refer to the official Elasticsearch documentation.

Example Usage

GET /sales/_search
{
  "size": 0,
  "aggs": {
    "load_time_outliers": {
      "percentiles": {
        "field": "load_time",
        "percents": [95, 99, 99.9]
      }
    }
  }
}

This example calculates the 95th, 99th, and 99.9th percentiles of the load_time field in the sales index.

Common Issues

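  1. Accuracy vs. Performance: Percentiles are approximated. The default algorithm (T-Digest) trades accuracy for memory efficiency; it is generally most accurate at the extreme percentiles (such as the 99th) and least accurate near the median. If you need a fixed, bounded relative error, for example on latency measurements, consider the hdr (High Dynamic Range Histogram) method, which can be faster but uses more memory and does not support negative values.
  2. Missing Values: By default, documents without a value for the specified field are ignored. Use "missing": VALUE to treat such documents as if they had that value; keep in mind that this shifts the resulting percentiles.
  3. Non-numeric Fields: Ensure the specified field is numeric (or a histogram field); running percentiles on other field types, such as text or keyword, results in an error.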
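
As a minimal sketch of the first two points, the request body below (for the same _search request on the sales index used above) switches to the hdr method and supplies a default for documents that lack the field. The values 3 and 0 are illustrative assumptions, not recommendations; pick them to match your data.

```json
{
  "size": 0,
  "aggs": {
    "load_time_outliers": {
      "percentiles": {
        "field": "load_time",
        "percents": [95, 99, 99.9],
        "hdr": {
          "number_of_significant_value_digits": 3
        },
        "missing": 0
      }
    }
  }
}
```

With "missing": 0, every document without load_time is counted as a zero, which pulls the lower percentiles down. Omit the parameter if such documents should simply be skipped.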

Best Practices

  1. Choose percentiles that are relevant to your use case. Common choices include [25, 50, 75] for quartiles or [1, 5, 25, 50, 75, 95, 99] for a more comprehensive view.
  2. Use the compression parameter to balance memory usage and accuracy when using the T-Digest algorithm: higher values improve accuracy at the cost of more memory and slower aggregation.
  3. For fields with a known, non-negative range where you need a fixed relative error, consider using the hdr method with an appropriate number_of_significant_value_digits.
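
A hedged sketch of tuning T-Digest: the request body below raises compression above the documented default of 100. The value 200 is only an assumption for illustration; test against your own data to see whether the extra memory buys a useful gain in accuracy.

```json
{
  "size": 0,
  "aggs": {
    "load_time_quartiles": {
      "percentiles": {
        "field": "load_time",
        "percents": [25, 50, 75],
        "tdigest": {
          "compression": 200
        }
      }
    }
  }
}
```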

Frequently Asked Questions

Q: How does the Percentiles Aggregation differ from the Percentile Ranks Aggregation?
A: The Percentiles Aggregation calculates percentile values for given percentile ranks, while the Percentile Ranks Aggregation calculates percentile ranks for given values. They are inverse operations of each other.
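
To illustrate the inverse operation, the following request body asks what percentage of load_time values fall at or below given thresholds. The thresholds 500 and 600 are hypothetical and assume load_time is stored in milliseconds.

```json
{
  "size": 0,
  "aggs": {
    "load_time_ranks": {
      "percentile_ranks": {
        "field": "load_time",
        "values": [500, 600]
      }
    }
  }
}
```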

Q: Can I use Percentiles Aggregation on nested fields?
A: Yes, you can use Percentiles Aggregation on nested fields by wrapping it in a Nested Aggregation.
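
A minimal sketch of that wrapping, assuming a hypothetical nested field named requests whose objects contain a numeric load_time (neither name comes from the example index above):

```json
{
  "size": 0,
  "aggs": {
    "requests": {
      "nested": {
        "path": "requests"
      },
      "aggs": {
        "load_time_percentiles": {
          "percentiles": {
            "field": "requests.load_time",
            "percents": [50, 95, 99]
          }
        }
      }
    }
  }
}
```

The field is referenced by its full path (requests.load_time) inside the nested aggregation.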

Q: How can I improve the accuracy of Percentiles Aggregation results?
A: You can improve accuracy by increasing the compression parameter for the T-Digest algorithm or by using the hdr method with an appropriate number_of_significant_value_digits for known value ranges.

Q: Is it possible to get exact percentile values instead of approximations?
A: Elasticsearch's Percentiles Aggregation returns approximate results for performance reasons, although on small sets of values the results can be very close to exact. For guaranteed exact values, you would need to retrieve all documents and perform the calculations client-side.

Q: Can Percentiles Aggregation be used in combination with other aggregations?
A: Yes, Percentiles Aggregation can be combined with other aggregations, such as bucket aggregations, to provide percentile calculations for specific subsets of your data.
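
For example, the request body below nests percentiles under a terms bucket aggregation so that each bucket gets its own percentile values. The region field is a hypothetical keyword field used only for illustration.

```json
{
  "size": 0,
  "aggs": {
    "by_region": {
      "terms": {
        "field": "region"
      },
      "aggs": {
        "load_time_percentiles": {
          "percentiles": {
            "field": "load_time",
            "percents": [50, 95, 99]
          }
        }
      }
    }
  }
}
```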
