Elasticsearch Percentile Ranks Aggregation - Syntax, Example, and Tips

Pulse - Elasticsearch Operations Done Right

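The Percentile Ranks Aggregation in Elasticsearch calculates, for each value you specify, that value's percentile rank within a numeric field: the approximate percentage of observed values that are at or below it. For example, if 500 has a percentile rank of 95, then about 95% of the observed values are less than or equal to 500. This shows where specific values stand relative to the rest of the dataset, which is useful for understanding the data distribution and for comparative analysis.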
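To make the definition concrete, here is a minimal Python sketch that computes an exact percentile rank over an in-memory list. It assumes the definition "percentage of observed values less than or equal to the given value". Elasticsearch itself estimates this with TDigest or HDR, so its results on real data are approximate and can differ slightly. The function name "percentile_rank" and the sample data are illustrative only.

```python
from typing import Sequence


def percentile_rank(data: Sequence[float], value: float) -> float:
    """Exact percentile rank: share of observations <= value, in percent.

    Illustrative only: Elasticsearch approximates this with TDigest or HDR.
    """
    if not data:
        raise ValueError("data must not be empty")
    at_or_below = sum(1 for x in data if x <= value)
    return 100.0 * at_or_below / len(data)


# Hypothetical response times in milliseconds.
response_times = [120, 180, 210, 250, 400, 480, 520, 700, 950, 1500]
for threshold in (200, 500, 1000):
    print(threshold, percentile_rank(response_times, threshold))
# prints: 200 20.0 / 500 60.0 / 1000 90.0 (one line each)
```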

Syntax

Basic syntax:

{
  "aggs": {
    "percentile_ranks_agg": {
      "percentile_ranks": {
        "field": "field_name",
        "values": [value1, value2, ...]
      }
    }
  }
}

For detailed information, refer to the official Elasticsearch documentation on Percentile Ranks Aggregation.

Example Usage

Here's an example that calculates the percentile ranks for response times in a web server log:

GET /web_logs/_search
{
  "size": 0,
  "aggs": {
    "response_time_ranks": {
      "percentile_ranks": {
        "field": "response_time",
        "values": [200, 500, 1000]
      }
    }
  }
}

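This query returns the percentile ranks of 200, 500, and 1000 in the response_time field, that is, roughly what percentage of logged requests took at most 200ms, 500ms, and 1000ms (assuming the field stores milliseconds). Because "size" is 0, no matching documents are returned, only the aggregation results.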
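With the default keyed output, Elasticsearch returns the ranks under "aggregations" -> "response_time_ranks" -> "values", keyed by each requested value as a string (for example "500.0"). The Python sketch below reads such a response with the standard json module. The numbers in the sample body are invented for illustration and are not real output, and the helper name "ranks_from_response" is hypothetical.

```python
import json

# Hypothetical response body; the percentages are invented for illustration.
raw = """
{
  "aggregations": {
    "response_time_ranks": {
      "values": {"200.0": 21.4, "500.0": 63.8, "1000.0": 92.1}
    }
  }
}
"""


def ranks_from_response(body: dict, agg_name: str) -> dict:
    """Map each requested value (as a float) to its percentile rank."""
    values = body["aggregations"][agg_name]["values"]
    return {float(key): rank for key, rank in values.items()}


ranks = ranks_from_response(json.loads(raw), "response_time_ranks")
for value, rank in sorted(ranks.items()):
    print(f"{value:.0f} ms -> percentile rank {rank:.1f}")
```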

Common Issues

  1. Data type mismatch: Ensure the field specified is a numeric type.
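  2. Missing values: By default, documents that have no value for the field are ignored. Use the "missing" parameter to treat them as if they had a specific value instead.
  3. Memory use on large datasets: Results come from approximate sketches (TDigest or HDR) whose memory cost depends on their accuracy settings, and each bucket of an enclosing bucket aggregation keeps its own sketch, so high-accuracy settings combined with many buckets can be resource-intensive.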
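The effect of the "missing" parameter can be sketched in plain Python: by default, documents without the field are left out of the calculation, while "missing" counts them as if they held the given value. The documents, the helper names, and the substitute value of 0 are hypothetical, and the ranks here are computed exactly rather than approximated as Elasticsearch does.

```python
from typing import Optional, Sequence


def exact_rank(values: Sequence[float], threshold: float) -> float:
    """Percentage of values <= threshold (exact, for illustration)."""
    return 100.0 * sum(1 for v in values if v <= threshold) / len(values)


def rank_with_missing(docs: Sequence[Optional[float]], threshold: float,
                      missing: Optional[float] = None) -> float:
    """Mimic ignoring docs without a value, or substituting `missing`."""
    if missing is None:
        present = [d for d in docs if d is not None]  # default: ignore them
    else:
        present = [missing if d is None else d for d in docs]
    return exact_rank(present, threshold)


# Hypothetical docs; None stands for a document without response_time.
# Comparable request fragment (hypothetical):
# {"percentile_ranks": {"field": "response_time", "values": [500], "missing": 0}}
docs = [150, None, 300, 600, None, 900]
print(rank_with_missing(docs, 500))             # missing docs ignored
print(rank_with_missing(docs, 500, missing=0))  # missing docs count as 0
```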

Best Practices and Additional Information

  • Use percentile ranks when you need to understand the relative position of specific values within your data distribution.
  • Combine with other aggregations for more comprehensive analysis.
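  • Percentile ranks are always approximate. On very large datasets, tune the trade-off between accuracy and resources: a lower TDigest "compression" uses less memory but is less accurate, and the HDR Histogram method can be faster than TDigest at the cost of a larger memory footprint.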
  • Be cautious when interpreting results with skewed data distributions.
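As one way to combine it with other aggregations, percentile ranks can be nested under a bucket aggregation such as "terms", which yields separate ranks per bucket. The Python sketch below only builds and prints the request body. The "url.keyword" field, the aggregation names "per_bucket" and "ranks", and the bucket size are hypothetical. Each bucket keeps its own sketch, so memory use grows with the number of buckets.

```python
import json


def ranks_per_bucket(bucket_field: str, metric_field: str,
                     values: list, size: int = 10) -> dict:
    """Build a search body with percentile_ranks nested under terms."""
    return {
        "size": 0,
        "aggs": {
            "per_bucket": {
                "terms": {"field": bucket_field, "size": size},
                "aggs": {
                    "ranks": {
                        "percentile_ranks": {
                            "field": metric_field,
                            "values": values,
                        }
                    }
                },
            }
        },
    }


body = ranks_per_bucket("url.keyword", "response_time", [200, 500, 1000])
print(json.dumps(body, indent=2))
```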

Frequently Asked Questions

Q: How does Percentile Ranks Aggregation differ from Percentiles Aggregation?
A: While Percentiles Aggregation calculates percentile values for given percentages, Percentile Ranks Aggregation determines the percentages for given values. It's the inverse operation.
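The inverse relationship can be illustrated with exact, nearest-rank definitions in Python. These definitions are an assumption chosen for clarity; Elasticsearch's percentiles and percentile_ranks aggregations both return approximations, so on real data the round trip is only approximately consistent.

```python
import math
from typing import Sequence


def percentile(data: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: smallest value with at least p% of data <= it."""
    ordered = sorted(data)
    k = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[k - 1]


def percentile_rank(data: Sequence[float], value: float) -> float:
    """Percentage of data <= value."""
    return 100.0 * sum(1 for x in data if x <= value) / len(data)


data = [120, 180, 210, 250, 400, 480, 520, 700, 950, 1500]
v = percentile(data, 60)             # percentage -> value
print(v, percentile_rank(data, v))   # value -> percentage; prints: 480 60.0
```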

Q: Can I use Percentile Ranks Aggregation on non-numeric fields?
A: No, Percentile Ranks Aggregation only works on numeric fields. For non-numeric fields, you would need to use different types of aggregations.

Q: How accurate is the Percentile Ranks Aggregation?
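A: Results are always approximate. By default it uses the TDigest algorithm, whose accuracy is controlled by the "compression" setting (default 100; higher values are more accurate but use more memory). You can instead select the HDR (High Dynamic Range) Histogram method with "hdr": {"number_of_significant_value_digits": 3}. HDR can be faster than TDigest but has a larger memory footprint, only supports positive values, and is also an approximation rather than an exact calculation.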
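The two methods are selected with different option objects inside "percentile_ranks". The Python sketch below builds both variants of the aggregation; "tdigest" with "compression" and "hdr" with "number_of_significant_value_digits" are the option names used in the Elasticsearch documentation, while the helper function and the specific settings (200 and 3) are illustrative choices, not recommendations.

```python
import json


def percentile_ranks_agg(field: str, values: list, method: str = "tdigest") -> dict:
    """Build a percentile_ranks aggregation using TDigest or HDR settings."""
    agg = {"field": field, "values": values}
    if method == "tdigest":
        # Higher compression: more accurate, more memory (default is 100).
        agg["tdigest"] = {"compression": 200}
    elif method == "hdr":
        # HDR Histogram: also approximate; only supports positive values.
        agg["hdr"] = {"number_of_significant_value_digits": 3}
    else:
        raise ValueError(f"unknown method: {method}")
    return {"percentile_ranks": agg}


for m in ("tdigest", "hdr"):
    print(json.dumps(percentile_ranks_agg("response_time", [200, 500, 1000], m)))
```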

Q: Is it possible to get percentile ranks for dynamic values?
A: Yes. The "values" array is an ordinary part of the request body, so you can compute it in your application code before sending the search, or pass it in as a parameter of a search template.
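For example, the thresholds might come from configuration or from an earlier query. This Python sketch builds the request body from a runtime list before it is sent. The "sla_thresholds_ms" input, the "build_rank_query" helper, and the deduplicate-and-sort step are assumptions for illustration, and sending the request (with any HTTP or Elasticsearch client) is left out.

```python
import json
from typing import Iterable


def build_rank_query(field: str, thresholds: Iterable[float]) -> dict:
    """Build a size-0 search body whose `values` are chosen at runtime."""
    values = sorted(set(thresholds))  # drop duplicates, keep output tidy
    if not values:
        raise ValueError("at least one threshold is required")
    return {
        "size": 0,
        "aggs": {
            "dynamic_ranks": {
                "percentile_ranks": {"field": field, "values": values}
            }
        },
    }


sla_thresholds_ms = [1000, 250, 500, 250]  # hypothetical runtime input
print(json.dumps(build_rank_query("response_time", sla_thresholds_ms)))
```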

Q: How does Percentile Ranks Aggregation handle outliers in the data?
A: Percentile Ranks Aggregation includes all data points in its calculations, including outliers. If outliers are a concern, you might want to consider data preprocessing or using robust statistical methods in conjunction with this aggregation.
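One related property: a percentile rank depends only on how many values fall on each side of the threshold, not on how far away they are. The Python sketch below (an exact computation on made-up data, not Elasticsearch output) shows that making an outlier far more extreme leaves the rank unchanged, while adding more outliers does change it, because the count changes.

```python
def percentile_rank(data, value):
    """Percentage of data <= value (exact, for illustration)."""
    return 100.0 * sum(1 for x in data if x <= value) / len(data)


base = [100, 200, 300, 400, 10_000]
more_extreme = [100, 200, 300, 400, 5_000_000]  # same count, bigger outlier
more_outliers = [100, 200, 300, 400, 10_000, 20_000, 30_000]

print(percentile_rank(base, 500))           # 80.0
print(percentile_rank(more_extreme, 500))   # 80.0
print(percentile_rank(more_outliers, 500))  # 4 of 7 values, about 57.1
```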
