The Percentiles Aggregation is a multi-value metrics aggregation that calculates one or more percentiles over numeric values extracted from the aggregated documents. It provides insight into the distribution of values in a dataset by returning the value found at each percentile you specify.
Syntax
{
  "aggs": {
    "NAME": {
      "percentiles": {
        "field": "FIELD_NAME",
        "percents": [1, 5, 25, 50, 75, 95, 99]
      }
    }
  }
}
For more details, refer to the official Elasticsearch documentation.
Example Usage
GET /sales/_search
{
  "size": 0,
  "aggs": {
    "load_time_outliers": {
      "percentiles": {
        "field": "load_time",
        "percents": [95, 99, 99.9]
      }
    }
  }
}
This example calculates the 95th, 99th, and 99.9th percentiles of the load_time field in the sales index.
Common Issues
- Accuracy vs. Performance: The default algorithm (T-Digest) returns approximate results, trading some accuracy for memory efficiency. If you need a bounded relative error, consider the hdr (High Dynamic Range Histogram) method, which is typically faster but uses more memory and supports only positive values.
- Missing Values: By default, documents without a value for the specified field are ignored. Use "missing": VALUE to assign a default value to such documents.
- Non-numeric Fields: Ensure the specified field is numeric; running percentiles on a non-numeric field results in an error.
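The hdr method and the missing parameter described above can be combined in a single request. The following is a minimal sketch that reuses the sales index and load_time field from the earlier example; the values 3 (significant digits) and 0 (default for missing documents) are illustrative assumptions, not recommendations.

```console
# Illustrative values only: 3 significant digits, missing documents counted as 0
GET /sales/_search
{
  "size": 0,
  "aggs": {
    "load_time_outliers": {
      "percentiles": {
        "field": "load_time",
        "percents": [95, 99, 99.9],
        "hdr": {
          "number_of_significant_value_digits": 3
        },
        "missing": 0
      }
    }
  }
}
```

Note that substituting a default for missing documents changes the distribution, so the chosen value should make sense for the field being measured.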
Best Practices
- Choose percentiles that are relevant to your use case. Common choices include [25, 50, 75] for quartiles or [1, 5, 25, 50, 75, 95, 99] for a more comprehensive view.
- Use the compression parameter to balance memory usage and accuracy when using the T-Digest algorithm.
- For fields with a known range and a need for high accuracy, consider using the hdr method with an appropriate number_of_significant_value_digits.
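As a rough sketch of the compression setting mentioned above, the request below raises it from the default of 100 to 200 for the same load_time field. The value 200 is an arbitrary illustration; higher values generally improve accuracy at the cost of more memory.

```console
# Illustrative value only: compression raised from the default (100) to 200
GET /sales/_search
{
  "size": 0,
  "aggs": {
    "load_time_outliers": {
      "percentiles": {
        "field": "load_time",
        "percents": [95, 99, 99.9],
        "tdigest": {
          "compression": 200
        }
      }
    }
  }
}
```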
Frequently Asked Questions
Q: How does the Percentiles Aggregation differ from the Percentile Ranks Aggregation?
A: The Percentiles Aggregation calculates percentile values for given percentile ranks, while the Percentile Ranks Aggregation calculates percentile ranks for given values. They are inverse operations of each other.
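To illustrate the inverse relationship, here is a minimal Percentile Ranks sketch against the same sales index. The thresholds 500 and 600 are hypothetical load_time values chosen for illustration; the response reports the percentage of observed values at or below each threshold.

```console
# Hypothetical thresholds: what share of load_time values fall at or below 500 and 600?
GET /sales/_search
{
  "size": 0,
  "aggs": {
    "load_time_ranks": {
      "percentile_ranks": {
        "field": "load_time",
        "values": [500, 600]
      }
    }
  }
}
```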
Q: Can I use Percentiles Aggregation on nested fields?
A: Yes, you can use Percentiles Aggregation on nested fields by wrapping it in a Nested Aggregation.
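A minimal sketch of that wrapping follows. It assumes a hypothetical nested field named requests with a numeric sub-field requests.load_time; neither name comes from the example above, so adjust both to match your mapping.

```console
# Hypothetical mapping: "requests" is a nested field with a numeric "requests.load_time"
GET /sales/_search
{
  "size": 0,
  "aggs": {
    "requests_nested": {
      "nested": {
        "path": "requests"
      },
      "aggs": {
        "load_time_percentiles": {
          "percentiles": {
            "field": "requests.load_time",
            "percents": [50, 95, 99]
          }
        }
      }
    }
  }
}
```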
Q: How can I improve the accuracy of Percentiles Aggregation results?
A: You can improve accuracy by increasing the compression parameter for the T-Digest algorithm or by using the hdr method with an appropriate number_of_significant_value_digits for known value ranges.
Q: Is it possible to get exact percentile values instead of approximations?
A: Not in general. Elasticsearch's Percentiles Aggregation returns approximate results for performance reasons, although results are typically very accurate on small datasets. For guaranteed exact values, you would need to retrieve all relevant values and compute the percentiles client-side.
Q: Can Percentiles Aggregation be used in combination with other aggregations?
A: Yes, Percentiles Aggregation can be combined with other aggregations, such as bucket aggregations, to provide percentile calculations for specific subsets of your data.
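As one possible sketch, the request below nests a percentiles aggregation under a terms bucket aggregation so that percentiles are computed per bucket. The keyword field region is a hypothetical name used only for illustration.

```console
# Hypothetical field: "region" is assumed to be a keyword field in the sales index
GET /sales/_search
{
  "size": 0,
  "aggs": {
    "by_region": {
      "terms": {
        "field": "region"
      },
      "aggs": {
        "load_time_percentiles": {
          "percentiles": {
            "field": "load_time",
            "percents": [50, 95, 99]
          }
        }
      }
    }
  }
}
```

Each bucket in the response then carries its own set of percentile values, which makes it possible to compare distributions across subsets.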