The quantileTDigest
function in ClickHouse is an aggregate function used for calculating approximate quantiles using the t-digest algorithm. It's particularly useful for large datasets where exact quantile calculations might be computationally expensive.
Syntax
quantileTDigest(level)(expr)
quantileTDigest(level)(expr, max_error)
For the official documentation, visit the ClickHouse Quantile Functions page.
Example Usage
SELECT quantileTDigest(0.5)(value) AS median
FROM (
SELECT number AS value
FROM numbers(1000000)
)
This query calculates the approximate median (50th percentile) of numbers from 0 to 999999.
Common Issues
- Precision: The t-digest algorithm provides approximate results, which may not be suitable for scenarios requiring exact quantiles.
- Memory usage: While more memory-efficient than exact methods, large datasets can still consume significant memory.
Best Practices
- Use
quantileTDigest
when dealing with large datasets where approximate results are acceptable. - For smaller datasets or when exact results are required, consider using
quantileExact
instead. - Adjust the
max_error
parameter to balance between accuracy and performance if needed.
Frequently Asked Questions
Q: How does quantileTDigest compare to other quantile functions in ClickHouse?
A: quantileTDigest
offers a good balance between accuracy and performance, especially for large datasets. It's more memory-efficient than quantileExact
but less precise. It's generally faster than quantile
but slower than quantileFast
.
Q: Can quantileTDigest calculate multiple quantiles in a single query?
A: Yes, you can use quantilesTDigest
to calculate multiple quantiles in one query, e.g., quantilesTDigest(0.5, 0.9, 0.99)(column)
.
Q: What is the default max_error for quantileTDigest?
A: The default max_error
is 0.01, which provides a good balance between accuracy and performance for most use cases.
Q: Is quantileTDigest suitable for calculating percentiles on time-series data?
A: Yes, quantileTDigest
can be effectively used for time-series data, especially when you need to calculate percentiles over large volumes of time-stamped values.
Q: Can the results of quantileTDigest be used in subqueries or JOINs?
A: Yes, you can use the results of quantileTDigest
in subqueries or JOINs like any other aggregate function result. However, be mindful of the approximate nature of the results when using them in complex queries.