ClickHouse quantileTDigest Function

The quantileTDigest function in ClickHouse is an aggregate function used for calculating approximate quantiles using the t-digest algorithm. It's particularly useful for large datasets where exact quantile calculations might be computationally expensive.

Syntax

quantileTDigest(level)(expr)
quantileTDigest(level)(expr, max_error)

For the official documentation, visit the ClickHouse Quantile Functions page.

Example Usage

SELECT quantileTDigest(0.5)(value) AS median
FROM (
    SELECT number AS value
    FROM numbers(1000000)
)

This query calculates the approximate median (50th percentile) of numbers from 0 to 999999.

Common Issues

  1. Precision: The t-digest algorithm provides approximate results, which may not be suitable for scenarios requiring exact quantiles.
  2. Memory usage: While more memory-efficient than exact methods, large datasets can still consume significant memory.

Best Practices

  1. Use quantileTDigest when dealing with large datasets where approximate results are acceptable.
  2. For smaller datasets or when exact results are required, consider using quantileExact instead.
  3. Adjust the max_error parameter to balance between accuracy and performance if needed.

Frequently Asked Questions

Q: How does quantileTDigest compare to other quantile functions in ClickHouse?
A: quantileTDigest offers a good balance between accuracy and performance, especially for large datasets. It's more memory-efficient than quantileExact but less precise. It's generally faster than quantile but slower than quantileFast.

Q: Can quantileTDigest calculate multiple quantiles in a single query?
A: Yes, you can use quantilesTDigest to calculate multiple quantiles in one query, e.g., quantilesTDigest(0.5, 0.9, 0.99)(column).

Q: What is the default max_error for quantileTDigest?
A: The default max_error is 0.01, which provides a good balance between accuracy and performance for most use cases.

Q: Is quantileTDigest suitable for calculating percentiles on time-series data?
A: Yes, quantileTDigest can be effectively used for time-series data, especially when you need to calculate percentiles over large volumes of time-stamped values.

Q: Can the results of quantileTDigest be used in subqueries or JOINs?
A: Yes, you can use the results of quantileTDigest in subqueries or JOINs like any other aggregate function result. However, be mindful of the approximate nature of the results when using them in complex queries.

Pulse - Elasticsearch Operations Done Right

Pulse can solve your Elasticsearch issues

Subscribe to the Pulse Newsletter

Get early access to new Pulse features, insightful blogs & exclusive events , webinars, and workshops.

We use cookies to provide an optimized user experience and understand our traffic. To learn more, read our use of cookies; otherwise, please choose 'Accept Cookies' to continue using our website.