The Elasticsearch cardinality aggregation is a single-value metric aggregation that returns the approximate count of distinct values in a field. It is built on the HyperLogLog++ (HLL++) algorithm, which trades a small, bounded error for fixed memory usage that does not grow with input size. Use it for "how many unique users / IPs / sessions" questions where exact counts are unaffordable and a typical error of under 1% is acceptable.
Syntax
GET /events/_search
{
"size": 0,
"aggs": {
"unique_users": {
"cardinality": {
"field": "user_id",
"precision_threshold": 3000
}
}
}
}
The result is a single number under aggregations.unique_users.value.
Parameters
| Parameter | Default | Description |
|---|---|---|
field |
required (or script) |
Field to count distinct values from. |
precision_threshold |
3000 | Counts below this are near-exact. Above, error grows. Max accepted value is 40000. |
missing |
- | Value to substitute when the field is missing. Treated as a distinct value. |
script |
- | Use a script instead of a field. Slower; disables global ordinals. |
execution_hint |
auto | direct, global_ordinals, segment_ordinals, or save_memory_heuristic. |
Memory per cardinality aggregation is roughly precision_threshold * 8 bytes per shard. At the maximum (40000) that is about 320 KB per shard per cardinality aggregation - a single number, but multiplied across nested buckets it adds up.
Examples
Unique users in the last 24 hours:
GET /events/_search
{
"size": 0,
"query": { "range": { "@timestamp": { "gte": "now-24h" } } },
"aggs": {
"uniques": { "cardinality": { "field": "user_id" } }
}
}
Unique users per country (nested under terms):
"aggs": {
"by_country": {
"terms": { "field": "country.keyword", "size": 20 },
"aggs": {
"uniques": { "cardinality": { "field": "user_id", "precision_threshold": 10000 } }
}
}
}
Tighter precision for analytics dashboards:
"cardinality": { "field": "session_id", "precision_threshold": 40000 }
Precision, Memory, and Performance Notes
HyperLogLog++ guarantees counts below precision_threshold are exact or near-exact. Above that, error scales with the square root of cardinality. With the default precision_threshold of 3000, expected relative error is around 1-2% at 100,000 distinct values and grows slowly from there. The maximum precision_threshold is 40000, which gives roughly 0.4% error at one million distinct values - and costs 16x the memory of the default.
The aggregation hashes each value (Murmur3) and tracks the leading-zero distribution. Pre-hashing your data at index time with the mapper-murmur3 plugin lets the aggregation skip the hash step at query time - useful for very hot dashboards on keyword fields. Numeric fields are hashed automatically with no plugin needed.
Cardinality aggregations under high-cardinality terms buckets are a frequent operational hotspot - each bucket holds its own HLL sketch. A terms aggregation returning 10000 buckets, each with a cardinality sub-agg at precision 40000, consumes meaningful heap per shard per request. Walking circuit-breaker stats by hand and tracing each trip back to the responsible query is exactly the loop Pulse runs continuously.
Common Mistakes
- Treating
precision_thresholdas a hard accuracy guarantee above the threshold. It is the boundary below which counts are near-exact. - Setting
precision_thresholdto 40000 reflexively when the dataset has only a few thousand distinct values - it wastes memory with no accuracy gain. - Using
missingand forgetting it counts as a distinct value, inflating uniques by one. - Running cardinality on a
textfield - it triggers fielddata loading. Use a.keywordsub-field. - Expecting cardinality results to sum across buckets - HLL sketches do merge correctly when run by Elasticsearch, but summing the integer values client-side double-counts overlap.
Optimize Cardinality Aggregations for Memory with Pulse
Pulse is an AI DBA for Elasticsearch and OpenSearch that continuously profiles production query traffic. For cardinality aggregations specifically, Pulse:
- Identifies cardinality sub-aggregations nested inside high-cardinality terms aggregations, where each bucket holds its own HLL++ sketch at
precision_threshold * 8bytes per shard - the pattern that drives heap pressure during dashboard refreshes - Flags
precision_threshold: 40000set reflexively on fields with only a few thousand distinct values, where 16x memory is being spent for no accuracy gain - Spots cardinality aggregations running on
textfields without a.keywordsub-field, triggering fielddata loading - Traces each slow or memory-heavy cardinality query back to the calling service via slow-log and APM correlation
- Recommends concrete fixes: right-size
precision_thresholdagainst actual cardinality, add a Murmur3 hash sub-field via themapper-murmur3plugin to skip per-query hashing on hot dashboards, move expensive cardinality sub-aggs out of high-cardinality parents, and wrap cardinality on nested fields in anestedaggregation - Tracks heap, circuit breaker, and latency impact after the change ships
This converts the manual sketch-sizing and circuit-breaker loop into a continuous optimization workflow.
Frequently Asked Questions
Q: How accurate is the cardinality aggregation?
A: The cardinality aggregation is near-exact for distinct counts below precision_threshold (default 3000). Above that, error scales with the square root of cardinality, typically 1-2% at 100k uniques with default settings, dropping to about 0.4% at the maximum threshold of 40000.
Q: What is the maximum precision_threshold?
A: 40000. Higher values are accepted by the API but capped at 40000 internally. Memory cost grows linearly with this setting.
Q: How does cardinality differ from value_count?
A: The cardinality aggregation counts distinct values approximately; the value_count aggregation counts all non-null values including duplicates exactly. They answer different questions.
Q: Can I get an exact distinct count?
A: Not from this aggregation. For exact counts, use a composite aggregation to enumerate every unique value and count pages client-side, accepting the much higher cost.
Q: Does cardinality work on nested fields?
A: Yes, but wrap it in a nested aggregation so it operates on the inner documents rather than parent values.
Q: Should I pre-hash my field for cardinality queries?
A: For very high-frequency dashboards on string fields, install the mapper-murmur3 plugin and add a murmur3 sub-field. The aggregation reads the pre-computed hash and skips per-query hashing. Numeric fields are already efficient.
Q: How is null handled?
A: Documents missing the field are not counted unless missing is set, in which case the substitute value contributes one distinct value to the result.
Q: How do I monitor cardinality aggregations for memory pressure in production?
A: Pulse tracks request and fielddata circuit breakers, identifies cardinality sub-aggregations nested under high-cardinality terms aggregations and oversized precision_threshold values, attributes each to the calling service, and recommends right-sizing the threshold or pre-hashing the field with the mapper-murmur3 plugin.
Related Reading
- Value Count Aggregation: exact count of values, including duplicates.
- Terms Aggregation: see the actual top values, not just how many.
- Composite Aggregation: enumerate every distinct combination exactly.
- Sum Aggregation: complement cardinality with totals per bucket.
- Elasticsearch Query Language: the DSL the cardinality aggregation runs inside.