Elasticsearch Cardinality Aggregation - Syntax, Example, and Tips

The Cardinality Aggregation in Elasticsearch calculates an approximate count of distinct values in a field. It gives up exact precision in exchange for bounded memory use. That makes it useful for counting unique items in large datasets where a close estimate is good enough.

Syntax

{
  "aggs": {
    "unique_count": {
      "cardinality": {
        "field": "field_name"
      }
    }
  }
}

For more details, refer to the official Elasticsearch documentation on Cardinality Aggregation.

Example Usage

Here's an example of using the Cardinality Aggregation to count unique user IDs:

GET /my_index/_search
{
  "size": 0,
  "aggs": {
    "unique_users": {
      "cardinality": {
        "field": "user_id"
      }
    }
  }
}

Common Issues

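  1. High memory usage: Each cardinality aggregation uses roughly precision_threshold * 8 bytes. This cost is paid once per bucket when the aggregation is nested under a bucket aggregation such as terms or date_histogram, so a large number of buckets can add up to significant memory.
  2. Precision vs. memory: The default precision_threshold of 3000 may not suit every use case. Counts below the threshold are expected to be close to accurate. Higher thresholds (up to the maximum of 40000) improve accuracy at the cost of memory.
  3. Null values: Documents with a missing or null value in the field are ignored by default, which can be surprising if you expect them to count as a distinct value. Use the missing parameter to include them.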
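As a sketch of trading memory for accuracy, the request below raises precision_threshold on the user_id example from Example Usage. The value 10000 is an arbitrary illustration, not a recommendation. A threshold close to the number of distinct values you expect is a reasonable starting point, keeping in mind that values above 40000 behave the same as 40000.

```json
GET /my_index/_search
{
  "size": 0,
  "aggs": {
    "unique_users": {
      "cardinality": {
        "field": "user_id",
        "precision_threshold": 10000
      }
    }
  }
}
```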

Best Practices

  1. Adjust the precision_threshold parameter based on your needs for accuracy vs. performance.
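  2. To count distinct combinations of several fields, aggregate on a runtime field (or a script) that combines them. Values are then hashed at query time, which is noticeably slower than aggregating on an indexed field.
  3. You don't need to enable HyperLogLog++ separately for extremely large datasets. The aggregation already uses HyperLogLog++ internally, so its memory depends on precision_threshold rather than on the number of documents.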
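Here is a minimal sketch of counting distinct combinations of two fields through a runtime field. It assumes Elasticsearch 7.11 or later, where runtime_mappings is available. The country field (assumed to be mapped as keyword) and the user_country runtime field name are hypothetical. The Painless script emits a combined value only for documents that have both fields, so documents missing either one are skipped.

```json
GET /my_index/_search
{
  "size": 0,
  "runtime_mappings": {
    "user_country": {
      "type": "keyword",
      "script": "if (doc['user_id'].size() > 0 && doc['country'].size() > 0) { emit(doc['user_id'].value + '|' + doc['country'].value) }"
    }
  },
  "aggs": {
    "unique_user_country_pairs": {
      "cardinality": {
        "field": "user_country"
      }
    }
  }
}
```

The separator character keeps pairs like "1" + "23" and "12" + "3" from collapsing into the same combined value.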

Frequently Asked Questions

Q: How accurate is the Cardinality Aggregation?
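A: The Cardinality Aggregation provides an approximate count. Counts below the precision_threshold value (default 3000) are expected to be close to accurate. Above the threshold they become fuzzier, although the Elasticsearch documentation reports that the error typically stays low (a few percent) even when counting millions of distinct values.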

Q: Can I use Cardinality Aggregation on nested fields?
A: Yes, you can use Cardinality Aggregation on nested fields by wrapping it in a nested aggregation.
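This is a minimal sketch that assumes a hypothetical nested field called comments, with an author subfield mapped as keyword. The nested aggregation's path selects the nested objects, and the cardinality aggregation inside it refers to the field by its full path.

```json
GET /my_index/_search
{
  "size": 0,
  "aggs": {
    "comments": {
      "nested": {
        "path": "comments"
      },
      "aggs": {
        "unique_comment_authors": {
          "cardinality": {
            "field": "comments.author"
          }
        }
      }
    }
  }
}
```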

Q: How does Cardinality Aggregation handle null values?
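A: Documents with no value for the field are ignored by default, so they do not add to the count. This includes explicit null values, unless the field mapping sets a null_value. To count these documents as one extra distinct value, set the missing parameter to a substitute value.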
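Below is a minimal sketch of the missing parameter. It assumes user_id is mapped as keyword; for a numeric field, the substitute would have to be a number. Every document without a user_id is treated as if it had the value N/A. Together, those documents therefore add at most one to the count, and they add nothing if N/A already occurs as a real value.

```json
GET /my_index/_search
{
  "size": 0,
  "aggs": {
    "unique_users_incl_missing": {
      "cardinality": {
        "field": "user_id",
        "missing": "N/A"
      }
    }
  }
}
```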

Q: What's the difference between Cardinality Aggregation and Value Count Aggregation?
A: Cardinality Aggregation counts unique values, while Value Count Aggregation counts all values, including duplicates.
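To compare the two on your own data, you can run both metrics in one request. This sketch reuses the user_id field from Example Usage. Value Count is exact and counts every value, including each value of a multi-valued field separately. Cardinality returns an approximate number of distinct values.

```json
GET /my_index/_search
{
  "size": 0,
  "aggs": {
    "unique_users": {
      "cardinality": {
        "field": "user_id"
      }
    },
    "user_id_values": {
      "value_count": {
        "field": "user_id"
      }
    }
  }
}
```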

Q: Can Cardinality Aggregation be used with other aggregations?
A: Yes. Cardinality Aggregation can be nested under bucket aggregations such as terms or date_histogram to count distinct values per bucket.
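This sketch counts distinct users per day. It assumes a hypothetical date field named timestamp and Elasticsearch 7.2 or later, which introduced calendar_interval. Each daily bucket keeps its own cardinality sketch, so the example lowers precision_threshold to 1000 to limit memory. That value is only an illustration.

```json
GET /my_index/_search
{
  "size": 0,
  "aggs": {
    "per_day": {
      "date_histogram": {
        "field": "timestamp",
        "calendar_interval": "day"
      },
      "aggs": {
        "unique_users": {
          "cardinality": {
            "field": "user_id",
            "precision_threshold": 1000
          }
        }
      }
    }
  }
}
```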
