Elasticsearch Error: Terms aggregation returning partial results

Brief Explanation

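This issue occurs when a terms aggregation in Elasticsearch returns incomplete or approximate results. Each shard returns only its top shard_size terms to the coordinating node, which merges them and keeps the top size terms. When the field has more unique terms than these limits allow, some terms are omitted from the final result and the document counts of others can be underestimated.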

Common Causes

  1. Default shard size is too small for the dataset
  2. High cardinality in the field being aggregated
  3. Uneven distribution of terms across shards
  4. Large number of documents in the index

Troubleshooting and Resolution Steps

  1. Increase the shard size:

    • Set a larger size parameter in the terms aggregation
    • Increase the shard_size parameter (default is size * 1.5 + 10)

    {
      "aggs": {
        "my_terms": {
          "terms": {
            "field": "my_field",
            "size": 1000,
            "shard_size": 2000
          }
        }
      }
    }
    
  2. Use the show_term_doc_count_error parameter to identify terms with potential errors:

    {
      "aggs": {
        "my_terms": {
          "terms": {
            "field": "my_field",
            "show_term_doc_count_error": true
          }
        }
      }
    }
    
  3. Consider using the composite aggregation for high-cardinality fields when you need every term rather than only the top ones, since it pages through all buckets

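  4. Optimize your index structure for better term distribution: using fewer primary shards, or routing documents by the aggregated field so that each term lives on a single shard, reduces the per-shard counting error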

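  5. If accuracy is crucial, do not set the terminate_after parameter or a search timeout: both stop collection before all matching documents are processed, so the aggregation is computed over only part of the data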

Additional Information and Best Practices

  • Monitor the _shards.failed value in the response to check for shard failures
  • Use the min_doc_count parameter to filter out rare terms
  • If you only need the number of unique terms rather than the terms themselves, consider the cardinality aggregation, which returns an approximate distinct count
  • Regularly review and optimize your aggregation queries for performance
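
The response fields mentioned in this article can be checked together. The sketch below assumes the search response has already been parsed into a Python dict; the function name and the signal labels are made up for this example, while the fields it reads (timed_out, terminated_early, _shards.failed, doc_count_error_upper_bound, sum_other_doc_count) are the ones Elasticsearch returns.

```python
from typing import Any, Dict, List


def partial_result_signals(response: Dict[str, Any], agg_name: str) -> List[str]:
    """Return labels for every hint that a terms aggregation result is partial."""
    signals: List[str] = []
    if response.get("timed_out"):
        signals.append("timed_out")          # search hit its timeout
    if response.get("terminated_early"):
        signals.append("terminated_early")   # terminate_after limit was reached
    if response.get("_shards", {}).get("failed", 0) > 0:
        signals.append("shards_failed")      # at least one shard did not answer
    agg = response.get("aggregations", {}).get(agg_name, {})
    if agg.get("doc_count_error_upper_bound", 0) > 0:
        signals.append("doc_count_error")    # bucket counts may be underestimated
    if agg.get("sum_other_doc_count", 0) > 0:
        signals.append("terms_omitted")      # documents fell into unreturned terms
    return signals
```

An empty list means none of these indicators fired. A terms_omitted signal alone is expected whenever the field has more unique terms than size.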

Q&A Section

  1. Q: What is the default shard size for terms aggregation? A: The default shard_size is size * 1.5 + 10, where size is the size parameter of the terms aggregation.

  2. Q: Can partial results affect the accuracy of my analytics? A: Yes, partial results can lead to inaccurate analytics, especially when dealing with long-tail distributions or when precise counts are required.

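  3. Q: How can I determine if I'm getting partial results? A: Check sum_other_doc_count and doc_count_error_upper_bound in the aggregation response. A non-zero sum_other_doc_count indicates that some documents belong to terms that were not returned, and a non-zero doc_count_error_upper_bound indicates that the returned counts may be inaccurate.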

  4. Q: Is there a way to get exact results for high-cardinality fields? A: For exact results, you can use the composite aggregation with pagination, but this may impact performance for very large datasets.

  5. Q: How does increasing shard size affect query performance? A: Increasing shard size can improve accuracy but may also increase memory usage and query time. It's important to find the right balance for your use case.
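
As a small worked example of the shard size arithmetic, the helper below computes the documented default, size * 1.5 + 10, so you can tell whether an explicit shard_size actually raises the limit. The function name is made up for this illustration, and truncating to an integer is an assumption about how the fractional part is handled.

```python
def default_shard_size(size: int) -> int:
    """Default shard_size for a terms aggregation: size * 1.5 + 10, truncated."""
    return int(size * 1.5 + 10)


# With size 1000 the default is 1510, so the shard_size of 2000 used in
# step 1 is above the default and asks each shard for more candidate terms.
print(default_shard_size(10))    # 25
print(default_shard_size(1000))  # 1510
```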
