Brief Explanation
This error occurs when a terms aggregation in Elasticsearch returns incomplete results. Each shard returns only its top `shard_size` terms to the coordinating node, so when the number of unique terms exceeds the shard size, some terms (or part of their document counts) are omitted from the final result.
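To make the mechanism concrete, here is a minimal sketch in plain Python. It is a toy model, not Elasticsearch code: the function names and the sample data are made up for illustration, and the only detail taken from the product is the default `shard_size` formula, `size * 1.5 + 10`.

```python
from collections import Counter


def default_shard_size(size):
    # Default used by the terms aggregation when shard_size is not set.
    return int(size * 1.5 + 10)


def simulated_terms_agg(shards, size, shard_size=None):
    """Toy model: each shard reports only its top `shard_size` terms,
    then the coordinating node sums what it received and keeps `size`."""
    if shard_size is None:
        shard_size = default_shard_size(size)
    merged = Counter()
    for shard in shards:
        counts = Counter(shard)
        # Count descending, then term ascending, so ties are deterministic.
        top = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:shard_size]
        for term, count in top:
            merged[term] += count
    ranked = sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:size]


# Hypothetical data: "c" is the most frequent term overall (3 + 4 = 7),
# but it is not the top term on either shard.
shard_a = ["a"] * 5 + ["b"] * 4 + ["c"] * 3
shard_b = ["d"] * 5 + ["c"] * 4 + ["b"] * 1

truncated = simulated_terms_agg([shard_a, shard_b], size=1, shard_size=1)
complete = simulated_terms_agg([shard_a, shard_b], size=1, shard_size=3)
print(truncated)  # [('a', 5)]  -> wrong winner, "c" was cut on both shards
print(complete)   # [('c', 7)]  -> correct once every shard reports enough terms
```

With a `shard_size` that is too small, the globally most frequent term never reaches the merge step. Raising `shard_size` fixes the result at the cost of more data sent per shard.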
Common Causes
- Default shard size is too small for the dataset
- High cardinality in the field being aggregated
- Uneven distribution of terms across shards
- Large number of documents in the index
Troubleshooting and Resolution Steps
Increase the shard size:
- Set a larger `size` parameter in the terms aggregation
- Increase the `shard_size` parameter (default is `size * 1.5 + 10`)

    {
      "aggs": {
        "my_terms": {
          "terms": {
            "field": "my_field",
            "size": 1000,
            "shard_size": 2000
          }
        }
      }
    }
Use the `show_term_doc_count_error` parameter to identify terms with potential errors:

    {
      "aggs": {
        "my_terms": {
          "terms": {
            "field": "my_field",
            "show_term_doc_count_error": true
          }
        }
      }
    }

Consider using the `composite` aggregation for high-cardinality fields
Optimize your index structure and mapping for better term distribution
If accuracy is crucial, make sure the `terminate_after` parameter is not set. It stops collecting documents on each shard once the limit is reached, so not all documents are processed.
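The `composite` aggregation mentioned in the steps above returns buckets page by page, with an `after_key` in each response that is passed back as `after` in the next request. The sketch below shows that loop in Python. The `search` argument is a hypothetical callable that takes a request body and returns the parsed response. It stands in for whatever client you use, and the source name `term` and aggregation name `all_terms` are arbitrary choices.

```python
def iter_all_terms(search, field, page_size=1000, agg_name="all_terms"):
    """Yield (term, doc_count) for every distinct value of `field`.

    `search` is a hypothetical callable: request body (dict) in,
    parsed search response (dict) out.
    """
    after = None
    while True:
        composite = {
            "size": page_size,
            # "term" is the name we give this source; bucket keys use it.
            "sources": [{"term": {"terms": {"field": field}}}],
        }
        if after is not None:
            composite["after"] = after
        body = {"size": 0, "aggs": {agg_name: {"composite": composite}}}
        result = search(body)["aggregations"][agg_name]
        for bucket in result["buckets"]:
            yield bucket["key"]["term"], bucket["doc_count"]
        # No after_key (or an empty page) means there is nothing left to fetch.
        after = result.get("after_key")
        if after is None or not result["buckets"]:
            break


def fake_search(body):
    # Stand-in for a real cluster so the loop can be tried locally.
    data = [("a", 3), ("b", 2), ("c", 1)]
    composite = body["aggs"]["all_terms"]["composite"]
    after = composite.get("after")
    start = 0
    if after is not None:
        start = [term for term, _ in data].index(after["term"]) + 1
    page = data[start:start + composite["size"]]
    result = {"buckets": [{"key": {"term": t}, "doc_count": c} for t, c in page]}
    if page:
        result["after_key"] = {"term": page[-1][0]}
    return {"aggregations": {"all_terms": result}}


all_terms = list(iter_all_terms(fake_search, "my_field", page_size=2))
```

Because every bucket is visited, the counts are exact, but buckets come back in key order rather than by document count, so any top-N ranking has to be done on the client.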
Additional Information and Best Practices
- Monitor the `_shards.failed` value in the response to check for shard failures
- Use the `min_doc_count` parameter to filter out rare terms
- Consider using approximate aggregations like `cardinality` for high-cardinality fields
- Regularly review and optimize your aggregation queries for performance
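The response checks above can be collected into one helper. The following is a minimal sketch that assumes the response has already been parsed into a Python dict. The function name and the sample response are hypothetical, while the field names (`_shards.failed`, `timed_out`, `terminated_early`, `doc_count_error_upper_bound`, `sum_other_doc_count`) are the ones Elasticsearch returns.

```python
def partial_result_warnings(response, agg_name):
    """Return reasons why a terms aggregation result may be incomplete."""
    found = []
    shards = response.get("_shards", {})
    if shards.get("failed", 0) > 0:
        found.append("shards failed: %d" % shards["failed"])
    if response.get("timed_out"):
        found.append("search timed out")
    if response.get("terminated_early"):
        found.append("search terminated early")
    agg = response.get("aggregations", {}).get(agg_name, {})
    error_bound = agg.get("doc_count_error_upper_bound", 0)
    if error_bound > 0:
        found.append("doc counts may be off by up to %d" % error_bound)
    other = agg.get("sum_other_doc_count", 0)
    if other > 0:
        found.append("%d documents fall in buckets that were not returned" % other)
    return found


# Hypothetical response, trimmed to the fields the helper reads.
sample_response = {
    "timed_out": False,
    "_shards": {"total": 5, "successful": 4, "skipped": 0, "failed": 1},
    "aggregations": {
        "my_terms": {
            "doc_count_error_upper_bound": 12,
            "sum_other_doc_count": 340,
            "buckets": [{"key": "a", "doc_count": 100}],
        }
    },
}
found = partial_result_warnings(sample_response, "my_terms")
```

An empty list does not prove the counts are exact, but any entry is a reason to look at `shard_size` or at shard health before trusting the numbers.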
Q&A Section
Q: What is the default shard size for terms aggregation?
A: The default shard size is `size * 1.5 + 10`, where `size` is the `size` parameter of the terms aggregation.
Q: Can partial results affect the accuracy of my analytics?
A: Yes, partial results can lead to inaccurate analytics, especially when dealing with long-tail distributions or when precise counts are required.
Q: How can I determine if I'm getting partial results?
A: Check the `sum_other_doc_count` in the aggregation response. A non-zero value indicates that some terms were omitted. A non-zero `doc_count_error_upper_bound` indicates that the returned counts may be inexact.
Q: Is there a way to get exact results for high-cardinality fields?
A: For exact results, you can use the `composite` aggregation with pagination, but this may impact performance for very large datasets.
Q: How does increasing shard size affect query performance?
A: Increasing shard size can improve accuracy but may also increase memory usage and query time. It's important to find the right balance for your use case.