Brief Explanation
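This error occurs when a terms aggregation in Elasticsearch returns incomplete or inaccurate results. Each shard sends only its top shard_size terms to the coordinating node, which merges them into the final list. When the number of unique terms exceeds the shard size, a term that ranks low on some shards can be omitted from the final result or reported with a lower document count than it really has.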
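To make the mechanism concrete, here is a simplified Python model of the per-shard top-N and merge step. It is a sketch under stated assumptions: it ignores Elasticsearch's real reduce logic and error bookkeeping, and the helper names (shard_top_terms, merged_terms, exact_terms) and term counts are hypothetical, chosen only to show how an unevenly distributed term can vanish.

```python
from collections import Counter


def shard_top_terms(shard_counts, shard_size):
    """Terms a single shard would report: its shard_size most frequent terms."""
    return dict(Counter(shard_counts).most_common(shard_size))


def merged_terms(shards, size, shard_size):
    """Simplified coordinating-node merge: sum each shard's top terms, keep the top `size`."""
    totals = Counter()
    for shard in shards:
        totals.update(shard_top_terms(shard, shard_size))
    return dict(totals.most_common(size))


def exact_terms(shards, size):
    """Ground truth for comparison: sum every term on every shard, keep the top `size`."""
    totals = Counter()
    for shard in shards:
        totals.update(shard)
    return dict(totals.most_common(size))


# Made-up per-shard term counts with an uneven distribution:
# "c" is never in the top 2 of any single shard, yet it is the most frequent term overall.
shards = [
    {"a": 12, "b": 9, "c": 8},
    {"d": 11, "e": 7, "c": 6},
]

print(merged_terms(shards, size=3, shard_size=2))  # "c" is missing
print(exact_terms(shards, size=3))                 # "c" leads with 14
```

Raising shard_size to 3 in this toy model lets every shard report "c", and the merged result then matches the exact one.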
Common Causes
- Default shard size is too small for the dataset
- High cardinality in the field being aggregated
- Uneven distribution of terms across shards
- Large number of documents in the index
Troubleshooting and Resolution Steps
Increase the shard size:
- Set a larger size parameter in the terms aggregation.
- Increase the shard_size parameter (the default is size * 1.5 + 10).
{ "aggs": { "my_terms": { "terms": { "field": "my_field", "size": 1000, "shard_size": 2000 } } } }
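The request above can also be built programmatically. This sketch encodes the documented defaults for the terms aggregation (shard_size defaults to size * 1.5 + 10, and an explicit shard_size smaller than size is raised to size); the helper names effective_shard_size and terms_agg_body are hypothetical.

```python
def effective_shard_size(size, shard_size=None):
    """Shard size a terms aggregation uses, following the documented rules:
    the default is size * 1.5 + 10, and an explicit shard_size below size is raised to size."""
    if shard_size is None:
        return int(size * 1.5 + 10)
    return max(shard_size, size)


def terms_agg_body(field, size, shard_size=None, agg_name="my_terms"):
    """Build a terms aggregation request body; shard_size is only sent when given."""
    terms = {"field": field, "size": size}
    if shard_size is not None:
        terms["shard_size"] = shard_size
    return {"aggs": {agg_name: {"terms": terms}}}


print(effective_shard_size(1000))        # default for size=1000
print(effective_shard_size(1000, 2000))  # explicit value from the example request
print(terms_agg_body("my_field", 1000, 2000))
```

With size set to 1000, the default shard_size would be 1510, so the explicit 2000 in the example request asks each shard for roughly a third more candidate terms than the default.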
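Use the show_term_doc_count_error parameter to identify terms with potential errors. Each returned bucket then includes a doc_count_error_upper_bound value, the worst-case error in that term's count:
{ "aggs": { "my_terms": { "terms": { "field": "my_field", "show_term_doc_count_error": true } } } }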
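Once show_term_doc_count_error is enabled, the per-bucket bounds can be scanned automatically. This is a minimal sketch that works on the parsed JSON response as a plain dict; the function name and the sample response values are hypothetical, and the response is trimmed to the fields the function reads.

```python
def buckets_with_possible_error(response, agg_name="my_terms"):
    """List (key, doc_count, error_bound) for buckets whose count may be inaccurate.

    Assumes the request set show_term_doc_count_error to true, so every bucket
    carries its own doc_count_error_upper_bound. Any non-zero bound is flagged.
    """
    flagged = []
    for bucket in response["aggregations"][agg_name]["buckets"]:
        error = bucket.get("doc_count_error_upper_bound", 0)
        if error != 0:
            flagged.append((bucket["key"], bucket["doc_count"], error))
    return flagged


# Illustrative response fragment (made-up numbers, trimmed to the relevant fields).
sample = {
    "aggregations": {
        "my_terms": {
            "doc_count_error_upper_bound": 5,
            "sum_other_doc_count": 120,
            "buckets": [
                {"key": "a", "doc_count": 40, "doc_count_error_upper_bound": 0},
                {"key": "b", "doc_count": 31, "doc_count_error_upper_bound": 5},
            ],
        }
    }
}

print(buckets_with_possible_error(sample))
```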
Consider using the composite aggregation for high-cardinality fields.
Optimize your index structure and mapping for better term distribution.
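The composite aggregation returns every bucket exactly by paging with after_key rather than ranking per shard. The sketch below shows that paging loop under stated assumptions: search is any callable that sends a body to the _search endpoint and returns the parsed JSON, and fake_search is a hypothetical in-memory stand-in so the example runs without a cluster. Function names and data are illustrative.

```python
from typing import Callable, Iterator, Optional


def iter_composite_buckets(search: Callable[[dict], dict], field: str,
                           page_size: int = 1000,
                           agg_name: str = "my_terms") -> Iterator[dict]:
    """Yield every bucket of a single-source composite aggregation, page by page."""
    after: Optional[dict] = None
    while True:
        composite = {"size": page_size,
                     "sources": [{field: {"terms": {"field": field}}}]}
        if after is not None:
            composite["after"] = after  # resume after the last key of the previous page
        body = {"size": 0, "aggs": {agg_name: {"composite": composite}}}
        agg = search(body)["aggregations"][agg_name]
        buckets = agg.get("buckets", [])
        yield from buckets
        after = agg.get("after_key")
        if not buckets or after is None:
            break


def fake_search(body: dict) -> dict:
    """In-memory stand-in for a real _search call, for demonstration only."""
    counts = {"a": 3, "b": 1, "c": 7, "d": 2, "e": 5}
    comp = body["aggs"]["my_terms"]["composite"]
    start = comp.get("after", {}).get("my_field")
    keys = sorted(k for k in counts if start is None or k > start)[: comp["size"]]
    buckets = [{"key": {"my_field": k}, "doc_count": counts[k]} for k in keys]
    agg = {"buckets": buckets}
    if buckets:
        agg["after_key"] = buckets[-1]["key"]
    return {"aggregations": {"my_terms": agg}}


for bucket in iter_composite_buckets(fake_search, "my_field", page_size=2):
    print(bucket["key"]["my_field"], bucket["doc_count"])
```

Because each page is a separate request, exactness costs extra round trips, which is the performance trade-off for very large datasets.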
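If accuracy is crucial, do not use the terminate_after parameter: it stops collecting documents on each shard once the given count is reached, so it produces partial results rather than preventing them.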
Additional Information and Best Practices
- Monitor the _shards.failed value in the response to check for shard failures.
- Use the min_doc_count parameter to filter out rare terms.
- Consider approximate aggregations such as cardinality when you only need the number of distinct values in a high-cardinality field.
- Regularly review and optimize your aggregation queries for performance.
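Checking _shards.failed can be folded into a small response check. This sketch also looks at terminated_early, which search responses report when collection is allowed to stop early (for example with terminate_after). The function name and warning strings are hypothetical, and it operates on the parsed JSON response as a plain dict.

```python
def shard_health_warnings(response):
    """Return warnings about shard-level problems reported in a search response."""
    warnings = []
    shards = response.get("_shards", {})
    failed = shards.get("failed", 0)
    if failed > 0:
        warnings.append(f"{failed} of {shards.get('total', '?')} shards failed")
    # terminated_early is reported when collection may stop early, e.g. with terminate_after;
    # True means some shard stopped collecting documents before finishing.
    if response.get("terminated_early"):
        warnings.append("search terminated early (terminate_after reached)")
    return warnings


print(shard_health_warnings({"_shards": {"total": 5, "successful": 4, "failed": 1}}))
```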
Q&A Section
Q: What is the default shard size for terms aggregation?
A: The default shard_size is size * 1.5 + 10, where size is the size parameter of the terms aggregation.
Q: Can partial results affect the accuracy of my analytics?
A: Yes, partial results can lead to inaccurate analytics, especially when dealing with long-tail distributions or when precise counts are required.
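Q: How can I determine if I'm getting partial results?
A: Check sum_other_doc_count and doc_count_error_upper_bound in the aggregation response. A non-zero sum_other_doc_count indicates that some terms were omitted from the returned buckets. A non-zero doc_count_error_upper_bound indicates that a term with up to that many documents may be missing, so the returned list and counts may not be exact.
Q: Is there a way to get exact results for high-cardinality fields?
A: For exact results, you can use the composite aggregation with pagination, but this may impact performance for very large datasets.
Q: How does increasing shard size affect query performance?
A: Increasing shard size can improve accuracy, but it may also increase memory usage and query time, because each shard sends more terms to the coordinating node to merge. It's important to find the right balance for your use case.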
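The two top-level signals discussed in the Q&A can be read together. This minimal sketch takes the parsed response as a plain dict; the function name, the result keys, and the sample numbers are hypothetical.

```python
def partial_result_signals(response, agg_name="my_terms"):
    """Read the two top-level accuracy signals of a terms aggregation response."""
    agg = response["aggregations"][agg_name]
    other = agg.get("sum_other_doc_count", 0)
    error = agg.get("doc_count_error_upper_bound", 0)
    return {
        # Documents belong to terms that were not returned as buckets.
        "terms_omitted": other > 0,
        # A term with up to `error` documents may be missing, so the list may not be exact.
        "list_may_be_inexact": error != 0,
        "sum_other_doc_count": other,
        "doc_count_error_upper_bound": error,
    }


# Made-up response fragment: many terms beyond `size`, but no reported count error.
sample = {"aggregations": {"my_terms": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 250,
    "buckets": [{"key": "a", "doc_count": 40}],
}}}

print(partial_result_signals(sample))
```

In this sample, terms beyond size were left out, but the zero error bound suggests the returned top terms and their counts are reliable.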