The Top Hits Aggregation in Elasticsearch allows you to retrieve the most relevant documents for each bucket in a parent aggregation. It's particularly useful when you want to fetch a sample of matching documents alongside your aggregation results.
Syntax
"top_hits": {
  "size": 10,
  "sort": [
    {
      "date": {
        "order": "desc"
      }
    }
  ],
  "_source": {
    "includes": [ "field1", "field2" ]
  }
}
For more details, refer to the official Elasticsearch documentation.
Example Usage
Here's an example that groups documents by category and retrieves the top 3 most recent documents for each:
GET /my-index/_search
{
  "size": 0,
  "aggs": {
    "categories": {
      "terms": {
        "field": "category",
        "size": 10
      },
      "aggs": {
        "top_docs": {
          "top_hits": {
            "size": 3,
            "sort": [
              {
                "date": {
                  "order": "desc"
                }
              }
            ]
          }
        }
      }
    }
  }
}
Common Issues
- Performance impact: Using top_hits with a large size can be resource-intensive.
- Sorting cost: top_hits accepts the same sort options as a regular search, including script-based sorting, but a script sort runs for every candidate document in every bucket and can be slow.
- Memory usage: Be cautious when nesting top_hits under a bucket aggregation on a high-cardinality field. Every bucket keeps its own set of top documents, so memory use grows with the number of buckets.
Best Practices
- Limit the size of top_hits to reduce memory usage and improve performance.
- Use _source filtering to retrieve only necessary fields.
- Consider using search_after for pagination instead of deep paging with top_hits.
- Combine top_hits with other aggregations for more complex analysis.
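As a minimal sketch of the practices above, the request body below (sent to GET /my-index/_search, as in the earlier example) keeps size small, returns only field1 and field2 through _source filtering, and places a max aggregation next to top_hits in each bucket. The aggregation names top_docs and latest_date are illustrative, and the sketch assumes that date is mapped as a date field and category as a keyword field.

```json
{
  "size": 0,
  "aggs": {
    "categories": {
      "terms": {
        "field": "category",
        "size": 10
      },
      "aggs": {
        "top_docs": {
          "top_hits": {
            "size": 3,
            "sort": [
              { "date": { "order": "desc" } }
            ],
            "_source": {
              "includes": [ "field1", "field2" ]
            }
          }
        },
        "latest_date": {
          "max": { "field": "date" }
        }
      }
    }
  }
}
```

With these settings the response carries at most 10 buckets of 3 documents each, so the upper bound on returned documents is easy to reason about before raising either size.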
Frequently Asked Questions
Q: Can I use top_hits aggregation without a parent bucket aggregation?
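A: Yes, top_hits can be used as a top-level aggregation, but it is intended to be used as a sub-aggregation within a bucket aggregation like terms or date_histogram, so that the top documents are collected per bucket.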
Q: How does top_hits affect the overall query performance?
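A: top_hits can impact performance, especially with large sizes, because documents are fetched and returned for every bucket (up to the number of buckets multiplied by size). It's best to limit the size and use _source filtering to minimize the data transferred.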
Q: Can I apply additional scoring or sorting to top_hits results?
A: Yes, you can apply custom sorting and even use script-based sorting within top_hits, similar to regular search queries.
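A script-based sort inside top_hits might look like the request body below, sent to GET /my-index/_search. This is a sketch under assumptions: views is a hypothetical numeric field with doc values that is present on every document, and weight is an illustrative script parameter; neither appears in the original example.

```json
{
  "size": 0,
  "aggs": {
    "categories": {
      "terms": {
        "field": "category",
        "size": 10
      },
      "aggs": {
        "top_docs": {
          "top_hits": {
            "size": 3,
            "sort": [
              {
                "_script": {
                  "type": "number",
                  "script": {
                    "lang": "painless",
                    "source": "doc['views'].value * params.weight",
                    "params": { "weight": 2 }
                  },
                  "order": "desc"
                }
              }
            ]
          }
        }
      }
    }
  }
}
```

If some documents lack the views field, the script would need a guard such as checking doc['views'].size() before reading the value.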
Q: Is it possible to highlight fields in top_hits results?
A: Yes, you can use the highlight parameter within top_hits to highlight specific fields in the returned documents.
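As an illustration, the request body below (sent to GET /my-index/_search) adds highlight to top_hits. Highlighting needs query terms to mark, so the sketch includes a match query; it assumes field1 is a text field, and the query string is a placeholder.

```json
{
  "size": 0,
  "query": {
    "match": { "field1": "search terms" }
  },
  "aggs": {
    "categories": {
      "terms": {
        "field": "category",
        "size": 10
      },
      "aggs": {
        "top_docs": {
          "top_hits": {
            "size": 3,
            "highlight": {
              "fields": {
                "field1": {}
              }
            }
          }
        }
      }
    }
  }
}
```

Each hit returned under top_docs should then carry a highlight section for field1 wherever the query matched that field.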
Q: How does top_hits interact with nested documents and aggregations?
A: Top_hits can be used effectively with nested aggregations to retrieve the most relevant nested documents within each bucket.
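One way this could look is sketched below as a request body for GET /my-index/_search. It assumes a hypothetical nested field named comments with date and text sub-fields, which is not part of the original example; the aggregation names to_comments and latest_comments are also illustrative.

```json
{
  "size": 0,
  "aggs": {
    "categories": {
      "terms": {
        "field": "category",
        "size": 10
      },
      "aggs": {
        "to_comments": {
          "nested": { "path": "comments" },
          "aggs": {
            "latest_comments": {
              "top_hits": {
                "size": 1,
                "sort": [
                  { "comments.date": { "order": "desc" } }
                ],
                "_source": {
                  "includes": [ "comments.date", "comments.text" ]
                }
              }
            }
          }
        }
      }
    }
  }
}
```

Because top_hits sits inside the nested aggregation, the hits it returns are the nested comment objects rather than the parent documents.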