Elasticsearch Top Hits Aggregation - Syntax, Example, and Tips

The Top Hits Aggregation in Elasticsearch returns the top matching documents for each bucket of a parent aggregation. Documents are ranked by relevance score by default, or by a sort order you specify. It's particularly useful when you want to fetch a sample of matching documents alongside your aggregation results.

Syntax

"top_hits": {
  "size": 10,
  "sort": [
    {
      "date": {
        "order": "desc"
      }
    }
  ],
  "_source": {
    "includes": [ "field1", "field2" ]
  }
}

For more details, refer to the official Elasticsearch documentation.

Example Usage

Here's an example that groups documents by category and retrieves the top 3 most recent documents for each:

GET /my-index/_search
{
  "size": 0,
  "aggs": {
    "categories": {
      "terms": {
        "field": "category",
        "size": 10
      },
      "aggs": {
        "top_docs": {
          "top_hits": {
            "size": 3,
            "sort": [
              {
                "date": {
                  "order": "desc"
                }
              }
            ]
          }
        }
      }
    }
  }
}
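
The hits for each bucket come back nested under the name given to the top_hits aggregation. The following is an abbreviated sketch of the response shape only: the bucket key, counts, document ID, and field values are illustrative placeholders, not real output, and several standard response fields are omitted.

```json
{
  "aggregations": {
    "categories": {
      "buckets": [
        {
          "key": "books",
          "doc_count": 42,
          "top_docs": {
            "hits": {
              "total": { "value": 42, "relation": "eq" },
              "max_score": null,
              "hits": [
                {
                  "_index": "my-index",
                  "_id": "1",
                  "_score": null,
                  "_source": { "category": "books", "date": "2024-01-15" },
                  "sort": [ 1705276800000 ]
                }
              ]
            }
          }
        }
      ]
    }
  }
}
```

Because the request sorts on a field rather than on relevance, _score and max_score are null and each hit carries its sort values instead.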

Common Issues

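Performance impact: Using top_hits with a large size can be resource-intensive, because every bucket collects and fetches up to size documents. The sum of from and size is also capped by the index.max_inner_result_window index setting (100 by default).
Sorting limitations: top_hits accepts the same sort options as a regular search, including script-based sorting, but a terms aggregation cannot order its buckets by a top_hits sub-aggregation. To order buckets by their best document, add a max or min sub-aggregation and order by that instead.
Memory usage: Be cautious when nesting top_hits under a bucket aggregation on a high-cardinality field. Each bucket keeps its own set of top documents, so many buckets can consume significant memory.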
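
Since a terms aggregation cannot be ordered by a top_hits sub-aggregation directly, one common workaround is to order by a sibling max aggregation. A minimal sketch, reusing the category and date fields from the example above (the aggregation name latest is an arbitrary choice):

```console
GET /my-index/_search
{
  "size": 0,
  "aggs": {
    "categories": {
      "terms": {
        "field": "category",
        "size": 10,
        "order": { "latest": "desc" }
      },
      "aggs": {
        "latest": {
          "max": { "field": "date" }
        },
        "top_docs": {
          "top_hits": {
            "size": 1,
            "sort": [
              { "date": { "order": "desc" } }
            ]
          }
        }
      }
    }
  }
}
```

Here the buckets are ordered by the most recent date in each category, and top_docs returns the document that carries that date.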

Best Practices

Limit the size of top_hits to reduce memory usage and improve performance.
Use _source filtering to retrieve only necessary fields.
Avoid deep paging inside top_hits with large from values. If you need to page through many documents, run a regular search with search_after instead.
Combine top_hits with other aggregations for more complex analysis.
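
Several of these practices can be applied in one request. The sketch below keeps size at 1, restricts _source to two fields, and pairs top_hits with an avg metric in the same bucket. The price and title fields are hypothetical and assumed to exist in the index mapping.

```console
GET /my-index/_search
{
  "size": 0,
  "aggs": {
    "categories": {
      "terms": { "field": "category", "size": 10 },
      "aggs": {
        "avg_price": {
          "avg": { "field": "price" }
        },
        "latest_doc": {
          "top_hits": {
            "size": 1,
            "sort": [
              { "date": { "order": "desc" } }
            ],
            "_source": {
              "includes": [ "title", "date" ]
            }
          }
        }
      }
    }
  }
}
```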

Frequently Asked Questions

Q: Can I use top_hits aggregation without a parent bucket aggregation?
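A: Technically yes, but it is not recommended. At the top level, top_hits returns much the same documents as the regular search hits, so it is intended to be used as a sub-aggregation within a bucket aggregation like terms or date_histogram.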

Q: How does top_hits affect the overall query performance?
A: top_hits can slow a query down, especially with a large size or many buckets, because documents are collected and fetched for every bucket. It's best to keep size small and use _source filtering to minimize the data transferred.

Q: Can I apply additional scoring or sorting to top_hits results?
A: Yes, you can apply custom sorting and even use script-based sorting within top_hits, similar to regular search queries.
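
As an illustration, a script-based sort inside top_hits uses the same _script sort syntax as a regular search. This is a sketch only: the numeric popularity field and the factor parameter are hypothetical.

```console
GET /my-index/_search
{
  "size": 0,
  "aggs": {
    "categories": {
      "terms": { "field": "category", "size": 10 },
      "aggs": {
        "top_docs": {
          "top_hits": {
            "size": 3,
            "sort": [
              {
                "_script": {
                  "type": "number",
                  "script": {
                    "lang": "painless",
                    "source": "doc['popularity'].value * params.factor",
                    "params": { "factor": 1.1 }
                  },
                  "order": "desc"
                }
              }
            ]
          }
        }
      }
    }
  }
}
```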

Q: Is it possible to highlight fields in top_hits results?
A: Yes, you can use the highlight parameter within top_hits to highlight specific fields in the returned documents.
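
A minimal sketch of highlighting inside top_hits follows. It assumes a hypothetical title text field. Highlighting needs a query to highlight against, so the request includes a match query on that field.

```console
GET /my-index/_search
{
  "size": 0,
  "query": {
    "match": { "title": "elasticsearch" }
  },
  "aggs": {
    "categories": {
      "terms": { "field": "category", "size": 10 },
      "aggs": {
        "top_docs": {
          "top_hits": {
            "size": 3,
            "highlight": {
              "fields": {
                "title": {}
              }
            }
          }
        }
      }
    }
  }
}
```

Each returned hit then includes a highlight section with the matching fragments from title.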

Q: How does top_hits interact with nested documents and aggregations?
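A: When top_hits is placed under a nested (or reverse_nested) aggregation, it returns the matching nested documents rather than their parent documents. Each such hit includes a _nested section identifying the nested field and the offset of the nested document within it.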
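
For example, assuming a hypothetical comments field mapped as type nested, the following sketch returns one nested comment per category. No sort or query is given, so which comment comes back is not meaningful here; in practice you would add a sort or a query.

```console
GET /my-index/_search
{
  "size": 0,
  "aggs": {
    "categories": {
      "terms": { "field": "category", "size": 10 },
      "aggs": {
        "to_comments": {
          "nested": { "path": "comments" },
          "aggs": {
            "top_comments": {
              "top_hits": {
                "size": 1
              }
            }
          }
        }
      }
    }
  }
}
```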