Elasticsearch Top Hits Aggregation - Syntax, Example, and Tips

The top_hits aggregation in Elasticsearch returns the top matching documents for each bucket of a parent aggregation. By default it returns the three highest-scoring hits per bucket; the size and sort parameters change how many hits come back and in what order. It is particularly useful when you want a sample of matching documents alongside your aggregation results.

Syntax

"top_hits": {
  "size": 10,
  "sort": [
    {
      "date": {
        "order": "desc"
      }
    }
  ],
  "_source": {
    "includes": [ "field1", "field2" ]
  }
}

For more details, refer to the official Elasticsearch documentation.

Example Usage

Here's an example that groups documents by category and retrieves the top 3 most recent documents for each:

GET /my-index/_search
{
  "size": 0,
  "aggs": {
    "categories": {
      "terms": {
        "field": "category",
        "size": 10
      },
      "aggs": {
        "top_docs": {
          "top_hits": {
            "size": 3,
            "sort": [
              {
                "date": {
                  "order": "desc"
                }
              }
            ]
          }
        }
      }
    }
  }
}
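
The response nests each bucket's documents under the aggregation names used in the request. As a minimal sketch of reading that structure in Python, the helper below maps each category to the _source of its top hits. The function name and the sample data are illustrative assumptions; the aggregation names "categories" and "top_docs" come from the request above.

```python
from typing import Any, Dict, List


def top_docs_by_category(response: Dict[str, Any]) -> Dict[str, List[Dict[str, Any]]]:
    """Map each terms bucket key to the _source of its top hits.

    Assumes the aggregation names from the example request:
    "categories" (terms) and "top_docs" (top_hits).
    """
    buckets = response["aggregations"]["categories"]["buckets"]
    return {
        bucket["key"]: [hit["_source"] for hit in bucket["top_docs"]["hits"]["hits"]]
        for bucket in buckets
    }


# Hand-written, trimmed response in the shape Elasticsearch returns (illustrative data).
sample_response = {
    "aggregations": {
        "categories": {
            "buckets": [
                {
                    "key": "books",
                    "doc_count": 2,
                    "top_docs": {
                        "hits": {
                            "total": {"value": 2, "relation": "eq"},
                            "hits": [
                                {"_id": "1", "_source": {"date": "2024-02-01"}},
                                {"_id": "2", "_source": {"date": "2024-01-15"}},
                            ],
                        }
                    },
                }
            ]
        }
    }
}

result = top_docs_by_category(sample_response)
```

In a real application the response dict would come from whichever Elasticsearch client you use; only the key path matters here.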

Common Issues

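  1. Performance impact: Using top_hits with a large size can be resource-intensive, because hits are collected separately for every bucket and the memory and time needed grow with from + size.
  2. Sorting cost: top_hits accepts the same sort options as a regular search, including script-based sorting, but a script sort has to be evaluated for the documents in every bucket and can slow the query noticeably.
  3. Memory with many buckets: Be cautious when placing top_hits under a bucket aggregation on a high-cardinality field. Each bucket keeps its own list of hits, so a large bucket count combined with a large top_hits size may consume significant memory.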
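
To put a number on the first and third issues, a rough upper bound on the documents a single response can carry is the number of parent buckets multiplied by the top_hits size. The sketch below computes that bound in Python; the function name is hypothetical, and it assumes one top_hits aggregation under one bucket aggregation, as in the example request.

```python
def max_hits_in_response(bucket_size: int, top_hits_size: int) -> int:
    """Upper bound on documents returned: one top_hits list per parent bucket."""
    if bucket_size < 0 or top_hits_size < 0:
        raise ValueError("sizes must be non-negative")
    return bucket_size * top_hits_size


# The example request in this article: 10 category buckets, 3 hits each.
example_bound = max_hits_in_response(10, 3)

# A much heavier variant: 1000 buckets, 100 hits each.
heavy_bound = max_hits_in_response(1000, 100)
```

The first request returns at most 30 documents, the second up to 100,000, which is why both sizes deserve attention.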

Best Practices

  1. Limit the size of top_hits to reduce memory usage and improve performance.
  2. Use _source filtering to retrieve only necessary fields.
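  3. Avoid deep paging inside top_hits: from + size is capped by the index.max_inner_result_window index setting (100 by default). To page through many documents, run a regular search with search_after instead.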
  4. Combine top_hits with other aggregations for more complex analysis.
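
The first two practices can be enforced where the request is built. The following is a minimal Python sketch, not an official client API: build_top_hits is a hypothetical helper that adds _source filtering and rejects a from + size above 100, the default of the index.max_inner_result_window setting. It assumes that setting has not been raised on your index.

```python
from typing import Any, Dict, List

# Default of the index.max_inner_result_window index setting. Assumption: the
# target index still uses the default; adjust this constant if it does not.
DEFAULT_MAX_INNER_RESULT_WINDOW = 100


def build_top_hits(size: int, source_fields: List[str], from_: int = 0) -> Dict[str, Any]:
    """Return a top_hits body with _source filtering, rejecting oversized windows."""
    if size < 0 or from_ < 0:
        raise ValueError("size and from_ must be non-negative")
    if from_ + size > DEFAULT_MAX_INNER_RESULT_WINDOW:
        raise ValueError(
            f"from + size = {from_ + size} exceeds {DEFAULT_MAX_INNER_RESULT_WINDOW}"
        )
    return {
        "top_hits": {
            "from": from_,
            "size": size,
            "_source": {"includes": source_fields},
        }
    }


# "title" and "date" are placeholder field names.
agg = build_top_hits(size=3, source_fields=["title", "date"])
```

The returned dict goes under a bucket aggregation's "aggs" key, in place of the top_docs body shown in the example request.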

Frequently Asked Questions

Q: Can I use top_hits aggregation without a parent bucket aggregation?
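A: Yes, Elasticsearch accepts top_hits as a top-level aggregation, but the official documentation recommends against it and suggests the collapse search parameter for grouping hits instead. In practice, top_hits is used as a sub-aggregation within a bucket aggregation like terms or date_histogram.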

Q: How does top_hits affect the overall query performance?
A: The cost of top_hits grows with the number of buckets and with from + size in each bucket, so large sizes slow the query and inflate the response. It's best to limit the size and use _source filtering to minimize the data transferred.

Q: Can I apply additional scoring or sorting to top_hits results?
A: Yes, you can apply custom sorting and even use script-based sorting within top_hits, similar to regular search queries.
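
As an illustration, here is a sketch of a top_hits body that uses a script-based sort, written as a Python dict so it can be serialized to JSON. The _script sort syntax is the same one used in regular searches; the helper name, the "price" field, and the "factor" parameter are assumptions made up for this example.

```python
import json
from typing import Any, Dict


def top_hits_with_script_sort(size: int, factor: float) -> Dict[str, Any]:
    """Build a top_hits aggregation sorted by a Painless script.

    "price" is a hypothetical numeric field; "factor" is a script parameter.
    """
    return {
        "top_hits": {
            "size": size,
            "sort": [
                {
                    "_script": {
                        "type": "number",
                        "script": {
                            "lang": "painless",
                            "source": "doc['price'].value * params.factor",
                            "params": {"factor": factor},
                        },
                        "order": "desc",
                    }
                }
            ],
        }
    }


script_sorted = top_hits_with_script_sort(size=3, factor=1.1)

# Print the JSON to paste under a bucket aggregation's "aggs" key.
print(json.dumps(script_sorted, indent=2))
```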

Q: Is it possible to highlight fields in top_hits results?
A: Yes, you can use the highlight parameter within top_hits to highlight specific fields in the returned documents.
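
A minimal sketch of such a request, built as a Python dict: the main query supplies the terms to highlight, and the highlight block sits inside top_hits. The helper name and the "title" and "category" fields are hypothetical.

```python
from typing import Any, Dict


def search_with_highlighted_top_hits(text: str) -> Dict[str, Any]:
    """Build a search body whose top_hits highlight matches of `text`.

    "title" and "category" are hypothetical field names.
    """
    return {
        "size": 0,
        "query": {"match": {"title": text}},
        "aggs": {
            "categories": {
                "terms": {"field": "category"},
                "aggs": {
                    "top_docs": {
                        "top_hits": {
                            "size": 1,
                            "highlight": {"fields": {"title": {}}},
                        }
                    }
                },
            }
        },
    }


body = search_with_highlighted_top_hits("elasticsearch")
```

Each returned hit then carries a highlight section next to its _source, as it would in a regular search.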

Q: How does top_hits interact with nested documents and aggregations?
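A: When top_hits is placed inside a nested aggregation, it returns the matching nested documents within each bucket rather than their parent documents. Each hit carries a _nested object with the nested field name and the offset of that document within the parent's array.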
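
To show how the _nested metadata can be read, the Python sketch below pulls the parent _id, nested field name, and offset out of a top_hits result. The function name and the sample data, including the "comments" nested field, are illustrative assumptions.

```python
from typing import Any, Dict, List, Tuple


def nested_hit_positions(top_hits_result: Dict[str, Any]) -> List[Tuple[str, str, int]]:
    """Return (parent _id, nested field, offset) for each hit in a top_hits result."""
    positions = []
    for hit in top_hits_result["hits"]["hits"]:
        nested = hit["_nested"]
        positions.append((hit["_id"], nested["field"], nested["offset"]))
    return positions


# Hand-written sample of one top_hits result under a nested aggregation
# on a hypothetical "comments" field (illustrative data).
sample = {
    "hits": {
        "hits": [
            {
                "_id": "1",
                "_nested": {"field": "comments", "offset": 0},
                "_source": {"username": "alice", "comment": "Great product"},
            }
        ]
    }
}

positions = nested_hit_positions(sample)
```

The offset identifies which element of the parent's comments array produced the hit.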
