Elasticsearch Geo Distance Aggregation - Syntax, Example, and Tips

The Geo Distance Aggregation in Elasticsearch is a multi-bucket aggregation that groups documents by the distance between a geo_point field and an origin point you specify. It is particularly useful for location-based analysis, since each range you define becomes one distance bucket (for example, customers within 5 miles of a store).

Syntax

The basic syntax for a Geo Distance Aggregation is as follows:

{
  "aggs": {
    "distance_ranges": {
      "geo_distance": {
        "field": "location",
        "origin": "52.3760, 4.894",
        "unit": "km",
        "ranges": [
          { "to": 100 },
          { "from": 100, "to": 300 },
          { "from": 300 }
        ]
      }
    }
  }
}

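In this request, field is the geo_point field holding each document's location, origin is the reference point, and unit sets the unit for the range boundaries (meters if omitted). Each range includes its from value and excludes its to value, so a document exactly 100 km from the origin falls into the second bucket.

For more details, refer to the official Elasticsearch documentation.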

Example Usage

Here's an example of how you might use the Geo Distance Aggregation to analyze customer data based on their distance from a store location:

GET /customers/_search
{
  "size": 0,
  "aggs": {
    "customers_by_distance": {
      "geo_distance": {
        "field": "location",
        "origin": "40.7128, -74.0060",
        "unit": "mi",
        "ranges": [
          { "to": 5 },
          { "from": 5, "to": 15 },
          { "from": 15, "to": 50 },
          { "from": 50 }
        ]
      }
    }
  }
}

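This query groups customers into four buckets by their distance from the specified coordinates (New York City in this case): under 5 miles, 5 to 15 miles, 15 to 50 miles, and 50 miles or more. Setting size to 0 suppresses the search hits so that only the aggregation results are returned.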
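
If the default bucket keys are awkward to consume, the aggregation also accepts a keyed flag and a custom key on each range. The request body below is a sketch of the same customer query for GET /customers/_search; the key names are illustrative assumptions, not part of the original example.

```json
{
  "size": 0,
  "aggs": {
    "customers_by_distance": {
      "geo_distance": {
        "field": "location",
        "origin": "40.7128, -74.0060",
        "unit": "mi",
        "keyed": true,
        "ranges": [
          { "to": 5, "key": "under_5_mi" },
          { "from": 5, "to": 15, "key": "5_to_15_mi" },
          { "from": 15, "to": 50, "key": "15_to_50_mi" },
          { "from": 50, "key": "over_50_mi" }
        ]
      }
    }
  }
}
```

With keyed set to true, the buckets come back as an object indexed by these names rather than as an array, which can make the response easier to read in client code.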

Common Issues

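  1. The field must be mapped as geo_point. Dynamic mapping does not detect geo_point, so the type has to be set explicitly in the index mapping.
  2. Range boundaries are interpreted in the unit you set (e.g., km, mi, m). If unit is omitted, the default is meters.
  3. The origin can be given as an object with lat and lon keys, as a "lat,lon" string, or as an array. Note that the array form is [lon, lat], the reverse of the string order.
  4. On large datasets, narrow the set of documents with a query or filter before aggregating to improve performance.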
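
The first point is easiest to handle when the index is created. A minimal sketch of a request body for PUT /customers, assuming the index and field names used in the example above:

```json
{
  "mappings": {
    "properties": {
      "location": { "type": "geo_point" }
    }
  }
}
```

The type of an existing field cannot be changed in place, so an index whose location field was already mapped as something else would need to be reindexed into a new index with this mapping.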

Best Practices

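  1. Choose a distance_type that balances accuracy and performance: arc (the default) is the most accurate, while plane is faster but less accurate over long distances.
  2. Nest sub-aggregations (such as avg or terms) inside the distance buckets for more complex analyses.
  3. For large datasets where you need density by area rather than distance rings, consider geo-grid aggregations such as geohash_grid or geotile_grid.
  4. Always validate input coordinates: latitude must be between -90 and 90, and longitude between -180 and 180.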
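
As a sketch of combining aggregations, the request body below (for GET /customers/_search) nests an avg aggregation inside each distance bucket. The order_total field is a hypothetical numeric field used only for illustration; substitute a field that exists in your own mapping.

```json
{
  "size": 0,
  "aggs": {
    "customers_by_distance": {
      "geo_distance": {
        "field": "location",
        "origin": "40.7128, -74.0060",
        "unit": "mi",
        "ranges": [
          { "to": 5 },
          { "from": 5, "to": 15 },
          { "from": 15 }
        ]
      },
      "aggs": {
        "avg_order_total": {
          "avg": { "field": "order_total" }
        }
      }
    }
  }
}
```

Each distance bucket in the response then carries its own avg_order_total value alongside its doc_count.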

Frequently Asked Questions

Q: Can I use Geo Distance Aggregation with multiple origin points?
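A: No, each Geo Distance Aggregation works with a single origin point. For multiple points, define one geo_distance aggregation per origin; they can sit side by side in the same search request. If you only need to select documents inside an area rather than bucket them by distance, a geo-shape query may be a better fit.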
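
A minimal sketch of the separate-aggregations approach, as a request body for GET /customers/_search. The aggregation names are hypothetical, and the two origins are simply the coordinates used in the earlier examples.

```json
{
  "size": 0,
  "aggs": {
    "distance_from_store_a": {
      "geo_distance": {
        "field": "location",
        "origin": "40.7128, -74.0060",
        "unit": "mi",
        "ranges": [ { "to": 15 }, { "from": 15 } ]
      }
    },
    "distance_from_store_b": {
      "geo_distance": {
        "field": "location",
        "origin": "52.3760, 4.894",
        "unit": "mi",
        "ranges": [ { "to": 15 }, { "from": 15 } ]
      }
    }
  }
}
```

The two aggregations are computed independently, so a document can be counted once under each origin.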

Q: How does the Geo Distance Aggregation handle documents without location data?
A: Documents without a valid geo-point in the specified field will be ignored in the aggregation results.

Q: Is there a limit to the number of ranges I can specify?
A: While there's no hard limit, using too many ranges can impact performance. It's generally recommended to use a reasonable number of ranges that make sense for your use case.

Q: Can I use Geo Distance Aggregation with geo-shapes instead of geo-points?
A: No, the Geo Distance Aggregation only works with geo-point fields. For geo-shapes, you might need to consider other types of geo aggregations or queries.

Q: How accurate is the distance calculation in Geo Distance Aggregation?
A: By default (distance_type arc), Elasticsearch uses the Haversine formula for distance calculations, which assumes a spherical Earth. This is accurate enough for most use cases, but because the Earth is not a perfect sphere, results can deviate slightly from true distances, most noticeably over very large distances or near the poles.
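
The calculation method can be selected per aggregation with the distance_type parameter, which accepts arc (the default) or plane. The sketch below, a request body for GET /customers/_search, switches to plane, which trades accuracy for speed and is best suited to small search areas.

```json
{
  "size": 0,
  "aggs": {
    "customers_by_distance": {
      "geo_distance": {
        "field": "location",
        "origin": "40.7128, -74.0060",
        "unit": "mi",
        "distance_type": "plane",
        "ranges": [
          { "to": 5 },
          { "from": 5 }
        ]
      }
    }
  }
}
```

Omitting distance_type keeps the more accurate arc calculation.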
