Elasticsearch Moving Function Aggregation - Syntax, Example, and Tips

The Moving Function Aggregation (moving_fn) in Elasticsearch is a pipeline aggregation that runs a script over a sliding window of bucket values, typically in time series analysis. For each bucket of a histogram or date histogram, the script receives the values in the window (by default, the buckets immediately preceding the current one) and returns a single number. This makes it suitable for moving averages, moving sums, minimums and maximums, or any custom logic you define.

Syntax and Documentation

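The basic syntax for a Moving Function Aggregation is shown below. The aggregation must be placed inside a histogram or date_histogram aggregation, and buckets_path must point to a numeric metric in that histogram: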

{
  "moving_fn": {
    "buckets_path": "the_sum",
    "window": 10,
    "script": "MovingFunctions.sum(values)"
  }
}

For detailed information and advanced usage, refer to the official Elasticsearch documentation on Moving Function Aggregation.
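To make the built-in functions concrete, here is a minimal Python sketch of what MovingFunctions.sum and MovingFunctions.unweightedAvg compute for one window, based on the behavior described in the Elasticsearch documentation: null and NaN values are ignored, an empty window gives 0.0 for sum and NaN for the average. The function names moving_sum and unweighted_avg are illustrative and not part of any Elasticsearch API.

```python
import math


def _real_values(values):
    """Keep only the usable numbers: drop None and NaN entries."""
    return [v for v in values if v is not None and not math.isnan(v)]


def moving_sum(values):
    """Model of MovingFunctions.sum: an empty window yields 0.0."""
    return float(sum(_real_values(values)))


def unweighted_avg(values):
    """Model of MovingFunctions.unweightedAvg: an empty window yields NaN."""
    real = _real_values(values)
    return sum(real) / len(real) if real else math.nan


window = [1.0, None, 2.0, math.nan]
print(moving_sum(window))      # sums only 1.0 and 2.0
print(unweighted_avg(window))  # averages only 1.0 and 2.0
```

Note that the average divides by the count of usable values, not by the configured window size.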

Example Usage

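Here's an example of using the Moving Function Aggregation to calculate a moving average of daily sales over a 7-bucket window. With the default shift of 0, each day's result averages up to seven preceding days and does not include the day itself: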

{
  "aggs": {
    "sales_per_day": {
      "date_histogram": {
        "field": "date",
        "calendar_interval": "day"
      },
      "aggs": {
        "daily_sales": {
          "sum": { "field": "sales" }
        },
        "weekly_moving_avg": {
          "moving_fn": {
            "buckets_path": "daily_sales",
            "window": 7,
            "script": "MovingFunctions.unweightedAvg(values)"
          }
        }
      }
    }
  }
}
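The following Python sketch models what this request computes, assuming the default shift of 0 (the window holds the values before the current bucket) and assuming every day has a value. The sales figures and the helper name trailing_moving_avg are made up for illustration; this is a model of the documented behavior, not Elasticsearch's implementation.

```python
import math


def trailing_moving_avg(series, window):
    """For bucket i, average the up-to-`window` values before it.

    Bucket i itself is excluded, and the window shrinks near the start
    of the series. An empty window yields NaN.
    """
    out = []
    for i in range(len(series)):
        win = series[max(0, i - window):i]
        out.append(sum(win) / len(win) if win else math.nan)
    return out


# Hypothetical daily sales totals, one value per date_histogram bucket.
daily_sales = [100.0, 120.0, 90.0, 110.0, 130.0, 80.0, 140.0, 150.0, 95.0]

result = trailing_moving_avg(daily_sales, window=7)
for day, value in enumerate(result):
    print(day, value)
```

The first bucket has nothing before it, so its window is empty. Only from the eighth bucket onward does the average cover a full seven days.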

Common Issues

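Insufficient data: Near the start of the series the window shrinks to the values that are available, so early buckets are computed from fewer points than the configured window. With the default shift, the first bucket has an empty window, for which MovingFunctions.unweightedAvg returns NaN and MovingFunctions.sum returns 0.0.
Script errors: Scripts are written in Painless and should return a numeric value. Double-check custom scripts for syntax errors, and make sure they handle an empty values array.
Performance impact: The script runs once per bucket over up to window values, so the cost grows with both the number of buckets and the window size. Large windows over fine-grained histograms are the most expensive combination.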
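As a sketch of guarding a custom script against an empty window, the Python snippet below builds a moving_fn request fragment as a dictionary and prints it as JSON. The script string follows the custom-script pattern in the Elasticsearch documentation; the helper moving_fn_body and the metric name daily_sales are illustrative assumptions.

```python
import json


def moving_fn_body(buckets_path, window, script, shift=None):
    """Build a moving_fn fragment (hypothetical helper, not a client API)."""
    body = {"buckets_path": buckets_path, "window": window, "script": script}
    if shift is not None:
        body["shift"] = shift  # only sent when explicitly requested
    return {"moving_fn": body}


# Painless script: return the oldest value in the window, or NaN when the
# window is empty, instead of indexing into an empty array.
guarded_script = "return values.length > 0 ? values[0] : Double.NaN"

fragment = moving_fn_body("daily_sales", 7, guarded_script)
print(json.dumps(fragment, indent=2))
```

The resulting fragment would be placed inside the aggs section of a histogram or date_histogram aggregation.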

Best Practices

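Start with built-in functions like MovingFunctions.sum(), MovingFunctions.unweightedAvg(), MovingFunctions.min(), or MovingFunctions.max() before writing custom scripts.
Match the window size to the periodicity of your data, for example 7 buckets for daily data with a weekly cycle.
Use the shift parameter to move the window relative to the current bucket. The default shift of 0 uses only the buckets before the current one, a shift of 1 includes the current bucket, and larger values move the window toward later buckets.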
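The effect of shift can be sketched as index arithmetic. The model below assumes the window for bucket i is the half-open range from i - window + shift to i + shift, clamped to the available values; this matches the documented behavior for shift values of 0 and 1, but the function window_bounds is an illustration and not Elasticsearch code.

```python
def window_bounds(index, n_values, window, shift=0):
    """Return (start, end) of the half-open window for the bucket at `index`.

    Both edges are clamped to [0, n_values], so the window shrinks when it
    would extend past either end of the series.
    """
    def clamp(position):
        return max(0, min(position, n_values))

    return clamp(index - window + shift), clamp(index + shift)


n = 10  # hypothetical number of buckets with values
print(window_bounds(5, n, window=4))           # (1, 5): buckets 1..4, current excluded
print(window_bounds(5, n, window=4, shift=1))  # (2, 6): buckets 2..5, current included
print(window_bounds(1, n, window=4))           # (0, 1): window shrinks at the start
```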

Frequently Asked Questions

Q: Can I use Moving Function Aggregation with non-numeric data?
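A: Not directly. Moving Function Aggregation reads numeric metric values through buckets_path and passes them to the script as an array of numbers, so it is designed for numeric data. To analyze non-numeric data, first reduce it to a number with a metric aggregation (for example, a value count or cardinality per bucket) and apply the moving function to that metric.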

Q: How does the window size affect the results?
A: The window size determines how many data points are considered in each calculation. A larger window will smooth out short-term fluctuations but may be less responsive to recent changes. A smaller window will be more sensitive to recent data but may be more volatile.
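A small Python sketch can illustrate this trade-off using a step change: a hypothetical metric jumps from 0 to 10, and a trailing average (current bucket excluded, as with the default shift) is computed with two window sizes. The data and the helper trailing_avg are illustrative only.

```python
def trailing_avg(series, window):
    """Average of up to `window` values before each bucket.

    Returns None for a bucket that has no preceding values.
    """
    out = []
    for i in range(len(series)):
        win = series[max(0, i - window):i]
        out.append(sum(win) / len(win) if win else None)
    return out


# Hypothetical series: the metric jumps from 0 to 10 at bucket 5.
step = [0.0] * 5 + [10.0] * 6

small = trailing_avg(step, window=2)
large = trailing_avg(step, window=5)

# Compare how quickly each window catches up with the new level.
for i in range(5, len(step)):
    print(i, small[i], large[i])
```

In this model the window of 2 reaches the new level two buckets after the jump, while the window of 5 needs five buckets and passes through intermediate values on the way.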

Q: Can I combine Moving Function Aggregation with other aggregations?
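A: Yes. Moving Function Aggregation is a pipeline aggregation, so it always works on the output of other aggregations. It must be embedded in a histogram or date_histogram aggregation, and its buckets_path points to a metric in that histogram, such as a sum or average. Its output can in turn be used by other pipeline aggregations.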

Q: What's the difference between Moving Function and Moving Average aggregations?
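A: Moving Average (moving_avg) was an older aggregation limited to a fixed set of averaging models. It was deprecated in Elasticsearch 6.4 in favor of Moving Function and removed in 8.0. Moving Function is more flexible: it provides the same kinds of averages through built-in functions such as MovingFunctions.unweightedAvg(), MovingFunctions.linearWeightedAvg(), and MovingFunctions.ewma(), and also lets you apply any custom calculation to the window of data.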

Q: How can I handle missing data points in my Moving Function Aggregation?
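A: You can use the gap_policy parameter to specify how to handle gaps in your data. Options include "skip" (the default, which ignores the empty bucket and continues with the next available value), "insert_zeros" (treats the missing value as zero), and "keep_values" (like "skip", except that if the metric still provides a non-null, non-NaN value, that value is used). Note that "keep_values" does not carry the last known value forward.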
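The difference between the "skip" and "insert_zeros" policies can be sketched in Python as follows. The model represents an empty bucket as None and covers only these two policies; the function apply_gap_policy and the sample values are illustrative assumptions, not Elasticsearch code.

```python
def apply_gap_policy(bucket_values, policy):
    """Return the series a moving function would see under a gap policy.

    `bucket_values` uses None to mark a bucket with no data.
    """
    if policy == "skip":
        # Empty buckets contribute nothing to the series.
        return [v for v in bucket_values if v is not None]
    if policy == "insert_zeros":
        # Empty buckets are treated as a value of zero.
        return [0.0 if v is None else v for v in bucket_values]
    raise ValueError(f"policy not modeled in this sketch: {policy}")


# Hypothetical daily totals with one day that has no documents.
buckets = [10.0, None, 30.0]

skipped = apply_gap_policy(buckets, "skip")
zeroed = apply_gap_policy(buckets, "insert_zeros")

print(skipped, sum(skipped) / len(skipped))
print(zeroed, sum(zeroed) / len(zeroed))
```

With "insert_zeros" the gap pulls the average down, which is appropriate when a missing bucket really means zero activity and misleading when it means missing measurements.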