Elasticsearch Derivative Aggregation - Syntax, Example, and Tips

The Derivative Aggregation in Elasticsearch is a parent pipeline aggregation that calculates the change in a metric between consecutive buckets of a histogram or date_histogram. It's particularly useful for analyzing trends, identifying anomalies, and understanding how quickly a metric changes over time.

Syntax

{
  "derivative": {
    "buckets_path": "the_sum"
  }
}

For more details, refer to the official Elasticsearch documentation on Derivative Aggregation.

Example Usage

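Here's an example that calculates the day-over-day change in total sales. Because derivative is a parent pipeline aggregation, it sits inside the date_histogram next to the metric it reads, and buckets_path refers to that sibling metric by name:

{
  "aggs": {
    "sales_per_day": {
      "date_histogram": {
        "field": "date",
        "calendar_interval": "day"
      },
      "aggs": {
        "daily_sales": {
          "sum": { "field": "sales" }
        },
        "sales_derivative": {
          "derivative": {
            "buckets_path": "daily_sales"
          }
        }
      }
    }
  }
}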

Common Issues

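  1. Missing data points: The derivative is computed between consecutive buckets. Empty buckets or buckets with a missing metric value are treated as gaps, so choose a gap_policy deliberately; otherwise the results can be misleading.
  2. Unit mismatch: The derivative is the change per bucket, not per unit of time. With a calendar_interval such as month, buckets cover different numbers of days. Set the unit parameter (for example, "day") to also get a normalized_value expressed per that unit.
  3. Outliers: A single extreme bucket value produces a large spike in the derivative, followed by a large swing in the opposite direction in the next bucket.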

Best Practices

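  1. Nest the derivative inside a date_histogram for time-based analysis.
  2. Set gap_policy explicitly so that missing data points are handled the way you intend.
  3. To compare series on different scales, use the unit parameter to get per-time-unit values, or the separate normalize pipeline aggregation. The derivative aggregation itself has no normalize parameter.
  4. Combine with a moving average (for example, the moving_fn aggregation) to smooth out short-term fluctuations.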
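
One way to combine smoothing with the derivative is to chain pipeline aggregations: smooth the metric with moving_fn first, then take the derivative of the smoothed series. This is a sketch only; the 7-bucket window and the aggregation names are assumptions chosen for illustration.

```json
{
  "size": 0,
  "aggs": {
    "sales_per_day": {
      "date_histogram": {
        "field": "date",
        "calendar_interval": "day"
      },
      "aggs": {
        "daily_sales": {
          "sum": { "field": "sales" }
        },
        "smoothed_sales": {
          "moving_fn": {
            "buckets_path": "daily_sales",
            "window": 7,
            "script": "MovingFunctions.unweightedAvg(values)"
          }
        },
        "smoothed_derivative": {
          "derivative": {
            "buckets_path": "smoothed_sales"
          }
        }
      }
    }
  }
}
```

A wider window gives a smoother derivative but reacts more slowly to real changes, so the window size is worth tuning against your own data.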

Frequently Asked Questions

Q: How does the Derivative Aggregation handle gaps in data?
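A: By default (gap_policy: skip), empty buckets are treated as missing and skipped. Setting gap_policy to insert_zeros replaces missing values with zero instead. Recent versions also offer keep_values, which behaves like skip except that it uses the metric's value when that value is non-null and non-NaN. None of these policies carries the last known value forward.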
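
For illustration, a request that sets gap_policy explicitly might look like the sketch below. The field and aggregation names follow the daily sales example and are otherwise arbitrary.

```json
{
  "size": 0,
  "aggs": {
    "sales_per_day": {
      "date_histogram": {
        "field": "date",
        "calendar_interval": "day"
      },
      "aggs": {
        "daily_sales": {
          "sum": { "field": "sales" }
        },
        "sales_derivative": {
          "derivative": {
            "buckets_path": "daily_sales",
            "gap_policy": "insert_zeros"
          }
        }
      }
    }
  }
}
```

With insert_zeros, a day without documents counts as zero sales, so the derivative shows a drop into the gap and a rise out of it.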

Q: Can Derivative Aggregation be used on non-numeric fields?
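A: No. The derivative operates on the numeric value referenced by buckets_path, such as a sum, an average, or the bucket document count (_count), and calculates the difference between those values in consecutive buckets. To track a non-numeric field, reduce it to a number first, for example by counting documents.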
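
When there is no numeric field to aggregate, the document count of each bucket can serve as the metric through the special _count path. The sketch below assumes a hypothetical @timestamp field and hourly buckets.

```json
{
  "size": 0,
  "aggs": {
    "events_per_hour": {
      "date_histogram": {
        "field": "@timestamp",
        "fixed_interval": "1h"
      },
      "aggs": {
        "count_derivative": {
          "derivative": {
            "buckets_path": "_count"
          }
        }
      }
    }
  }
}
```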

Q: How is the derivative calculated for the first data point?
A: No derivative is produced for the first bucket, because there's no previous value to calculate the difference from. The response simply omits the derivative for that bucket.

Q: Can I use Derivative Aggregation with non-time-based data?
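A: Yes, within limits. The derivative must be nested inside a histogram or date_histogram aggregation, so non-time-based data needs a numeric histogram as the parent. The result is then the change in the metric from one numeric bucket to the next, which can be less intuitive to interpret than a change over time.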
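
A sketch of the non-time-based case, using a numeric histogram as the parent. The price and quantity fields and the interval of 10 are hypothetical and only serve the illustration.

```json
{
  "size": 0,
  "aggs": {
    "by_price": {
      "histogram": {
        "field": "price",
        "interval": 10
      },
      "aggs": {
        "units_sold": {
          "sum": { "field": "quantity" }
        },
        "units_derivative": {
          "derivative": {
            "buckets_path": "units_sold"
          }
        }
      }
    }
  }
}
```

Here the derivative describes how units sold change from one price band to the next.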

Q: How can I calculate percentage change instead of absolute change?
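A: The derivative aggregation has no normalize parameter, and its unit parameter only rescales the result to a time unit. To get a percentage change, one approach is to add a bucket_script aggregation that divides the derivative by the previous bucket's value (the current value minus the derivative) and multiplies the result by 100.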
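
One possible way to express a percentage change is a bucket_script next to the derivative, as sketched below. The aggregation names and the script variable names are assumptions; the previous bucket's value is reconstructed as the current value minus the derivative.

```json
{
  "size": 0,
  "aggs": {
    "sales_per_day": {
      "date_histogram": {
        "field": "date",
        "calendar_interval": "day"
      },
      "aggs": {
        "daily_sales": {
          "sum": { "field": "sales" }
        },
        "sales_derivative": {
          "derivative": {
            "buckets_path": "daily_sales"
          }
        },
        "sales_pct_change": {
          "bucket_script": {
            "buckets_path": {
              "current": "daily_sales",
              "change": "sales_derivative"
            },
            "script": "params.change / (params.current - params.change) * 100"
          }
        }
      }
    }
  }
}
```

If the previous bucket's value is zero, the division does not yield a meaningful percentage, so that case is worth handling in the script or downstream.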
