
Elasticsearch Aggregation Performance Tuning

Aggregations in Elasticsearch can be resource-intensive, especially with large datasets. This guide covers techniques that cut their memory and CPU cost, and points out where a technique trades some accuracy for speed.

Understanding Aggregation Costs

Resource Consumption

Aggregations consume:

  • Heap memory: Collecting and storing buckets
  • CPU: Computing statistics
  • Network: Transferring per-shard results to the coordinating node

Aggregation Types by Cost

Type              Memory Cost   CPU Cost   Notes
Value count       Low           Low        Most efficient
Sum/Avg/Min/Max   Low           Low        Simple statistics
Cardinality       Medium        Low        HyperLogLog approximation
Terms             High          Medium     Depends on cardinality
Date histogram    Medium        Medium     Depends on interval
Nested            High          High       Per nested document

Optimization Techniques

1. Reduce Bucket Count

Before (expensive):

{
  "aggs": {
    "categories": {
      "terms": {
        "field": "category.keyword",
        "size": 10000
      }
    }
  }
}

After (optimized):

{
  "aggs": {
    "categories": {
      "terms": {
        "field": "category.keyword",
        "size": 100
      }
    }
  }
}

2. Use shard_size Appropriately

{
  "aggs": {
    "categories": {
      "terms": {
        "field": "category.keyword",
        "size": 10,
        "shard_size": 50  // Collect more per shard for accuracy
      }
    }
  }
}

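Elasticsearch already uses shard_size = size * 1.5 + 10 by default (25 for size 10). Raise it, as above, when the counts of the top terms need to be more accurate: each shard then returns more candidate buckets, at the cost of more memory and network traffic. A shard_size smaller than size is reset to size.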
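To check whether an explicit shard_size actually raises accuracy above what Elasticsearch already does, the default heuristic can be computed up front. This is a minimal Python sketch; default_shard_size and terms_agg are hypothetical helper names, and the code only builds request bodies without contacting a cluster.

```python
from typing import Optional


def default_shard_size(size: int) -> int:
    """Default terms shard_size heuristic: size * 1.5 + 10, truncated to an int."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return int(size * 1.5 + 10)


def terms_agg(field: str, size: int, shard_size: Optional[int] = None) -> dict:
    """Build a terms aggregation body, optionally overriding shard_size."""
    body = {"field": field, "size": size}
    if shard_size is not None:
        # Elasticsearch resets a shard_size below size to size, so mirror that here
        body["shard_size"] = max(shard_size, size)
    return {"terms": body}


print(default_shard_size(10))  # 25
print(terms_agg("category.keyword", 10, 50))
# {'terms': {'field': 'category.keyword', 'size': 10, 'shard_size': 50}}
```

An explicit shard_size at or below the default changes nothing, so compare against default_shard_size(size) before tuning.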

3. Use Composite Aggregation for High Cardinality

Instead of a large terms aggregation, page through all buckets with a composite aggregation:

{
  "size": 0,
  "aggs": {
    "my_buckets": {
      "composite": {
        "size": 1000,
        "sources": [
          {"category": {"terms": {"field": "category.keyword"}}},
          {"region": {"terms": {"field": "region.keyword"}}}
        ]
      }
    }
  }
}

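To fetch the next page, pass the after_key from the previous response as after, with the same sources as the first request: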
{
  "size": 0,
  "aggs": {
    "my_buckets": {
      "composite": {
        "size": 1000,
        "after": {"category": "last_category", "region": "last_region"},
        "sources": [...]
      }
    }
  }
}
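The pagination loop can be sketched in Python. It assumes a hypothetical search callable that takes a request body and returns the parsed JSON response (for example, a thin wrapper around your client's search call); fake_search is an in-memory stand-in so the loop runs without a cluster. The loop stops when a page comes back empty or without an after_key.

```python
from typing import Callable, Iterator, Optional

SOURCES = [
    {"category": {"terms": {"field": "category.keyword"}}},
    {"region": {"terms": {"field": "region.keyword"}}},
]


def iter_composite_buckets(search: Callable[[dict], dict], page_size: int = 1000) -> Iterator[dict]:
    """Yield every composite bucket, feeding each page's after_key into the next request."""
    after: Optional[dict] = None
    while True:
        composite = {"size": page_size, "sources": SOURCES}
        if after is not None:
            composite["after"] = after
        body = {"size": 0, "aggs": {"my_buckets": {"composite": composite}}}
        result = search(body)["aggregations"]["my_buckets"]
        buckets = result.get("buckets", [])
        if not buckets:
            return
        yield from buckets
        after = result.get("after_key")
        if after is None:
            return


def fake_search(body: dict) -> dict:
    """Hypothetical stand-in for a cluster holding 2,500 (category, region) keys."""
    keys = [{"category": f"c{i:04d}", "region": "eu"} for i in range(2500)]
    composite = body["aggs"]["my_buckets"]["composite"]
    start = keys.index(composite["after"]) + 1 if "after" in composite else 0
    page = keys[start:start + composite["size"]]
    result = {"buckets": [{"key": k, "doc_count": 1} for k in page]}
    if page:
        result["after_key"] = page[-1]
    return {"aggregations": {"my_buckets": result}}


print(sum(1 for _ in iter_composite_buckets(fake_search)))  # 2500
```

Because buckets are yielded lazily, the caller holds only one page in memory at a time.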

4. Filter Before Aggregating

{
  "query": {
    "bool": {
      "filter": [
        {"range": {"date": {"gte": "now-7d"}}},
        {"term": {"status": "active"}}
      ]
    }
  },
  "aggs": {
    "categories": {
      "terms": {"field": "category.keyword", "size": 10}
    }
  }
}
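When this request is built from application code, a helper like the hypothetical filtered_terms_request below keeps both conditions in filter context. It also sets "size": 0, which the example above omits; an aggregation-only request then skips fetching hits. The field names date, status, and category.keyword come from the example above.

```python
def filtered_terms_request(days: int, status: str, field: str, size: int = 10) -> dict:
    """Aggregation-only request: filter clauses narrow the documents, size 0 skips hits."""
    return {
        "size": 0,  # return only aggregation results, no search hits
        "query": {
            "bool": {
                # filter context: clauses restrict matches but compute no relevance score
                "filter": [
                    {"range": {"date": {"gte": f"now-{days}d"}}},
                    {"term": {"status": status}},
                ]
            }
        },
        "aggs": {"categories": {"terms": {"field": field, "size": size}}},
    }


print(filtered_terms_request(7, "active", "category.keyword")["query"]["bool"]["filter"][0])
# {'range': {'date': {'gte': 'now-7d'}}}
```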

5. Use Sampler for Large Datasets

{
  "aggs": {
    "sample": {
      "sampler": {
        "shard_size": 1000
      },
      "aggs": {
        "categories": {
          "terms": {"field": "category.keyword"}
        }
      }
    }
  }
}
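The sampler's effect can be simulated locally: each shard keeps only its shard_size top-scoring matches, and the sub-aggregation counts categories over that sample. This Python sketch illustrates the semantics under simplified assumptions (in-memory shards, precomputed scores); it is not how Elasticsearch implements the sampler. Because the sample is chosen by score, the sampler is most useful together with a relevance query; with no query, every document scores the same.

```python
import heapq
from collections import Counter
from typing import List, Tuple

Doc = Tuple[float, str]  # (relevance score, category) for one matching document


def sampled_category_counts(shards: List[List[Doc]], shard_size: int) -> Counter:
    """Simulate a sampler aggregation wrapping a terms aggregation on category.

    Each shard keeps only its shard_size highest-scoring matches, and the
    category counts are computed over that sample rather than over every match.
    """
    counts: Counter = Counter()
    for docs in shards:
        sample = heapq.nlargest(shard_size, docs, key=lambda doc: doc[0])
        counts.update(category for _, category in sample)
    return counts


shards = [
    [(0.9, "a"), (0.8, "b"), (0.1, "c")],
    [(0.7, "a"), (0.2, "c"), (0.95, "b")],
]
print(sampled_category_counts(shards, shard_size=2))  # Counter({'a': 2, 'b': 2})
```

The low-scoring "c" documents never reach the terms aggregation, which is where both the savings and the approximation come from.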

6. Avoid Aggregations on Text Fields

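Problem (text fields have no doc values, so this fails unless fielddata is enabled, which loads the analyzed terms into heap and buckets individual tokens rather than whole values):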

{
  "aggs": {
    "keywords": {
      "terms": {"field": "description"}  // Text field!
    }
  }
}

Solution (use keyword):

{
  "aggs": {
    "keywords": {
      "terms": {"field": "description.keyword"}
    }
  }
}
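When field names come from user input or dashboards, a check against the index mapping can route text fields to their keyword subfield before the request is built. The Python sketch below is hypothetical (aggregatable_field is not a client API) and handles only top-level fields. Keep in mind that the default dynamic mapping sets ignore_above: 256 on the keyword subfield, so descriptions longer than 256 characters are not indexed there and do not show up in its buckets.

```python
from typing import Optional


def aggregatable_field(properties: dict, name: str) -> Optional[str]:
    """Return a path usable in a terms aggregation for a top-level field, or None.

    `properties` is the "properties" object of a mapping, e.g. from the
    GET /my_index/_mapping response under <index>.mappings.properties.
    """
    spec = properties.get(name)
    if spec is None or "type" not in spec or spec["type"] in ("object", "nested"):
        return None  # unknown field, or a container rather than a leaf value
    if spec["type"] != "text":
        return name  # keyword, numeric, date, ...: aggregate on doc values directly
    for sub_name, sub_spec in spec.get("fields", {}).items():
        if sub_spec.get("type") == "keyword":
            return f"{name}.{sub_name}"
    return None  # text only: aggregating would require enabling fielddata


mapping = {
    "description": {"type": "text", "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}},
    "category": {"type": "keyword"},
    "body": {"type": "text"},
}
print(aggregatable_field(mapping, "description"))  # description.keyword
```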

7. Use execution_hint for Terms

{
  "aggs": {
    "categories": {
      "terms": {
        "field": "category.keyword",
        "execution_hint": "map"  // Only when very few documents match the query
        // The default "global_ordinals" is usually much faster otherwise
      }
    }
  }
}

8. Minimize Nested Aggregations

Expensive:

{
  "aggs": {
    "level1": {
      "terms": {"field": "a"},
      "aggs": {
        "level2": {
          "terms": {"field": "b"},
          "aggs": {
            "level3": {
              "terms": {"field": "c"}
            }
          }
        }
      }
    }
  }
}

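Better: each level multiplies the bucket count, so three levels with the default size of 10 can return up to 10 × 10 × 10 = 1,000 leaf buckets. Flatten into one composite aggregation with several sources where possible, or limit the depth and keep size small at each level.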
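The multiplication is easy to check before sending a request. The Python sketch below (hypothetical helper names) computes the upper bound on leaf buckets for a chain of nested terms aggregations and builds a flattened composite alternative over the same placeholder fields a, b, and c. Note the trade-off: composite pages through every combination in key order instead of returning the top buckets by document count, so it replaces nested terms only when you need all combinations.

```python
from math import prod
from typing import List


def leaf_buckets_upper_bound(sizes: List[int]) -> int:
    """Most leaf buckets a chain of nested terms aggregations with these sizes can return."""
    return prod(sizes)


def composite_over(fields: List[str], page_size: int = 1000) -> dict:
    """A single composite aggregation with one terms source per field."""
    return {
        "composite": {
            "size": page_size,
            "sources": [{field: {"terms": {"field": field}}} for field in fields],
        }
    }


print(leaf_buckets_upper_bound([10, 10, 10]))  # 1000
print(composite_over(["a", "b", "c"])["composite"]["sources"][0])
# {'a': {'terms': {'field': 'a'}}}
```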

9. Use Filters Aggregation Efficiently

{
  "aggs": {
    "categories": {
      "filters": {
        "filters": {
          "active": {"term": {"status": "active"}},
          "pending": {"term": {"status": "pending"}}
        }
      }
    }
  }
}

10. Pre-Aggregate with Transforms

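For aggregations that run repeatedly, such as dashboard panels, compute the results once with a transform. It writes the grouped results to a summary index that is much cheaper to query. Create the transform below, then start it with POST _transform/daily_sales/_start.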

PUT _transform/daily_sales
{
  "source": {"index": "sales"},
  "dest": {"index": "daily_sales_summary"},
  "pivot": {
    "group_by": {
      "date": {"date_histogram": {"field": "timestamp", "calendar_interval": "day"}},
      "product": {"terms": {"field": "product_id"}}
    },
    "aggregations": {
      "total_sales": {"sum": {"field": "amount"}},
      "count": {"value_count": {"field": "product_id"}}  // every doc in a group has product_id, so this counts docs
    }
  }
}
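To see what the summary index will contain, the pivot can be reproduced in memory. This Python sketch assumes timestamp holds epoch seconds and mirrors only the grouping and the two metrics; it illustrates the result, not how the transform runs. Each (day, product) pair becomes one document in daily_sales_summary, so a dashboard reads one row per product per day instead of aggregating every sale.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Dict, Iterable, Tuple


def daily_sales_pivot(docs: Iterable[dict]) -> Dict[Tuple[str, str], dict]:
    """Group sales by (UTC day, product_id) and compute total_sales and count per group."""
    rows: Dict[Tuple[str, str], dict] = defaultdict(lambda: {"total_sales": 0.0, "count": 0})
    for doc in docs:
        # calendar_interval "day" buckets use UTC unless a time_zone is set
        day = datetime.fromtimestamp(doc["timestamp"], tz=timezone.utc).date().isoformat()
        row = rows[(day, doc["product_id"])]
        row["total_sales"] += doc["amount"]
        row["count"] += 1
    return dict(rows)


sales = [
    {"timestamp": 0, "product_id": "p1", "amount": 10.0},
    {"timestamp": 3600, "product_id": "p1", "amount": 5.0},
    {"timestamp": 86400, "product_id": "p1", "amount": 2.0},
]
print(daily_sales_pivot(sales))
```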

Memory Management

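Configure Circuit Breakers

The request circuit breaker (default: 60% of the JVM heap) rejects a request whose per-request data structures, such as aggregation buckets, would exceed its limit. Lowering the limit makes oversized aggregations fail fast instead of pushing the node toward heap exhaustion: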

PUT /_cluster/settings
{
  "persistent": {
    "indices.breaker.request.limit": "40%"
  }
}

Monitor Aggregation Memory

GET /_nodes/stats/indices/fielddata
GET /_nodes/stats/breaker
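The breaker stats are easier to scan when reduced to one line per node. This Python sketch reads the limit_size_in_bytes, estimated_size_in_bytes, and tripped fields of the request breaker from the node stats response; the sample numbers are made up for illustration. A non-zero tripped count means the breaker has already rejected requests on that node.

```python
from typing import List, Tuple


def request_breaker_usage(node_stats: dict) -> List[Tuple[str, float, int]]:
    """Return (node name, fraction of limit in use, trip count) for each node's
    request breaker, most loaded node first.

    `node_stats` is the parsed JSON of GET /_nodes/stats/breaker.
    """
    rows = []
    for node_id, node in node_stats.get("nodes", {}).items():
        breaker = node["breakers"]["request"]
        limit = breaker["limit_size_in_bytes"]
        used = breaker["estimated_size_in_bytes"] / limit if limit else 0.0
        rows.append((node.get("name", node_id), used, breaker["tripped"]))
    return sorted(rows, key=lambda row: row[1], reverse=True)


example = {
    "nodes": {
        "a1": {"name": "node-1", "breakers": {"request": {
            "limit_size_in_bytes": 1000, "estimated_size_in_bytes": 250, "tripped": 0}}},
        "b2": {"name": "node-2", "breakers": {"request": {
            "limit_size_in_bytes": 1000, "estimated_size_in_bytes": 800, "tripped": 3}}},
    }
}
print(request_breaker_usage(example))  # [('node-2', 0.8, 3), ('node-1', 0.25, 0)]
```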

Clear Fielddata Cache

POST /_cache/clear?fielddata=true

Cardinality Optimization

Use Approximate Cardinality

{
  "aggs": {
    "unique_users": {
      "cardinality": {
        "field": "user_id",
        "precision_threshold": 1000  // Below the default 3000: less memory, less accurate above ~1000 unique values
      }
    }
  }
}
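Per the Elasticsearch documentation, a cardinality sketch needs roughly precision_threshold × 8 bytes, with a default threshold of 3000 and a maximum of 40000. This Python sketch just encodes that estimate (the function name is hypothetical); when cardinality runs as a sub-aggregation, multiply by the number of parent buckets for a rough total.

```python
DEFAULT_PRECISION_THRESHOLD = 3_000
MAX_PRECISION_THRESHOLD = 40_000  # larger values behave like 40000


def cardinality_sketch_bytes(precision_threshold: int = DEFAULT_PRECISION_THRESHOLD) -> int:
    """Rough memory for one cardinality sketch: about precision_threshold * 8 bytes."""
    return min(precision_threshold, MAX_PRECISION_THRESHOLD) * 8


print(cardinality_sketch_bytes())      # 24000
print(cardinality_sketch_bytes(1000))  # 8000
```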

Pre-Compute Cardinality

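On high-cardinality fields with large string values, storing a precomputed hash can make cardinality faster, because values are hashed at index time instead of at query time. The murmur3 field type requires the mapper-murmur3 plugin, the field must be indexed with the same values as user_id, and the cardinality aggregation then targets user_id_hash: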

PUT /my_index/_mapping
{
  "properties": {
    "user_id_hash": {
      "type": "murmur3"
    }
  }
}

Date Histogram Optimization

Use Fixed Intervals When Possible

{
  "aggs": {
    "over_time": {
      "date_histogram": {
        "field": "timestamp",
        "fixed_interval": "1h"  // Fixed-length buckets, no calendar-aware rounding
      }
    }
  }
}
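Because cost in the table above depends on the interval, it is worth checking how many buckets an interval produces before sending the query. A Python sketch for the fixed-interval case (the helper name is hypothetical): one week at 1h is 168 buckets, while 1m would be over 10,000.

```python
from datetime import timedelta


def min_fixed_interval_buckets(span: timedelta, interval: timedelta) -> int:
    """Lower bound on buckets a fixed_interval date_histogram produces over `span`.

    Buckets are aligned to interval boundaries rather than to the start of the
    range, so the real count can be one higher.
    """
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    return -(-span // interval)  # ceiling division on timedeltas


print(min_fixed_interval_buckets(timedelta(days=7), timedelta(hours=1)))    # 168
print(min_fixed_interval_buckets(timedelta(days=7), timedelta(minutes=1)))  # 10080
```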

Use Auto Date Histogram

{
  "aggs": {
    "over_time": {
      "auto_date_histogram": {
        "field": "timestamp",
        "buckets": 10  // Target count; an interval is chosen so at most this many buckets are returned
      }
    }
  }
}

Performance Testing

Profile Aggregations

{
  "profile": true,
  "aggs": {
    "my_agg": {...}
  }
}
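The profile output lists one timing tree per shard, which gets long on indices with many shards. This Python sketch totals time_in_nanos per aggregation across shards so the slowest one stands out; it assumes the response shape profile.shards[].aggregations[] with description, time_in_nanos, and optional children, and the sample response is trimmed to those fields.

```python
from collections import defaultdict
from typing import Dict, List


def aggregation_times_ms(response: dict) -> Dict[str, float]:
    """Sum each aggregation's time_in_nanos across shards, in milliseconds.

    Reads response["profile"]["shards"][*]["aggregations"]; sub-aggregations
    (listed under "children") are keyed as "parent > child".
    """
    totals: Dict[str, int] = defaultdict(int)

    def walk(aggs: List[dict], prefix: str) -> None:
        for agg in aggs:
            path = prefix + agg["description"]
            totals[path] += agg["time_in_nanos"]
            walk(agg.get("children", []), path + " > ")

    for shard in response["profile"]["shards"]:
        walk(shard.get("aggregations", []), "")
    return {path: nanos / 1e6 for path, nanos in totals.items()}


shard = {"aggregations": [{"description": "my_agg", "time_in_nanos": 2_000_000,
                           "children": [{"description": "sub", "time_in_nanos": 500_000}]}]}
print(aggregation_times_ms({"profile": {"shards": [shard, shard]}}))
# {'my_agg': 4.0, 'my_agg > sub': 1.0}
```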

Benchmark Changes

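# Before optimization (request_cache=false keeps repeated runs from being served by the shard request cache)
time curl -s -H 'Content-Type: application/json' "localhost:9200/my-index/_search?request_cache=false" -d'{"aggs":{...}}'

# After optimization
time curl -s -H 'Content-Type: application/json' "localhost:9200/my-index/_search?request_cache=false" -d'{"aggs":{...}}'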
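A single timed run is noisy, and wall-clock time also includes connection and client overhead. One option is to run each request several times and compare the median of the took field from the JSON responses, which reports the time Elasticsearch spent on the search in milliseconds. This Python sketch assumes you have already collected the parsed responses; the numbers are illustrative.

```python
import statistics
from typing import Iterable


def median_took_ms(responses: Iterable[dict]) -> float:
    """Median "took" (milliseconds) over repeated runs of the same request."""
    return statistics.median(r["took"] for r in responses)


def speedup(before: Iterable[dict], after: Iterable[dict]) -> float:
    """Ratio of median took before and after an optimization (2.0 means twice as fast)."""
    return median_took_ms(before) / median_took_ms(after)


before_runs = [{"took": 120}, {"took": 100}, {"took": 110}]
after_runs = [{"took": 50}, {"took": 60}, {"took": 55}]
print(speedup(before_runs, after_runs))  # 2.0
```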

Aggregation Checklist

  • Bucket count minimized (size parameter)
  • Using filter context to reduce document scope
  • Keyword fields used (not text)
  • Composite aggregation for high-cardinality pagination
  • Sampler used for large datasets when approximate is acceptable
  • Nested aggregation depth minimized
  • Transform used for repeated aggregations
  • Circuit breakers configured appropriately