Elasticsearch Capacity Planning Guide

Effective capacity planning ensures your Elasticsearch cluster meets performance requirements while optimizing costs. This guide covers the complete capacity planning process.

Capacity Planning Process

1. Gather Requirements

Data Requirements:

  • Current data volume
  • Data growth rate
  • Retention requirements
  • Data model complexity

Performance Requirements:

  • Search latency targets (p95, p99)
  • Indexing throughput
  • Query concurrency
  • Availability requirements (SLA)

Operational Requirements:

  • Backup/restore time
  • Maintenance windows
  • Disaster recovery

2. Analyze Workload

Indexing Profile:

GET /_nodes/stats/indices/indexing

  • Documents per second
  • Bulk request patterns
  • Peak vs. average rates

Search Profile:

GET /_nodes/stats/indices/search

  • Queries per second
  • Query complexity
  • Response size
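The two stats endpoints above return cumulative counters, so rates come from taking two snapshots and dividing the delta by the interval. A minimal sketch of that calculation — the flattened snapshot structure and the `workload_rates` name are illustrative (a real `_nodes/stats` response nests these counters under `nodes.<node_id>.indices`):

```python
# Sketch: derive indexing and search rates from two _nodes/stats snapshots.
# The snapshot structure is simplified for illustration; real responses
# nest these counters under nodes.<node_id>.indices.

def workload_rates(snap_t0, snap_t1, interval_s):
    """Return (docs_indexed_per_sec, queries_per_sec) from cumulative counters."""
    docs = (snap_t1["indexing"]["index_total"]
            - snap_t0["indexing"]["index_total"]) / interval_s
    queries = (snap_t1["search"]["query_total"]
               - snap_t0["search"]["query_total"]) / interval_s
    return docs, queries

# Example: two snapshots taken 60 seconds apart.
t0 = {"indexing": {"index_total": 1_000_000}, "search": {"query_total": 50_000}}
t1 = {"indexing": {"index_total": 1_120_000}, "search": {"query_total": 53_600}}
print(workload_rates(t0, t1, 60))  # (2000.0, 60.0)
```

Sampling at both peak and off-peak hours gives you the peak vs. average rates called out above.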

3. Calculate Resources

See Elasticsearch Sizing Calculator for detailed formulas.

Resource Planning

Storage Planning

Calculate Total Storage

Total Storage = Daily Volume × Retention × (1 + Replicas) × Overhead

Overhead factors:

  • Lucene overhead: 10-20%
  • Translog: 5%
  • Headroom: 20-30%
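Putting the formula and overhead factors together, a quick sketch — the combined 1.45 overhead factor (Lucene + translog + headroom, mid-range of the percentages above) is an assumed default you should adjust:

```python
# Sketch of the storage formula above. The default overhead of 1.45
# is an assumed mid-range combination of Lucene overhead, translog,
# and headroom; tune it for your data.

def total_storage_gb(daily_gb, retention_days, replicas, overhead=1.45):
    return daily_gb * retention_days * (1 + replicas) * overhead

# 50 GB/day, 30-day retention, 1 replica:
print(total_storage_gb(50, 30, 1))  # 4350.0
```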

Storage Type Selection

| Storage Type | Use Case                   | Cost |
|--------------|----------------------------|------|
| NVMe SSD     | Hot tier, high performance | $$$  |
| SSD          | Warm tier, general use     | $$   |
| HDD          | Cold tier, archival        | $    |

Memory Planning

Heap Sizing

Rule: Heap should be about half of available RAM, capped below ~32 GB so the JVM can keep using compressed object pointers (in practice, 31 GB).

Heap = min(Available RAM × 0.5, 31 GB)

Filesystem Cache

Filesystem Cache = Available RAM - Heap - OS Overhead

Target: At least as large as your hot data set, so frequently searched segments stay in the OS page cache.
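The two memory rules above combine into a simple sketch — the 2 GB OS overhead default is an assumed figure, not a fixed rule:

```python
# Sketch of the heap and filesystem-cache rules above. The 2 GB OS
# overhead is an assumed default; measure your own baseline.

def memory_plan(ram_gb, os_overhead_gb=2):
    heap = min(ram_gb * 0.5, 31)  # stay under ~32 GB for compressed oops
    fs_cache = ram_gb - heap - os_overhead_gb
    return heap, fs_cache

# A 64 GB node: 31 GB heap leaves ~31 GB for the filesystem cache.
print(memory_plan(64))  # (31, 31)
```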

CPU Planning

Factors Affecting CPU

  • Query complexity
  • Aggregation depth
  • Indexing rate
  • Background operations (merges, recovery)

Estimation

CPU Cores = (Search QPS × Query CPU Cost) + (Index Rate × Index CPU Cost)

Typical starting point: 8-16 cores per data node.
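A sketch of the estimate above — the per-query and per-document CPU costs (in core-seconds) are assumed illustrative values; benchmark your own workload to calibrate them:

```python
# Sketch of the CPU estimate above. query_cost and index_cost are
# assumed illustrative core-second costs, not measured values.
import math

def cpu_cores(search_qps, index_rate, query_cost=0.02, index_cost=0.001):
    return math.ceil(search_qps * query_cost + index_rate * index_cost)

# 200 searches/s and 10,000 docs/s indexed:
print(cpu_cores(200, 10_000))  # 14
```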

Network Planning

Bandwidth Requirements

  • Inter-node traffic: Recovery, replication
  • Client traffic: Queries, bulk indexing
  • Monitoring: Metrics, logs

Latency Requirements

  • < 1ms between nodes (same zone)
  • < 10ms for cross-zone
  • Dedicated network for cluster traffic

Cluster Architecture

Small Cluster (< 100 GB/day)

┌─────────────┐  ┌─────────────┐  ┌─────────────┐
│   Node 1    │  │   Node 2    │  │   Node 3    │
│ Master+Data │  │ Master+Data │  │ Master+Data │
│   32GB RAM  │  │   32GB RAM  │  │   32GB RAM  │
│   1TB SSD   │  │   1TB SSD   │  │   1TB SSD   │
└─────────────┘  └─────────────┘  └─────────────┘

Medium Cluster (100 GB - 1 TB/day)

┌─────────────┐  ┌─────────────┐  ┌─────────────┐
│  Master 1   │  │  Master 2   │  │  Master 3   │
│   8GB heap  │  │   8GB heap  │  │   8GB heap  │
└─────────────┘  └─────────────┘  └─────────────┘

┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐
│   Data 1    │  │   Data 2    │  │   Data 3    │  │   Data N    │
│  64GB RAM   │  │  64GB RAM   │  │  64GB RAM   │  │  64GB RAM   │
│   4TB SSD   │  │   4TB SSD   │  │   4TB SSD   │  │   4TB SSD   │
└─────────────┘  └─────────────┘  └─────────────┘  └─────────────┘

Large Cluster (> 1 TB/day)

                    ┌──────────────┐
                    │ Coordinating │ × 2-4
                    └──────────────┘

┌─────────┐  ┌─────────┐  ┌─────────┐
│ Master  │  │ Master  │  │ Master  │
└─────────┘  └─────────┘  └─────────┘

┌─────────┐  ┌─────────┐         ┌─────────┐  ┌─────────┐
│Hot Data │  │Hot Data │   ...   │Cold Data│  │Cold Data│
│  NVMe   │  │  NVMe   │         │   HDD   │  │   HDD   │
└─────────┘  └─────────┘         └─────────┘  └─────────┘
     Hot Tier (recent data)          Cold Tier (historical)

Growth Planning

Monitoring Growth

Track these metrics over time:

  • Total document count
  • Index size
  • Daily indexing rate
  • Query rate

Capacity Thresholds

Set alerts at:

  • 70%: Plan expansion
  • 80%: Execute expansion
  • 85%: Urgent action
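The thresholds above map directly onto alerting logic; a minimal sketch (the `capacity_action` name and return strings are illustrative):

```python
# Sketch mapping capacity utilization to the alert levels above.
# Names and return strings are illustrative.

def capacity_action(used_fraction):
    if used_fraction >= 0.85:
        return "urgent action"
    if used_fraction >= 0.80:
        return "execute expansion"
    if used_fraction >= 0.70:
        return "plan expansion"
    return "ok"

print(capacity_action(0.72))  # plan expansion
```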

Expansion Strategies

Vertical Scaling:

  • Increase node resources
  • Better for small increases
  • Limited by hardware

Horizontal Scaling:

  • Add more nodes
  • Better for large increases
  • Requires rebalancing

Cost Optimization

Right-Sizing

  • Start conservatively, then scale up based on observed usage
  • Monitor actual usage vs. provisioned
  • Remove over-provisioned resources

Tiered Storage

Implement hot-warm-cold architecture:

PUT _ilm/policy/tiered_policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {"max_primary_shard_size": "50gb"}
        }
      },
      "warm": {
        "min_age": "7d",
        "actions": {
          "shrink": {"number_of_shards": 1},
          "allocate": {
            "require": {"data": "warm"}
          }
        }
      },
      "cold": {
        "min_age": "30d",
        "actions": {
          "allocate": {
            "require": {"data": "cold"}
          }
        }
      }
    }
  }
}

Reserved Instances

For cloud deployments:

  • Use reserved instances for base capacity
  • On-demand for peak/burst
  • Spot instances for non-critical workloads

Capacity Planning Checklist

Initial Planning

  • Documented data volume and growth rate
  • Defined retention requirements
  • Established performance SLAs
  • Calculated storage needs
  • Determined node count and specs
  • Planned cluster architecture

Ongoing Management

  • Monitoring dashboards configured
  • Capacity alerts set
  • Growth projections updated quarterly
  • Cost analysis performed regularly
  • Performance baselines established