Elasticsearch Capacity Planning Guide

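Effective capacity planning ensures your Elasticsearch cluster meets its performance requirements while keeping costs under control. This guide walks through the process: gathering requirements, analyzing the workload, calculating resources, choosing a cluster architecture, and planning for growth and cost.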

Capacity Planning Process

1. Gather Requirements

Data Requirements:

Current data volume
Data growth rate
Retention requirements
Data model complexity

Performance Requirements:

Search latency targets (p95, p99)
Indexing throughput
Query concurrency
Availability requirements (SLA)

Operational Requirements:

Backup/restore time
Maintenance windows
Disaster recovery

2. Analyze Workload

Indexing Profile:

GET /_nodes/stats/indices/indexing

Documents per second
Bulk request patterns
Peak vs. average rates

Search Profile:

GET /_nodes/stats/indices/search

Queries per second
Query complexity
Response size
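The node stats counters are cumulative, so rates such as documents per second come from the difference between two samples. The following is a minimal Python sketch of that calculation, not an official tool. It assumes you have already fetched two snapshots of one node's `indices` stats object and reads the `index_total`, `index_time_in_millis`, `query_total`, and `query_time_in_millis` counters from them. The function name `workload_rates` and the sample numbers are made up for illustration.

```python
def workload_rates(earlier, later, interval_seconds):
    """Derive average rates from two snapshots of one node's `indices` stats.

    `earlier` and `later` are the `indices` objects taken from two node stats
    responses for the same node, sampled `interval_seconds` apart.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")

    def delta(section, field):
        # Counters are cumulative since node start, so subtract the samples.
        return later[section][field] - earlier[section][field]

    docs = delta("indexing", "index_total")
    queries = delta("search", "query_total")
    return {
        "docs_per_second": docs / interval_seconds,
        "queries_per_second": queries / interval_seconds,
        # Average time spent per operation during the interval.
        "avg_index_ms": delta("indexing", "index_time_in_millis") / docs if docs else 0.0,
        "avg_query_ms": delta("search", "query_time_in_millis") / queries if queries else 0.0,
    }


# Hypothetical samples taken 60 seconds apart.
earlier = {
    "indexing": {"index_total": 1_000_000, "index_time_in_millis": 200_000},
    "search": {"query_total": 50_000, "query_time_in_millis": 400_000},
}
later = {
    "indexing": {"index_total": 1_300_000, "index_time_in_millis": 260_000},
    "search": {"query_total": 56_000, "query_time_in_millis": 454_000},
}
rates = workload_rates(earlier, later, interval_seconds=60)
```

Sampling during both quiet and busy periods gives the peak and average figures listed above. Because the counters reset when a node restarts, a negative delta should be treated as a discarded sample.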

3. Calculate Resources

See the Elasticsearch Sizing Calculator for detailed formulas.

Resource Planning

Storage Planning

Calculate Total Storage

Total Storage = Daily Volume × Retention × (1 + Replicas) × Overhead

Overhead is a multiplier of at least 1 that covers the following allowances:

Lucene overhead: 10-20%
Translog: 5%
Headroom: 20-30%
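As a worked example, the storage formula can be sketched in Python. The guide does not say how the three overhead allowances combine, so this sketch assumes they are added to 1 to form the multiplier. The default percentages are midpoints of the ranges above, and the function name `total_storage_gb` is illustrative.

```python
def total_storage_gb(daily_gb, retention_days, replicas,
                     lucene=0.15, translog=0.05, headroom=0.25):
    """Total Storage = Daily Volume x Retention x (1 + Replicas) x Overhead."""
    # Assumption: the overhead allowances are additive fractions on top of 1.
    overhead = 1 + lucene + translog + headroom
    return daily_gb * retention_days * (1 + replicas) * overhead


# 50 GB/day, 30 days of retention, 1 replica: roughly 4350 GB in total.
needed = total_storage_gb(daily_gb=50, retention_days=30, replicas=1)
```

If you prefer to compound the allowances by multiplying them, the result is somewhat larger. Either way, the replica count and retention period dominate the total.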

Storage Type Selection

Storage Type	Use Case	Cost
NVMe SSD	Hot tier, high-performance	$$$
SSD	Warm tier, general use	$$
HDD	Cold tier, archival	$

Memory Planning

Heap Sizing

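Rule: Heap should be no more than half of RAM, and should stay below the roughly 32 GB threshold above which the JVM can no longer use compressed object pointers. The formula below caps it at 31 GB to leave a margin.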

Heap = min(Available RAM × 0.5, 31 GB)

Filesystem Cache

Filesystem Cache = Available RAM - Heap - OS Overhead

Target: At least equal to hot data size for optimal performance.
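The heap and filesystem cache formulas above can be combined into one small Python sketch. The 2 GB default for OS overhead is an assumption for illustration only, since the guide gives no figure, and the function name `memory_plan_gb` is hypothetical.

```python
def memory_plan_gb(ram_gb, os_overhead_gb=2.0, heap_cap_gb=31.0):
    """Return (heap, filesystem cache) in GB for a node with `ram_gb` of RAM."""
    # Heap = min(Available RAM x 0.5, 31 GB)
    heap = min(ram_gb * 0.5, heap_cap_gb)
    # Filesystem Cache = Available RAM - Heap - OS Overhead (never negative)
    fs_cache = max(ram_gb - heap - os_overhead_gb, 0.0)
    return heap, fs_cache


heap_64, cache_64 = memory_plan_gb(64)   # heap capped at 31 GB
heap_32, cache_32 = memory_plan_gb(32)   # heap is half of RAM
```

Comparing the resulting cache size against the hot data held on the node shows whether the target above is met.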

CPU Planning

Factors Affecting CPU

Query complexity
Aggregation depth
Indexing rate
Background operations (merges, recovery)

Estimation

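CPU Cores = (Search QPS × Query CPU Cost) + (Index Rate × Index CPU Cost)

Each cost is the CPU time, in core-seconds, that one query or one indexed document consumes, so the result is the number of cores kept busy by the workload. Measure both costs on representative hardware rather than guessing them.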

Typical starting point: 8-16 cores per data node.
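A minimal Python sketch of the CPU estimate follows. Costs are passed in CPU-milliseconds per operation and converted to cores by dividing by 1000. The `target_utilization` parameter is an addition that is not part of the formula above. It leaves room for background work such as merges and recovery, and setting it to 1.0 reproduces the plain formula. All input numbers are hypothetical.

```python
import math


def estimate_cores(search_qps, query_cpu_ms, index_rate, index_cpu_ms,
                   target_utilization=0.5):
    """Estimate the cores needed for a given search and indexing load."""
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    # CPU-milliseconds consumed per wall-clock second, converted to busy cores.
    busy_cores = (search_qps * query_cpu_ms + index_rate * index_cpu_ms) / 1000
    # Assumption: size so the workload only uses `target_utilization` of the CPU.
    return math.ceil(busy_cores / target_utilization)


# 200 queries/s at 20 ms CPU each, plus 8000 docs/s at 0.25 ms CPU each.
cores = estimate_cores(200, 20, 8000, 0.25, target_utilization=0.5)
```

The result is a cluster-wide total. Dividing it by the per-node core count gives a first estimate of how many data nodes the CPU budget implies.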

Network Planning

Bandwidth Requirements

Inter-node traffic: Recovery, replication
Client traffic: Queries, bulk indexing
Monitoring: Metrics, logs

Latency Requirements

< 1ms between nodes (same zone)
< 10ms for cross-zone
Dedicated network for cluster traffic

Cluster Architecture

Small Cluster (< 100 GB/day)

┌─────────────┐  ┌─────────────┐  ┌─────────────┐
│   Node 1    │  │   Node 2    │  │   Node 3    │
│ Master+Data │  │ Master+Data │  │ Master+Data │
│   32GB RAM  │  │   32GB RAM  │  │   32GB RAM  │
│   1TB SSD   │  │   1TB SSD   │  │   1TB SSD   │
└─────────────┘  └─────────────┘  └─────────────┘

Medium Cluster (100 GB - 1 TB/day)

┌─────────────┐  ┌─────────────┐  ┌─────────────┐
│  Master 1   │  │  Master 2   │  │  Master 3   │
│   8GB heap  │  │   8GB heap  │  │   8GB heap  │
└─────────────┘  └─────────────┘  └─────────────┘

┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐
│   Data 1    │  │   Data 2    │  │   Data 3    │  │   Data N    │
│  64GB RAM   │  │  64GB RAM   │  │  64GB RAM   │  │  64GB RAM   │
│   4TB SSD   │  │   4TB SSD   │  │   4TB SSD   │  │   4TB SSD   │
└─────────────┘  └─────────────┘  └─────────────┘  └─────────────┘

Large Cluster (> 1 TB/day)

                    ┌──────────────┐
                    │ Coordinating │ × 2-4
                    └──────────────┘

┌─────────┐  ┌─────────┐  ┌─────────┐
│ Master  │  │ Master  │  │ Master  │
└─────────┘  └─────────┘  └─────────┘

┌─────────┐  ┌─────────┐         ┌─────────┐  ┌─────────┐
│Hot Data │  │Hot Data │   ...   │Cold Data│  │Cold Data│
│  NVMe   │  │  NVMe   │         │   HDD   │  │   HDD   │
└─────────┘  └─────────┘         └─────────┘  └─────────┘
     Hot Tier (recent data)          Cold Tier (historical)
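The three reference layouts can be selected from daily ingest volume with a simple lookup, sketched here in Python. The node descriptions are copied from the diagrams above. The sketch assumes 1 TB equals 1000 GB and that the boundary values of 100 GB and 1 TB per day fall into the medium layout. The function name `suggest_topology` is hypothetical.

```python
def suggest_topology(daily_ingest_gb):
    """Map daily ingest volume (GB/day) to one of the reference layouts."""
    if daily_ingest_gb < 0:
        raise ValueError("daily_ingest_gb must not be negative")
    if daily_ingest_gb < 100:
        return {"size": "small",
                "nodes": {"master+data": "3 x 32GB RAM, 1TB SSD"}}
    if daily_ingest_gb <= 1000:  # assumption: 1 TB = 1000 GB
        return {"size": "medium",
                "nodes": {"master": "3 x 8GB heap",
                          "data": "N x 64GB RAM, 4TB SSD"}}
    return {"size": "large",
            "nodes": {"coordinating": "2-4",
                      "master": "3",
                      "hot data": "NVMe",
                      "cold data": "HDD"}}


layout = suggest_topology(250)
```

Treat the output as a starting point only. Retention, query load, and availability requirements can push a deployment into a larger layout than ingest volume alone suggests.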

Growth Planning

Monitoring Growth

Track these metrics over time:

Total document count
Index size
Daily indexing rate
Query rate

Capacity Thresholds

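Set alerts at these utilization levels (for disk, note that 85% is also Elasticsearch's default low watermark, past which it avoids allocating more shards to a node):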

70%: Plan expansion
80%: Execute expansion
85%: Urgent action
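The thresholds above, together with a tracked growth rate, can be turned into a simple check. The Python sketch below classifies current utilization and estimates the days left before a threshold is reached. It assumes linear growth, which is a simplification, and the function names are illustrative.

```python
# Thresholds from the list above, checked from most to least severe.
THRESHOLDS = [(85, "urgent action"), (80, "execute expansion"), (70, "plan expansion")]


def capacity_action(used, total):
    """Return the action for the highest threshold that `used` has reached."""
    if total <= 0:
        raise ValueError("total must be positive")
    for pct, action in THRESHOLDS:
        if used * 100 >= pct * total:
            return action
    return "ok"


def days_until(used, total, daily_growth, threshold_pct):
    """Days until `threshold_pct` is reached, assuming linear growth."""
    if daily_growth <= 0:
        return None  # usage is flat or shrinking, so no date can be projected
    remaining = threshold_pct * total / 100 - used
    return max(remaining / daily_growth, 0.0)


# 600 GB used of 1000 GB, growing by 5 GB/day.
status = capacity_action(600, 1000)
days_to_plan = days_until(600, 1000, daily_growth=5, threshold_pct=70)
```

Comparing the projected days against your hardware or cloud provisioning lead time shows whether the 70% alert leaves enough time to expand.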

Expansion Strategies

Vertical Scaling:

Increase node resources
Better for small increases
Limited by hardware

Horizontal Scaling:

Add more nodes
Better for large increases
Requires rebalancing

Cost Optimization

Right-Sizing

Start conservative, then scale up based on observed usage data
Monitor actual usage vs. provisioned
Remove over-provisioned resources

Tiered Storage

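Implement a hot-warm-cold architecture with an ILM policy. The example below assumes warm and cold data nodes are tagged with a custom node attribute named data (node.attr.data set to warm or cold):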

PUT _ilm/policy/tiered_policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {"max_primary_shard_size": "50gb"}
        }
      },
      "warm": {
        "min_age": "7d",
        "actions": {
          "shrink": {"number_of_shards": 1},
          "allocate": {
            "require": {"data": "warm"}
          }
        }
      },
      "cold": {
        "min_age": "30d",
        "actions": {
          "allocate": {
            "require": {"data": "cold"}
          }
        }
      }
    }
  }
}
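If the phase timings differ between environments, the policy body can be generated rather than maintained by hand. This Python sketch builds the same structure as the policy above with adjustable values. The function name `tiered_policy` and its parameters are hypothetical, and sending the body to the cluster with an HTTP or Elasticsearch client is left out.

```python
import json


def tiered_policy(rollover_size="50gb", warm_after="7d", cold_after="30d",
                  attribute="data"):
    """Build an ILM policy body with the same shape as the example above."""
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        "rollover": {"max_primary_shard_size": rollover_size}
                    }
                },
                "warm": {
                    "min_age": warm_after,
                    "actions": {
                        "shrink": {"number_of_shards": 1},
                        # Requires nodes tagged with node.attr.<attribute>: warm
                        "allocate": {"require": {attribute: "warm"}},
                    },
                },
                "cold": {
                    "min_age": cold_after,
                    "actions": {
                        "allocate": {"require": {attribute: "cold"}}
                    },
                },
            }
        }
    }


# Example: a shorter-lived policy for a test environment.
body = json.dumps(tiered_policy(warm_after="3d", cold_after="14d"), indent=2)
```

The default arguments reproduce the policy shown above, so the generated JSON can be compared against it before any values are changed.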

Reserved Instances

For cloud deployments:

Use reserved instances for base capacity
Use on-demand instances for peak or burst capacity
Use spot instances for non-critical workloads

Capacity Planning Checklist

Initial Planning

Documented data volume and growth rate
Defined retention requirements
Established performance SLAs
Calculated storage needs
Determined node count and specs
Planned cluster architecture

Ongoing Management

Monitoring dashboards configured
Capacity alerts set
Growth projections updated quarterly
Cost analysis performed regularly
Performance baselines established