Elasticsearch Sizing Calculator

Proper sizing ensures your Elasticsearch cluster can handle your workload efficiently. This guide provides formulas and methods to calculate cluster requirements.

Key Sizing Factors

Input Variables

Variable | Description
Daily data volume | Raw data indexed per day
Retention period | How long to keep data
Replication factor | Number of replicas (usually 1)
Query load | Searches per second
Indexing rate | Documents per second

Output Requirements

Resource | Calculation
Total storage | (Daily volume × Retention × (1 + Replicas)) × 1.15
Shard count | Total data / Target shard size
Node count | Based on storage, CPU, and memory needs
Heap size | 50% of RAM, max 32 GB

Storage Calculations

Formula

Total Storage = Daily Data × Retention Days × (1 + Replicas) × 1.15 (overhead)

Example

Daily data: 100 GB
Retention: 30 days
Replicas: 1

Total Storage = 100 × 30 × 2 × 1.15 = 6,900 GB (~7 TB)

Storage Overhead Factors

Factor | Multiplier | Description
Replication | 1 + replicas | Data copies
Index overhead | 1.1-1.2 | Lucene structures
Translog | 1.05 | Write-ahead log
Headroom | 1.2-1.3 | Growth buffer
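
These multipliers compound. The 1.15 in the headline formula roughly covers index overhead plus translog (1.10 × 1.05 ≈ 1.16); growth headroom is usually budgeted on top, as in the validation checklist at the end. A quick sketch of how the factors combine, using midpoint values as assumptions:

replicas = 1
index_overhead = 1.15   # midpoint of the 1.1-1.2 range
translog = 1.05
headroom = 1.25         # midpoint of the 1.2-1.3 range

# Effective bytes on disk per raw byte indexed, if headroom is included
effective_multiplier = (1 + replicas) * index_overhead * translog * headroom
print(round(effective_multiplier, 2))  # ~3.02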

Shard Calculations

Optimal Shard Size

  • Target: 10-50 GB per shard
  • Search-focused: 10-30 GB
  • Write-heavy: 30-50 GB

Formula

Primary Shards = Total Primary Data / Target Shard Size
Total Shards = Primary Shards × (1 + Replicas)

Example

Total data: 3 TB primary
Target shard size: 40 GB

Primary shards = 3000 GB / 40 GB = 75 shards
With 1 replica = 150 total shards
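
In time-series deployments those 75 primaries are usually spread across rollover or daily indices rather than a single index, so each individual index only needs a few primaries. A hedged sketch of applying a per-index shard count with the composable index template API (Elasticsearch 7.8+); the endpoint, template name, and index pattern are placeholders:

import requests

ES_URL = "http://localhost:9200"  # placeholder endpoint; add auth/TLS as needed

template = {
    "index_patterns": ["logs-*"],      # placeholder pattern
    "template": {
        "settings": {
            "number_of_shards": 3,     # primaries per individual index
            "number_of_replicas": 1,
        }
    },
}

resp = requests.put(f"{ES_URL}/_index_template/logs-sizing", json=template)
print(resp.status_code, resp.json())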

Node Count Calculations

Based on Storage

Data Nodes = Total Storage / Storage per Node

Based on Shards

Data Nodes = Total Shards / Max Shards per Node

Where Max Shards per Node = 1000 (default soft limit, cluster.max_shards_per_node)
Better target: no more than ~20 shards per GB of heap

Based on CPU

Data Nodes = Peak Search QPS / Searches per Node
Data Nodes = Peak Index Rate / Index Rate per Node

Use whichever of the two yields the higher node count.

Typical Node Capacities

Node Type | Typical Capacity
r5.xlarge (AWS) | 500 GB storage, 100 QPS
r5.2xlarge | 1 TB storage, 200 QPS
r5.4xlarge | 2 TB storage, 400 QPS
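
Putting the three constraints together: the data node count is the maximum of the storage-, shard-, and CPU-driven estimates, and never fewer than three nodes for resilience. A small sketch with illustrative inputs; the per-node capacities roughly follow the r5.2xlarge row above and are assumptions:

import math

total_storage_gb = 6900      # from the storage example
total_shards = 150           # from the shard example
peak_search_qps = 500        # assumed peak query load

storage_per_node_gb = 1000
shards_per_node = 31 * 20    # ~20 shards per GB of heap, 31 GB heap
qps_per_node = 200

data_nodes = max(
    math.ceil(total_storage_gb / storage_per_node_gb),
    math.ceil(total_shards / shards_per_node),
    math.ceil(peak_search_qps / qps_per_node),
    3,                       # resilience floor
)
print(data_nodes)  # 7 -- storage is the binding constraint here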

Memory Calculations

Heap Sizing

Heap = min(RAM × 0.5, 31 GB)

Rule: heap should be about half of RAM and stay below ~32 GB so the JVM keeps using compressed object pointers (31 GB is a safe cap).
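
A minimal sketch that derives the heap size and prints the JVM flags to pin it (Xms equal to Xmx); on a real node these flags would typically live in a file under config/jvm.options.d/ rather than be printed:

ram_gb = 64
heap_gb = min(int(ram_gb * 0.5), 31)  # stay below ~32 GB to keep compressed object pointers

print(f"-Xms{heap_gb}g")
print(f"-Xmx{heap_gb}g")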

Memory Allocation

Component | Allocation
JVM heap | 50% of RAM (max 32 GB)
Filesystem cache | Remaining RAM
OS | ~1-2 GB

Example

Server RAM: 64 GB
Heap: 31 GB
Filesystem cache: ~31 GB
OS: ~2 GB

Sizing Calculator

Quick Calculator Table

Daily Volume | Retention | Replicas | Storage Needed | Shards | Min Nodes
10 GB | 30 days | 1 | 690 GB | 18 | 3
50 GB | 30 days | 1 | 3.4 TB | 86 | 4
100 GB | 30 days | 1 | 6.9 TB | 173 | 6
500 GB | 30 days | 1 | 34.5 TB | 862 | 15
1 TB | 30 days | 1 | 69 TB | 1725 | 30

Calculator Script

def calculate_elasticsearch_sizing(
    daily_data_gb: float,
    retention_days: int,
    replicas: int = 1,
    target_shard_size_gb: float = 40,
    storage_per_node_gb: float = 1000,
    ram_per_node_gb: float = 64
):
    # Storage calculation
    overhead_factor = 1.15
    total_storage_gb = daily_data_gb * retention_days * (1 + replicas) * overhead_factor

    # Shard calculation
    primary_data_gb = daily_data_gb * retention_days * overhead_factor
    primary_shards = max(1, int(primary_data_gb / target_shard_size_gb))
    total_shards = primary_shards * (1 + replicas)

    # Node calculation (based on storage)
    min_nodes_storage = max(3, int(total_storage_gb / storage_per_node_gb) + 1)

    # Node calculation (based on shards)
    heap_gb = min(ram_per_node_gb * 0.5, 31)
    max_shards_per_node = heap_gb * 20  # Conservative estimate
    min_nodes_shards = max(3, int(total_shards / max_shards_per_node) + 1)

    # Take the higher of the two
    recommended_nodes = max(min_nodes_storage, min_nodes_shards)

    return {
        "total_storage_tb": round(total_storage_gb / 1000, 1),
        "primary_shards": primary_shards,
        "total_shards": total_shards,
        "recommended_data_nodes": recommended_nodes,
        "heap_per_node_gb": int(heap_gb),
        "storage_per_node_gb": int(total_storage_gb / recommended_nodes)
    }

# Example usage
result = calculate_elasticsearch_sizing(
    daily_data_gb=100,
    retention_days=30,
    replicas=1
)
print(result)
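
For the example inputs above, the script prints:

{'total_storage_tb': 6.9, 'primary_shards': 86, 'total_shards': 172, 'recommended_data_nodes': 7, 'heap_per_node_gb': 31, 'storage_per_node_gb': 985}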

Workload-Specific Adjustments

Logging/Time-Series

  • Higher write throughput needed
  • Larger shards acceptable (40-50 GB)
  • Consider hot-warm architecture
  • Use ILM for data management
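
A hedged sketch of an ILM policy matching the 30-day retention example: roll over daily or at the target shard size, then delete after 30 days. The policy name and endpoint are placeholders, and max_primary_shard_size assumes Elasticsearch 7.13 or later:

import requests

ES_URL = "http://localhost:9200"  # placeholder endpoint; add auth/TLS as needed

policy = {
    "policy": {
        "phases": {
            "hot": {
                "actions": {
                    "rollover": {"max_age": "1d", "max_primary_shard_size": "40gb"}
                }
            },
            "delete": {
                "min_age": "30d",
                "actions": {"delete": {}},
            },
        }
    }
}

resp = requests.put(f"{ES_URL}/_ilm/policy/logs-30d", json=policy)
print(resp.status_code, resp.json())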

Full-Text Search

  • Lower write, higher read
  • Smaller shards (10-30 GB) for better search latency
  • More replicas for read scaling (see the example after this list)
  • Consider dedicated coordinating nodes
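
Because number_of_replicas is a dynamic index setting, read capacity can be added without reindexing; a minimal sketch (index name and endpoint are placeholders):

import requests

ES_URL = "http://localhost:9200"  # placeholder endpoint; add auth/TLS as needed

# Raise the replica count on an existing index to add read throughput
resp = requests.put(
    f"{ES_URL}/products/_settings",
    json={"index": {"number_of_replicas": 2}},
)
print(resp.status_code, resp.json())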

Analytics/Aggregations

  • Memory-intensive
  • Larger heap beneficial
  • Fewer, larger nodes often better
  • Consider dedicated data nodes for heavy aggregations

Master Node Sizing

Dedicated Masters Required When

  • Cluster > 5 data nodes
  • Production environment
  • High availability required

Master Node Requirements

Cluster Size | Master Nodes | Min Heap | Min CPU
Small (3-5 data nodes) | 3 | 2 GB | 2 cores
Medium (5-20 data nodes) | 3 | 4 GB | 4 cores
Large (20-50 data nodes) | 3 | 8 GB | 8 cores
Very large (50+ data nodes) | 3 | 16 GB | 16 cores

Validation Checklist

After sizing:

  • Total storage has 30%+ headroom
  • Heap per node ≤ 31 GB
  • Shards per node < 1000
  • Shard size between 10-50 GB
  • At least 3 master-eligible nodes
  • Replicas configured (usually 1)
  • Growth projections considered
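
A small sketch that runs part of this checklist against the calculator output (the shard-size and growth checks need inputs the result dict does not carry, so they are left out; the 1 TB per-node disk size is an assumption):

def validate_sizing(result: dict, node_storage_gb: float = 1000) -> list:
    """Return the checklist items the sizing plan fails."""
    issues = []
    if result["heap_per_node_gb"] > 31:
        issues.append("heap per node above 31 GB")
    if result["total_shards"] / result["recommended_data_nodes"] >= 1000:
        issues.append("1000 or more shards per node")
    if result["recommended_data_nodes"] < 3:
        issues.append("fewer than 3 nodes")
    if result["storage_per_node_gb"] > 0.7 * node_storage_gb:
        issues.append("under 30% storage headroom per node")
    return issues

# The 100 GB/day example from the calculator script:
example = {"total_storage_tb": 6.9, "primary_shards": 86, "total_shards": 172,
           "recommended_data_nodes": 7, "heap_per_node_gb": 31,
           "storage_per_node_gb": 985}
print(validate_sizing(example))  # flags the storage-headroom check

A non-empty list means the plan needs more nodes, larger disks, or revised assumptions before rollout.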