Elasticsearch Sizing Calculator

Proper sizing helps your Elasticsearch cluster handle its workload efficiently. This guide provides formulas and methods for estimating storage, shard, node, and memory requirements.

Key Sizing Factors

Input Variables

Variable	Description
Daily data volume	Raw data indexed per day
Retention period	How long to keep data
Replication factor	Number of replicas (usually 1)
Query load	Searches per second
Indexing rate	Documents per second

Output Requirements

Resource	Calculation
Total storage	(Daily volume × Retention × (1 + Replicas)) × 1.15
Shard count	(Primary data / Target shard size) × (1 + Replicas)
Node count	Based on storage, CPU, memory needs
Heap size	50% of RAM, capped at about 31 GB

Storage Calculations

Formula

Total Storage = Daily Data × Retention Days × (1 + Replicas) × 1.15 (overhead)

Example

Daily data: 100 GB
Retention: 30 days
Replicas: 1

Total Storage = 100 × 30 × 2 × 1.15 = 6,900 GB (~7 TB)

Storage Overhead Factors

Factor	Multiplier	Description
Replication	1 + replicas	Data copies
Index overhead	1.1-1.2	Lucene structures
Translog	1.05	Write-ahead log
Headroom	1.2-1.3	Growth buffer
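
The 1.15 factor in the formula above covers index overhead only. As a rough sketch, the multipliers in this table could be combined into a provisioned-disk estimate as follows. It assumes the factors simply multiply and uses mid-range values (1.15 index overhead, 1.25 headroom); the function name and defaults are illustrative and not part of any Elasticsearch tooling.

```python
def provisioned_storage_gb(
    daily_data_gb: float,
    retention_days: int,
    replicas: int = 1,
    index_overhead: float = 1.15,  # assumed mid-point of the 1.1-1.2 range
    translog: float = 1.05,
    headroom: float = 1.25,        # assumed mid-point of the 1.2-1.3 range
) -> float:
    """Estimate disk to provision, assuming the overhead factors multiply."""
    raw_gb = daily_data_gb * retention_days
    return raw_gb * (1 + replicas) * index_overhead * translog * headroom


# 100 GB/day, 30 days of retention, 1 replica
print(round(provisioned_storage_gb(100, 30)))  # 9056
```

With translog and headroom included, the 6,900 GB from the example grows to roughly 9 TB of disk to provision.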

Shard Calculations

Optimal Shard Size

Target: 10-50 GB per shard
Search-focused: 10-30 GB
Write-heavy: 30-50 GB

Formula

Primary Shards = Total Primary Data / Target Shard Size
Total Shards = Primary Shards × (1 + Replicas)

Example

Total data: 3 TB primary
Target shard size: 40 GB

Primary shards = 3000 GB / 40 GB = 75 shards
With 1 replica = 150 total shards
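
The division rarely comes out even, so it helps to round up and then check the shard size that results. A minimal sketch, where the function name and return shape are illustrative:

```python
import math


def shard_counts(primary_data_gb: float, target_shard_size_gb: float, replicas: int = 1):
    """Return (primary shards, total shards, resulting size per primary shard in GB)."""
    # Round up so no shard ends up larger than the target size.
    primary = max(1, math.ceil(primary_data_gb / target_shard_size_gb))
    total = primary * (1 + replicas)
    actual_size_gb = primary_data_gb / primary
    return primary, total, actual_size_gb


print(shard_counts(3000, 40))  # (75, 150, 40.0)
```

Rounding down instead would push each shard slightly above the target, which matters most near the 50 GB upper bound.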

Node Count Calculations

Based on Storage

Data Nodes = Total Storage / Storage per Node

Based on Shards

Data Nodes = Total Shards / Max Shards per Node

Where Max Shards per Node = 1000 (the default cluster.max_shards_per_node limit, configurable)
Better target: at most 20 shards per GB of heap

Based on CPU

Data Nodes (search) = Peak Search QPS / Searches per Node
Data Nodes (indexing) = Peak Index Rate / Index Rate per Node

Use the larger of the two results.
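
The storage, shard, and CPU estimates each give a different node count, and the cluster has to satisfy all of them. One way to sketch this is to compute each estimate and take the largest. The per-node capacities passed in below are hypothetical placeholders and should come from benchmarks on your own hardware.

```python
import math


def data_nodes_needed(
    total_storage_gb: float,
    total_shards: int,
    peak_search_qps: float,
    peak_index_rate: float,
    *,
    storage_per_node_gb: float,
    max_shards_per_node: float,
    search_qps_per_node: float,
    index_rate_per_node: float,
) -> tuple[int, str]:
    """Return (node count, limiting constraint) across all four estimates."""
    estimates = {
        "storage": math.ceil(total_storage_gb / storage_per_node_gb),
        "shards": math.ceil(total_shards / max_shards_per_node),
        "search": math.ceil(peak_search_qps / search_qps_per_node),
        "indexing": math.ceil(peak_index_rate / index_rate_per_node),
    }
    # The constraint that demands the most nodes decides the cluster size.
    limiting = max(estimates, key=estimates.get)
    return estimates[limiting], limiting


nodes, limiting = data_nodes_needed(
    6900, 150, 500, 20000,
    storage_per_node_gb=1000,   # hypothetical per-node capacities
    max_shards_per_node=620,
    search_qps_per_node=200,
    index_rate_per_node=5000,
)
print(nodes, limiting)  # 7 storage
```

Knowing which constraint is limiting also shows where a different node type would help, for example more disk per node when storage dominates.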

Typical Node Capacities

Node Type	Typical Capacity
r5.xlarge (AWS)	500 GB storage, 100 QPS
r5.2xlarge	1 TB storage, 200 QPS
r5.4xlarge	2 TB storage, 400 QPS

Memory Calculations

Heap Sizing

Heap = min(RAM × 0.5, 31 GB)

Rule: Heap should be about half of RAM, capped at about 31 GB so the JVM stays below the ~32 GB threshold for compressed object pointers.

Memory Allocation

Component	Allocation
JVM Heap	50% of RAM (max ~31 GB)
Filesystem cache	Remaining RAM
OS	~1-2 GB

Example

Server RAM: 64 GB
Heap: 31 GB
Filesystem cache: ~31 GB
OS: ~2 GB
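
The same split can be sketched as a small helper. The 2 GB OS reserve and the 31 GB heap cap are taken from this section; the function name is illustrative.

```python
def memory_plan(ram_gb: float, os_reserve_gb: float = 2.0, heap_cap_gb: float = 31.0) -> dict:
    """Split node RAM into JVM heap, OS reserve, and filesystem cache."""
    heap_gb = min(ram_gb * 0.5, heap_cap_gb)
    # Whatever is left after heap and OS goes to the filesystem cache.
    fs_cache_gb = max(0.0, ram_gb - heap_gb - os_reserve_gb)
    return {
        "heap_gb": heap_gb,
        "filesystem_cache_gb": fs_cache_gb,
        "os_gb": os_reserve_gb,
    }


print(memory_plan(64))  # {'heap_gb': 31.0, 'filesystem_cache_gb': 31.0, 'os_gb': 2.0}
```

On nodes with more than 64 GB of RAM the heap stays at the cap, so all additional memory goes to the filesystem cache.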

Sizing Calculator

Quick Calculator Table

Daily Volume	Retention	Replicas	Storage Needed	Total Shards (approx.)	Min Data Nodes
10 GB	30 days	1	690 GB	18	2
50 GB	30 days	1	3.4 TB	86	4
100 GB	30 days	1	6.9 TB	173	6
500 GB	30 days	1	34.5 TB	862	15
1 TB	30 days	1	69 TB	1725	30

Calculator Script

import math


def calculate_elasticsearch_sizing(
    daily_data_gb: float,
    retention_days: int,
    replicas: int = 1,
    target_shard_size_gb: float = 40,
    storage_per_node_gb: float = 1000,
    ram_per_node_gb: float = 64,
) -> dict:
    """Estimate storage, shard count, and data-node count for a cluster."""
    # Storage calculation: raw data x copies x index overhead
    overhead_factor = 1.15
    total_storage_gb = daily_data_gb * retention_days * (1 + replicas) * overhead_factor

    # Shard calculation: round up so no shard exceeds the target size
    primary_data_gb = daily_data_gb * retention_days * overhead_factor
    primary_shards = max(1, math.ceil(primary_data_gb / target_shard_size_gb))
    total_shards = primary_shards * (1 + replicas)

    # Node calculation (based on storage), with a minimum of 3 nodes
    min_nodes_storage = max(3, math.ceil(total_storage_gb / storage_per_node_gb))

    # Node calculation (based on shards)
    heap_gb = min(ram_per_node_gb * 0.5, 31)
    max_shards_per_node = heap_gb * 20  # Conservative estimate: 20 shards per GB of heap
    min_nodes_shards = max(3, math.ceil(total_shards / max_shards_per_node))

    # Take the higher of the two
    recommended_nodes = max(min_nodes_storage, min_nodes_shards)

    return {
        "total_storage_tb": round(total_storage_gb / 1000, 1),
        "primary_shards": primary_shards,
        "total_shards": total_shards,
        "recommended_data_nodes": recommended_nodes,
        "heap_per_node_gb": int(heap_gb),
        # Data each node holds when storage is spread evenly
        "storage_per_node_gb": int(total_storage_gb / recommended_nodes),
    }


# Example usage
result = calculate_elasticsearch_sizing(
    daily_data_gb=100,
    retention_days=30,
    replicas=1,
)
print(result)

Workload-Specific Adjustments

Logging/Time-Series

Higher write throughput needed
Larger shards acceptable (40-50 GB)
Consider hot-warm architecture
Use ILM for data management
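
As an illustration of the last point, an ILM policy could roll indices over before shards outgrow the target size and delete them once retention expires. The sketch below builds such a policy body in Python. The 50 GB and 30-day values are assumptions carried over from this guide's examples, and the body would be sent to the `_ilm/policy/<policy-name>` endpoint with a policy name of your choosing.

```python
import json

# Assumed thresholds: roll over at 50 GB per primary shard or after one day,
# then delete indices 30 days after rollover.
ilm_policy = {
    "policy": {
        "phases": {
            "hot": {
                "actions": {
                    "rollover": {
                        "max_primary_shard_size": "50gb",
                        "max_age": "1d",
                    }
                }
            },
            "delete": {
                "min_age": "30d",
                "actions": {"delete": {}},
            },
        }
    }
}

# Request body for PUT _ilm/policy/<policy-name>
print(json.dumps(ilm_policy, indent=2))
```

A hot-warm setup would add further phases between `hot` and `delete`; they are left out here to keep the sketch small.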

Full-Text Search

Lower write, higher read
Smaller shards (10-30 GB) for better search latency
More replicas for read scaling
Consider dedicated coordinating nodes

Analytics/Aggregations

Memory-intensive
Larger heap beneficial
Fewer, larger nodes often better
Consider dedicated data nodes for heavy aggregations

Master Node Sizing

Dedicated Masters Required When

Cluster > 5 data nodes
Production environment
High availability required

Master Node Requirements

Cluster Size	Master Nodes	Min Heap	Min CPU
Small (3-5 data)	3	2 GB	2 cores
Medium (6-20 data)	3	4 GB	4 cores
Large (21-50 data)	3	8 GB	8 cores
Very Large (51+ data)	3	16 GB	16 cores

Validation Checklist

After sizing:

Total storage has 30%+ headroom
Heap per node ≤ 31 GB
Shards per node < 1000 (ideally no more than 20 per GB of heap)
Shard size between 10-50 GB
At least 3 master-eligible nodes
Replicas configured (usually 1)
Growth projections considered
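
Most of this checklist can be checked mechanically. A minimal sketch, assuming "30% headroom" means provisioned disk is at least 1.3 times the expected data size; the function and parameter names are illustrative, and growth projections still need a manual review.

```python
def validate_sizing(
    *,
    used_storage_gb: float,
    provisioned_storage_gb: float,
    heap_per_node_gb: float,
    shards_per_node: float,
    avg_shard_size_gb: float,
    master_eligible_nodes: int,
    replicas: int,
) -> list[str]:
    """Return the names of the checklist items that fail (empty list = all pass)."""
    checks = {
        "storage headroom >= 30%": provisioned_storage_gb >= used_storage_gb * 1.3,
        "heap per node <= 31 GB": heap_per_node_gb <= 31,
        "shards per node < 1000": shards_per_node < 1000,
        "shard size within 10-50 GB": 10 <= avg_shard_size_gb <= 50,
        "at least 3 master-eligible nodes": master_eligible_nodes >= 3,
        "replicas configured": replicas >= 1,
    }
    return [name for name, passed in checks.items() if not passed]


failures = validate_sizing(
    used_storage_gb=6900,
    provisioned_storage_gb=8000,  # too little: 6900 x 1.3 = 8970
    heap_per_node_gb=31,
    shards_per_node=25,
    avg_shard_size_gb=40,
    master_eligible_nodes=3,
    replicas=1,
)
print(failures)  # ['storage headroom >= 30%']
```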
