Proper sizing ensures your Elasticsearch cluster can handle your workload efficiently. This guide provides formulas and methods to calculate cluster requirements.
Key Sizing Factors
Input Variables
| Variable | Description |
|---|---|
| Daily data volume | Raw data indexed per day |
| Retention period | How long to keep data |
| Replication factor | Number of replicas (usually 1) |
| Query load | Searches per second |
| Indexing rate | Documents per second |
Output Requirements
| Resource | Calculation |
|---|---|
| Total storage | (Daily volume × Retention × (1 + Replicas)) × 1.15 |
| Shard count | Total data / Target shard size |
| Node count | Based on storage, CPU, and memory needs |
| Heap size | 50% of RAM, capped at ~31 GB |
Storage Calculations
Formula
Total Storage = Daily Data × Retention Days × (1 + Replicas) × 1.15 (overhead)
Example
Daily data: 100 GB
Retention: 30 days
Replicas: 1
Total Storage = 100 × 30 × 2 × 1.15 = 6,900 GB (~7 TB)
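The same arithmetic in Python, as a quick sanity check of the figure above:

```python
daily_gb = 100          # raw data indexed per day
retention_days = 30     # how long to keep it
replicas = 1            # one replica copy of every primary
overhead = 1.15         # ~15% for Lucene structures and translog

total_gb = daily_gb * retention_days * (1 + replicas) * overhead
print(f"{total_gb:,.0f} GB")  # 6,900 GB (~7 TB)
```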
Storage Overhead Factors
| Factor | Multiplier | Description |
|---|---|---|
| Replication | 1 + replicas | Data copies |
| Index overhead | 1.1-1.2 | Lucene structures |
| Translog | 1.05 | Write-ahead log |
| Headroom | 1.2-1.3 | Growth buffer |
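The blanket 1.15 in the storage formula roughly corresponds to the index overhead and translog multipliers from this table; the growth headroom is best layered on top when budgeting capacity. A minimal sketch composing the factors, using illustrative picks from the ranges above:

```python
# Illustrative picks from the ranges in the table above
index_overhead = 1.10   # Lucene structures
translog = 1.05         # write-ahead log
headroom = 1.20         # growth buffer

base_overhead = index_overhead * translog   # ~1.16, close to the 1.15 used in the formula
with_headroom = base_overhead * headroom    # ~1.39 if growth is budgeted up front

print(round(base_overhead, 2), round(with_headroom, 2))  # 1.16 1.39
```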
Shard Calculations
Optimal Shard Size
- Target: 10-50 GB per shard
- Search-focused: 10-30 GB
- Write-heavy: 30-50 GB
Formula
Primary Shards = Total Primary Data / Target Shard Size
Total Shards = Primary Shards × (1 + Replicas)
Example
Total data: 3 TB primary
Target shard size: 40 GB
Primary shards = 3000 GB / 40 GB = 75 shards
With 1 replica = 150 total shards
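The primary shard count is fixed at index creation (changing it later requires a split, shrink, or reindex), so the result of this calculation has to be applied up front. A minimal sketch with `requests`, assuming an unauthenticated cluster at `http://localhost:9200` and an illustrative index name; for time-series data the 75 primaries would normally be spread across daily or rollover indices rather than a single index:

```python
import requests

ES = "http://localhost:9200"  # adjust host and auth for your cluster

# Illustrative index sized per the example above (75 primaries, 1 replica)
settings = {
    "settings": {
        "index": {
            "number_of_shards": 75,
            "number_of_replicas": 1,
        }
    }
}

resp = requests.put(f"{ES}/search-data", json=settings, timeout=10)
resp.raise_for_status()
print(resp.json())  # {'acknowledged': True, 'shards_acknowledged': True, 'index': 'search-data'}
```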
Node Count Calculations
Based on Storage
Data Nodes = Total Storage / Storage per Node
Based on Shards
Data Nodes = Total Shards / Max Shards per Node
Where Max Shards per Node = ~1000 (soft limit)
Better target: 20-30 shards per GB of heap
Based on CPU
Data Nodes = Peak Search QPS / Searches per Node
Data Nodes = Peak Index Rate / Index Rate per Node
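Each of these formulas constrains a different resource, so the cluster needs enough nodes to satisfy all of them at once; in practice you take the largest of the estimates, and never run fewer than three data nodes for resilience. A minimal sketch; the per-node capacity defaults are assumptions to replace with your own benchmarks:

```python
import math

def data_nodes_needed(
    total_storage_gb: float,
    total_shards: int,
    peak_search_qps: float,
    peak_index_rate: float,
    storage_per_node_gb: float = 1000,    # assumed usable disk per node
    shards_per_node: int = 600,           # ~20 shards per GB of a 31 GB heap
    qps_per_node: float = 200,            # assumed from load testing
    index_rate_per_node: float = 10_000,  # docs/sec, assumed from load testing
) -> int:
    """Return the binding data-node count across storage, shard, and CPU constraints."""
    constraints = [
        math.ceil(total_storage_gb / storage_per_node_gb),
        math.ceil(total_shards / shards_per_node),
        math.ceil(peak_search_qps / qps_per_node),
        math.ceil(peak_index_rate / index_rate_per_node),
    ]
    return max(3, *constraints)  # at least 3 data nodes for resilience

print(data_nodes_needed(6900, 174, 400, 20_000))  # 7 (storage is the binding constraint)
```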
Typical Node Capacities
| Node Type | Typical Capacity |
|---|---|
| r5.xlarge (AWS) | 500 GB storage, 100 QPS |
| r5.2xlarge | 1 TB storage, 200 QPS |
| r5.4xlarge | 2 TB storage, 400 QPS |
Memory Calculations
Heap Sizing
Heap = min(RAM × 0.5, 31 GB)
Rule: give the heap about half of the machine's RAM, but keep it below the compressed object pointer threshold (just under 32 GB); 31 GB is a safe ceiling.
Memory Allocation
| Component | Allocation |
|---|---|
| JVM Heap | 50% of RAM (max ~31 GB) |
| Filesystem cache | Remaining RAM |
| OS | ~1-2 GB |
Example
Server RAM: 64 GB
Heap: 31 GB
Filesystem cache: ~31 GB
OS: ~2 GB
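A quick way to confirm that running nodes actually follow this split is to read the configured heap from the nodes info API. A minimal sketch with `requests`, assuming an unauthenticated cluster at `http://localhost:9200`:

```python
import requests

ES = "http://localhost:9200"  # adjust host and auth for your cluster

# Nodes info exposes each node's configured maximum JVM heap
nodes = requests.get(f"{ES}/_nodes/jvm", timeout=10).json()["nodes"]

for node in nodes.values():
    heap_gb = node["jvm"]["mem"]["heap_max_in_bytes"] / 1024**3
    print(f"{node['name']}: heap {heap_gb:.1f} GB")  # expect ~50% of RAM, at most ~31 GB
```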
Sizing Calculator
Quick Calculator Table
| Daily Volume | Retention | Replicas | Storage Needed | Shards | Min Nodes |
|---|---|---|---|---|---|
| 10 GB | 30 days | 1 | 690 GB | 18 | 2 |
| 50 GB | 30 days | 1 | 3.4 TB | 86 | 4 |
| 100 GB | 30 days | 1 | 6.9 TB | 173 | 6 |
| 500 GB | 30 days | 1 | 34.5 TB | 862 | 15 |
| 1 TB | 30 days | 1 | 69 TB | 1725 | 30 |

Shard counts assume the 40 GB target shard size and include replicas; storage includes the replica copy plus ~15% overhead.
Calculator Script
```python
import math


def calculate_elasticsearch_sizing(
    daily_data_gb: float,
    retention_days: int,
    replicas: int = 1,
    target_shard_size_gb: float = 40,
    storage_per_node_gb: float = 1000,
    ram_per_node_gb: float = 64,
):
    """Estimate storage, shard, and data-node requirements for a cluster."""
    # Storage: raw data x retention x copies x ~15% overhead (Lucene, translog)
    overhead_factor = 1.15
    total_storage_gb = daily_data_gb * retention_days * (1 + replicas) * overhead_factor

    # Shards: size primaries against the target shard size, then add replicas
    primary_data_gb = daily_data_gb * retention_days * overhead_factor
    primary_shards = max(1, math.ceil(primary_data_gb / target_shard_size_gb))
    total_shards = primary_shards * (1 + replicas)

    # Node count based on storage capacity (never fewer than 3 for resilience)
    min_nodes_storage = max(3, math.ceil(total_storage_gb / storage_per_node_gb))

    # Node count based on shard density (~20 shards per GB of heap, conservative)
    heap_gb = min(ram_per_node_gb * 0.5, 31)
    max_shards_per_node = heap_gb * 20
    min_nodes_shards = max(3, math.ceil(total_shards / max_shards_per_node))

    # The binding constraint wins
    recommended_nodes = max(min_nodes_storage, min_nodes_shards)

    return {
        "total_storage_tb": round(total_storage_gb / 1000, 1),
        "primary_shards": primary_shards,
        "total_shards": total_shards,
        "recommended_data_nodes": recommended_nodes,
        "heap_per_node_gb": int(heap_gb),
        "storage_per_node_gb": int(total_storage_gb / recommended_nodes),
    }


# Example usage
result = calculate_elasticsearch_sizing(
    daily_data_gb=100,
    retention_days=30,
    replicas=1,
)
print(result)
# {'total_storage_tb': 6.9, 'primary_shards': 87, 'total_shards': 174,
#  'recommended_data_nodes': 7, 'heap_per_node_gb': 31, 'storage_per_node_gb': 985}
```
Workload-Specific Adjustments
Logging/Time-Series
- Higher write throughput needed
- Larger shards acceptable (40-50 GB)
- Consider hot-warm architecture
- Use ILM for data management (see the policy sketch below)
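A minimal ILM policy sketch via the REST API, assuming an unauthenticated cluster at `http://localhost:9200`; the policy name is illustrative, the rollover threshold follows the 50 GB write-heavy shard target, and the 30-day delete mirrors the retention example used throughout this guide:

```python
import requests

ES = "http://localhost:9200"  # adjust host and auth for your cluster

# Roll over at ~50 GB per primary shard or daily, delete after 30 days
policy = {
    "policy": {
        "phases": {
            "hot": {
                "actions": {
                    "rollover": {
                        "max_primary_shard_size": "50gb",
                        "max_age": "1d",
                    }
                }
            },
            "delete": {
                "min_age": "30d",
                "actions": {"delete": {}},
            },
        }
    }
}

resp = requests.put(f"{ES}/_ilm/policy/logs-30d", json=policy, timeout=10)
resp.raise_for_status()
```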
Full-Text Search
- Lower write, higher read
- Smaller shards (10-30 GB) for better search latency
- More replicas for read scaling (see the settings sketch below)
- Consider dedicated coordinating nodes
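Replica count, unlike the primary count, can be changed on a live index, which makes it a convenient knob for read scaling. A minimal sketch, assuming an unauthenticated cluster at `http://localhost:9200` and an illustrative index name:

```python
import requests

ES = "http://localhost:9200"  # adjust host and auth for your cluster

# Add a second replica so searches can fan out across more copies
resp = requests.put(
    f"{ES}/search-data/_settings",
    json={"index": {"number_of_replicas": 2}},
    timeout=10,
)
resp.raise_for_status()
```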
Analytics/Aggregations
- Memory-intensive
- Larger heap beneficial
- Fewer, larger nodes often better
- Consider dedicated data nodes for heavy aggregations
Master Node Sizing
Dedicated Masters Required When
- Cluster > 5 data nodes
- Production environment
- High availability required
Master Node Requirements
| Cluster Size | Master Nodes | Min Heap | Min CPU |
|---|---|---|---|
| Small (3-5 data nodes) | 3 | 2 GB | 2 cores |
| Medium (5-20 data nodes) | 3 | 4 GB | 4 cores |
| Large (20-50 data nodes) | 3 | 8 GB | 8 cores |
| Very Large (50+ data nodes) | 3 | 16 GB | 16 cores |
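Dedicated masters are configured with `node.roles: [ master ]` in each master node's elasticsearch.yml. A quick check that the roles came out as intended, via the cat nodes API (assuming an unauthenticated cluster at `http://localhost:9200`):

```python
import requests

ES = "http://localhost:9200"  # adjust host and auth for your cluster

# node.role lists each node's roles (e.g. "m" = master-eligible, "d" = data);
# the master column marks the currently elected master with "*"
rows = requests.get(
    f"{ES}/_cat/nodes",
    params={"format": "json", "h": "name,node.role,master"},
    timeout=10,
).json()

for row in rows:
    print(row["name"], row["node.role"], row["master"])
```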
Validation Checklist
After sizing: