This is a working calculator for sizing an Elasticsearch cluster. The main inputs are daily ingest volume, retention window, replica count, and target shard size. The outputs are total storage with overhead, primary and total shard counts, recommended data-node count, and JVM heap per node. The defaults below follow widely used Elasticsearch sizing rules of thumb: 20-40 GB per shard for most workloads, JVM heap = min(50% of RAM, 31 GB), no more than 20 shards per GB of heap, and the default `cluster.max_shards_per_node` limit of 1000.
Inputs and Outputs
| Input | Description | Typical |
|---|---|---|
| Daily data volume | Raw bytes indexed per day, before replication and overhead | 50 GB - 10 TB |
| Retention | How long to keep data online | 7-90 days |
| Replicas | Replica count per primary (default 1) | 0-2 |
| Target shard size | Bytes per primary shard | 20-40 GB |
| Storage per node | Usable disk per data node | 500 GB - 8 TB |
| RAM per node | Physical memory per data node | 32-128 GB |

| Output | Calculation |
|---|---|
| Total storage | daily x retention x (1 + replicas) x 1.15 overhead |
| Primary shards | ceil((daily x retention x 1.15) / target_shard_size) |
| Total shards | primary_shards x (1 + replicas) |
| Min data nodes (storage) | ceil(total_storage / storage_per_node) |
| Min data nodes (shards) | ceil(total_shards / (heap_gb x 20)) |
| Heap per node | min(RAM x 0.5, 31 GB) |
Storage Calculation
An Elasticsearch index takes more disk than the raw data it ingests. Each replica adds a full copy of the primary data, so one replica doubles the bytes and two replicas triple them. Lucene segment files, doc values, the translog, and the headroom required for merges add another 10-30%.
Formula:
total_storage = daily_data x retention_days x (1 + replicas) x 1.15
Worked example for a logging workload:
Daily data: 100 GB raw JSON
Retention: 30 days
Replicas: 1
total_storage = 100 x 30 x 2 x 1.15 = 6900 GB (~6.9 TB)
The 1.15 multiplier is a baseline; on text-heavy workloads with rich analysers it can climb to 1.3. Add another 20-30% headroom on top so you do not run the cluster near the 85% low watermark in steady state.
| Factor | Multiplier | What it covers |
|---|---|---|
| Replication | 1 + replicas | Replica copies |
| Index overhead | 1.1 - 1.3 | Lucene structures, doc values |
| Translog | 1.05 | Write-ahead log |
| Headroom | 1.2 - 1.3 | Growth buffer below watermark |
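The factors in the table can be composed in a few lines. The following is a minimal sketch rather than a definitive implementation: the function name `provisioned_storage_gb` and the default `headroom` of 1.25 are assumptions made for illustration, and the single `overhead` value of 1.15 stands in for index overhead plus translog, as in the formula above.

```python
def provisioned_storage_gb(
    daily_gb: float,
    retention_days: int,
    replicas: int = 1,
    overhead: float = 1.15,  # baseline for Lucene structures + translog
    headroom: float = 1.25,  # assumed growth buffer, inside the 1.2-1.3 range
) -> dict:
    """Return the on-disk footprint and the disk to provision, both in GB."""
    footprint = daily_gb * retention_days * (1 + replicas) * overhead
    return {
        "footprint_gb": round(footprint),
        "provision_gb": round(footprint * headroom),
    }


print(provisioned_storage_gb(100, 30))
```

For the logging example (100 GB per day, 30 days, 1 replica) this gives a footprint of 6900 GB and roughly 8625 GB to provision once the assumed 25% headroom is applied.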
Shard Calculation
Target 20-40 GB per primary shard for most workloads. Current Elasticsearch guidance is to keep shards between 10 and 50 GB and under roughly 200 million documents each. Smaller shards waste heap on per-shard overhead; larger shards slow recovery and rebalancing.
primary_shards = ceil((daily_data x retention_days x 1.15) / target_shard_size)
total_shards = primary_shards x (1 + replicas)
Worked example:
Total primary data after overhead: 100 x 30 x 1.15 = 3450 GB
Target shard size: 40 GB
Primary shards: ceil(3450 / 40) = 87
With 1 replica: 174 total shards
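Because the shard count is rounded up, the shards end up slightly smaller than the target. The sketch below, with a hypothetical helper named `shard_plan`, shows the resulting size and checks it against the 10-50 GB band; it is an illustration of the arithmetic, not part of any Elasticsearch API.

```python
import math


def shard_plan(primary_data_gb: float, target_shard_gb: float = 40) -> dict:
    """Round the shard count up, then report the size each shard ends up with."""
    primary_shards = max(1, math.ceil(primary_data_gb / target_shard_gb))
    actual_gb = primary_data_gb / primary_shards
    return {
        "primary_shards": primary_shards,
        "actual_shard_gb": round(actual_gb, 1),
        "within_10_50_band": 10 <= actual_gb <= 50,
    }


print(shard_plan(3450))
```

With 3450 GB of primary data, 87 shards come out at about 39.7 GB each, which sits inside the recommended band.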
For time-series workloads, do not fix the shard count up front. Let size-based rollover drive it by setting `max_primary_shard_size: 40gb` in the ILM rollover action, so shard size stays bounded as ingest fluctuates.
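As a rough illustration, an ILM policy body with size-based rollover might look like the following. It is written as a Python dictionary to stay in the same language as the calculator script; the `max_age` of 7 days is an assumed safety net, and the 30-day delete phase simply mirrors the retention used in the worked example. The JSON it prints is the kind of body sent to the `_ilm/policy` endpoint under a policy name of your choosing.

```python
import json

# Hypothetical policy body: values mirror the worked example
# (40 GB primary shards, 30-day retention). max_age is an assumption.
ilm_policy = {
    "policy": {
        "phases": {
            "hot": {
                "actions": {
                    "rollover": {
                        "max_primary_shard_size": "40gb",
                        "max_age": "7d",
                    }
                }
            },
            "delete": {
                # min_age is measured from rollover, not from index creation
                "min_age": "30d",
                "actions": {"delete": {}},
            },
        }
    }
}

print(json.dumps(ilm_policy, indent=2))
```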
Node Count Calculation
Two constraints govern data-node count: storage and shards.
nodes_by_storage = ceil(total_storage / storage_per_node)
nodes_by_shards = ceil(total_shards / (heap_gb x 20))
data_nodes = max(3, nodes_by_storage, nodes_by_shards)
The shard-per-heap rule (20 shards per GB of heap) usually binds before the default `cluster.max_shards_per_node` limit of 1000. With a 31 GB heap, the recommended ceiling is 31 x 20 = 620 shards per node, not 1000.
Worked example:
total_storage = 6900 GB; storage_per_node = 1000 GB; total_shards = 174
heap = 31 GB; shard ceiling per node = 31 x 20 = 620
nodes_by_storage = ceil(6900 / 1000) = 7
nodes_by_shards = ceil(174 / 620) = 1
data_nodes = max(3, 7, 1) = 7
Storage binds in this example. For oversharded clusters the shard-per-heap rule binds instead, and the answer is "more, smaller nodes" or "consolidate indices".
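To see which constraint binds, the node-count formulas can be wrapped in a small function that also reports the winner. This is a sketch under the assumptions already stated (31 GB heap, 20 shards per GB of heap, minimum of 3 data nodes); the function name `data_nodes_needed` and the oversharded inputs in the second call are made up for illustration.

```python
import math


def data_nodes_needed(
    total_storage_gb: float,
    storage_per_node_gb: float,
    total_shards: int,
    heap_gb: float = 31,
    shards_per_gb_heap: int = 20,
    minimum: int = 3,
) -> tuple:
    """Return (node count, name of the constraint that produced it)."""
    candidates = {
        "minimum": minimum,
        "storage": math.ceil(total_storage_gb / storage_per_node_gb),
        "shards": math.ceil(total_shards / (heap_gb * shards_per_gb_heap)),
    }
    # On a tie, the first key in the dictionary wins.
    binding = max(candidates, key=candidates.get)
    return candidates[binding], binding


print(data_nodes_needed(6900, 1000, 174))    # worked example: storage binds
print(data_nodes_needed(2000, 1000, 5000))   # oversharded: shards bind
```

In the second call, 5000 shards against a ceiling of 620 per node require 9 nodes, even though the data would fit on 2.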
Heap Sizing
| RAM per node | Heap | Filesystem cache |
|---|---|---|
| 16 GB | 8 GB | ~7 GB |
| 32 GB | 16 GB | ~15 GB |
| 64 GB | 31 GB (capped) | ~31 GB |
| 128 GB | 31 GB (capped) | ~95 GB |
Two rules:
- JVM heap = min(50% of physical RAM, 31 GB). Above 31 GB, the JVM stops using compressed object pointers (Compressed Oops) and effective heap utilisation drops.
- Leave the rest of the RAM (at least 50%) for the filesystem cache. Lucene relies on the OS page cache for query performance; starving it hurts more than running a smaller heap.
Going above 64 GB RAM per node is fine for the filesystem cache; the JVM heap stays at 31 GB regardless.
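The heap rule reduces to a one-line calculation. In the sketch below, `heap_split` is a hypothetical helper; `off_heap_gb` is everything left over for the operating system and the filesystem cache combined, so the cache figures in the table above are slightly lower than what it reports.

```python
def heap_split(ram_gb: float, heap_cap_gb: float = 31) -> dict:
    """Split node RAM into JVM heap and the remainder left off-heap."""
    heap = min(ram_gb * 0.5, heap_cap_gb)
    return {"heap_gb": heap, "off_heap_gb": ram_gb - heap}


for ram in (16, 32, 64, 128):
    print(ram, heap_split(ram))
```

Above 62 GB of RAM the heap stops growing, and every additional GB goes to the off-heap side.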
Quick Reference Table
| Daily Volume | Retention | Replicas | Storage | Primary Shards | Min Data Nodes |
|---|---|---|---|---|---|
| 10 GB | 30 d | 1 | 690 GB | 9 | 3 |
| 50 GB | 30 d | 1 | 3.45 TB | 44 | 4 |
| 100 GB | 30 d | 1 | 6.9 TB | 87 | 7 |
| 500 GB | 30 d | 1 | 34.5 TB | 432 | 35 |
| 1 TB | 30 d | 1 | 69 TB | 863 | 69 |
Numbers assume 40 GB target shards, 1 TB usable per node, replicas = 1, 64 GB RAM per node. Adjust to your hardware.
Calculator Script
import math


def elasticsearch_sizing(
    daily_data_gb: float,
    retention_days: int,
    replicas: int = 1,
    target_shard_size_gb: float = 40,
    storage_per_node_gb: float = 1000,
    ram_per_node_gb: float = 64,
) -> dict:
    """Estimate storage, shard counts, data nodes, and heap for a cluster."""
    overhead = 1.15

    # Primary data after Lucene + translog overhead
    primary_gb = daily_data_gb * retention_days * overhead
    # Total storage including replicas
    total_storage_gb = primary_gb * (1 + replicas)

    # Shard counts, rounded up so no shard exceeds the target size
    primary_shards = max(1, math.ceil(primary_gb / target_shard_size_gb))
    total_shards = primary_shards * (1 + replicas)

    # Node math: heap is capped at 31 GB, with 20 shards per GB of heap
    heap_gb = min(ram_per_node_gb * 0.5, 31)
    shard_ceiling_per_node = heap_gb * 20
    nodes_by_storage = max(1, math.ceil(total_storage_gb / storage_per_node_gb))
    nodes_by_shards = max(1, math.ceil(total_shards / shard_ceiling_per_node))
    data_nodes = max(3, nodes_by_storage, nodes_by_shards)

    return {
        # 1 TB = 1000 GB, matching the worked examples and the quick reference table
        "total_storage_tb": round(total_storage_gb / 1000, 2),
        "primary_shards": primary_shards,
        "total_shards": total_shards,
        "recommended_data_nodes": data_nodes,
        "heap_per_node_gb": int(heap_gb),
        # Data actually stored on each node once spread across data_nodes
        "storage_per_node_gb": int(total_storage_gb / data_nodes),
    }


print(elasticsearch_sizing(daily_data_gb=100, retention_days=30))
Workload Adjustments
| Workload | Shard size | Replicas | RAM bias | Notes |
|---|---|---|---|---|
| Logging / time-series | 40-50 GB | 1 | Standard | Use ILM with rollover and delete phases; consider hot-warm |
| Full-text search | 10-30 GB | 2 | More heap | Smaller shards keep query latency low; replicas spread read load |
| Analytics / aggs | 30-40 GB | 1 | More RAM for filesystem cache | Filesystem cache size dominates aggregation speed |
| Multi-tenant search | 20-30 GB | 1-2 | Standard | Use filtered aliases or routing, not one index per tenant |
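One way to use the table is to turn each row into a preset that feeds the calculator. The sketch below is only an illustration: the preset keys, the `calculator_inputs` helper, and the choice of the midpoint of each shard-size range are assumptions, and for multi-tenant search it picks the lower replica count from the 1-2 range.

```python
# Shard-size ranges (GB) and replica counts copied from the table above.
WORKLOAD_PRESETS = {
    "logging": {"shard_gb": (40, 50), "replicas": 1},
    "full_text_search": {"shard_gb": (10, 30), "replicas": 2},
    "analytics": {"shard_gb": (30, 40), "replicas": 1},
    "multi_tenant_search": {"shard_gb": (20, 30), "replicas": 1},  # table allows 1-2
}


def calculator_inputs(workload: str) -> dict:
    """Pick the midpoint of the shard-size range as the target (an assumption)."""
    preset = WORKLOAD_PRESETS[workload]
    low, high = preset["shard_gb"]
    return {
        "target_shard_size_gb": (low + high) / 2,
        "replicas": preset["replicas"],
    }


print(calculator_inputs("full_text_search"))
```

The returned dictionary uses the same parameter names as the calculator script, so it can be passed to that function as keyword arguments.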
Master Node Sizing
Production clusters should have three dedicated master-eligible nodes. They do not store user data and have lower hardware requirements.
| Cluster Size | Master Nodes | Heap | CPU |
|---|---|---|---|
| 3-5 data | 3 | 4 GB | 2 cores |
| 6-20 data | 3 | 4-8 GB | 4 cores |
| 21-50 data | 3 | 8-16 GB | 4-8 cores |
| 51+ data | 3 | 16 GB | 8 cores |
`cluster.initial_master_nodes` is set on every master-eligible node at first bootstrap and removed afterwards.
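The master sizing table can be expressed as a simple lookup. In this sketch the tier list and the `master_node_spec` helper are hypothetical, and a cluster sitting exactly on a boundary (5, 20, or 50 data nodes) is assigned to the smaller tier, which is an assumption rather than a stated rule.

```python
# (upper bound on data nodes, heap range in GB, CPU core range), from the table.
MASTER_TIERS = [
    (5, (4, 4), (2, 2)),
    (20, (4, 8), (4, 4)),
    (50, (8, 16), (4, 8)),
    (float("inf"), (16, 16), (8, 8)),
]


def master_node_spec(data_nodes: int) -> dict:
    """Return the dedicated-master sizing tier for a given data-node count."""
    for upper, heap_gb, cpu_cores in MASTER_TIERS:
        if data_nodes <= upper:
            return {"master_nodes": 3, "heap_gb": heap_gb, "cpu_cores": cpu_cores}
    raise ValueError("data_nodes must be a number")


print(master_node_spec(7))
```

For the 7-node cluster from the worked example, this returns three masters with 4-8 GB of heap and 4 cores each.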
Validation Checklist
- Storage has 20-30% headroom, so steady-state disk usage stays below the 85% low watermark.
- Heap per node is at most 31 GB.
- Shards per node are within 20 per GB of heap.
- Primary shard size is 10-50 GB at steady state.
- At least 3 dedicated master-eligible nodes.
- At least 3 data nodes for any production workload with replicas.
- ILM in place for time-series data, with rollover and delete phases.
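The checklist lends itself to an automated check. The following sketch assumes a hypothetical `plan` dictionary with the field names shown; none of them come from an Elasticsearch API. The headroom test is one possible reading of the first item: disk usage grown by 20% must still fit under 85% of capacity.

```python
def validate_plan(plan: dict) -> list[str]:
    """Return the checklist items the plan fails; an empty list means all pass."""
    checks = [
        ("storage headroom below the 85% low watermark",
         plan["disk_used_gb"] * 1.2 <= 0.85 * plan["disk_capacity_gb"]),
        ("heap per node at most 31 GB", plan["heap_gb"] <= 31),
        ("shards per node within 20 per GB of heap",
         plan["shards_per_node"] <= 20 * plan["heap_gb"]),
        ("primary shard size within 10-50 GB",
         10 <= plan["primary_shard_gb"] <= 50),
        ("at least 3 dedicated master-eligible nodes", plan["master_nodes"] >= 3),
        ("at least 3 data nodes", plan["data_nodes"] >= 3),
        ("ILM with rollover and delete phases", plan["ilm_enabled"]),
    ]
    return [name for name, passed in checks if not passed]


# Worked example: 6900 GB spread over 7 nodes with 1000 GB of disk each.
example = {
    "disk_used_gb": 6900, "disk_capacity_gb": 7000, "heap_gb": 31,
    "shards_per_node": 25, "primary_shard_gb": 39.7,
    "master_nodes": 3, "data_nodes": 7, "ilm_enabled": True,
}
print(validate_plan(example))
```

The 7-node result from the worked example fails only the headroom check, because 7 nodes is the bare minimum for storage; under the same assumptions, 10 nodes of 1000 GB pass.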
How Pulse Helps With Sizing
Sizing is not a one-shot exercise. Ingest grows, query patterns shift, retention rules change. Pulse continuously inspects your Elasticsearch and OpenSearch clusters against the sizing rules above: it surfaces shards drifting outside the 10-50 GB band, nodes approaching the shard-per-heap ceiling, indices that should have been rolled over or deleted, and templates that bake in stale number_of_shards defaults. The output is a concrete, prioritised set of changes rather than abstract dashboards.
Frequently Asked Questions
Q: How much storage does Elasticsearch need per GB of raw data?
A: Roughly 2.3 GB per 1 GB of raw input with 1 replica: 2x for the primary plus one replica copy, multiplied by ~1.15 for Lucene structures and translog. Add 20-30% headroom so the cluster stays below the 85% low watermark.
Q: How many shards per node is too many in Elasticsearch?
A: Elastic's long-standing rule of thumb is to stay under 20 shards per GB of JVM heap. On a 31 GB heap that is 620 shards per node. The `cluster.max_shards_per_node` limit has been enforced with a default of 1000 since 7.0, but the heap-based rule binds first.
Q: What is the optimal shard size in Elasticsearch?
A: 10-50 GB per primary shard, with 20-40 GB the sweet spot. Logging and analytics tolerate larger shards (40-50 GB); search-heavy workloads benefit from 10-30 GB. Avoid shards under 1 GB in production.
Q: Should JVM heap ever exceed 32 GB on an Elasticsearch node?
A: No. Above ~31 GB the JVM stops using Compressed Oops and effective heap utilisation drops. Use 31 GB as the ceiling and put any extra RAM into the filesystem cache, which is where Lucene gets its query speed.
Q: How many master nodes does an Elasticsearch cluster need?
A: Three dedicated master-eligible nodes for any production cluster. Three lets the cluster lose one node and still elect a master. Two gives no real fault tolerance, because the cluster cannot reliably survive the loss of either node; five is appropriate for very large clusters.
Q: How do I know if my cluster is undersized?
A: Symptoms include sustained high CPU (especially on master), search-thread-pool queueing, frequent GC pauses, indices in yellow or red status during routine load, and disk crossing the high watermark regularly. Run the calculator above and compare to actual capacity.