Elasticsearch Sizing Calculator: Nodes, Shards, Storage, and Heap

This is a working calculator for sizing an Elasticsearch cluster. The inputs that drive the math are daily ingest volume, retention window, replica count, and target shard size. The outputs are total storage with overhead, primary and total shard count, recommended data-node count, and JVM heap per node. The defaults below reflect current Elasticsearch guidance: 20-40 GB per shard for most workloads, JVM heap = min(50% of RAM, 31 GB), no more than 20 shards per GB of heap, and the `cluster.max_shards_per_node` default limit of 1000.

Inputs and Outputs

| Input | Description | Typical |
| --- | --- | --- |
| Daily data volume | Raw bytes indexed per day, before replication and overhead | 50 GB - 10 TB |
| Retention | How long to keep data online | 7-90 days |
| Replicas | Replica count per primary (default 1) | 0-2 |
| Target shard size | Bytes per primary shard | 20-40 GB |
| Storage per node | Usable disk per data node | 500 GB - 8 TB |
| RAM per node | Physical memory per data node | 32-128 GB |

| Output | Calculation |
| --- | --- |
| Total storage | daily x retention x (1 + replicas) x 1.15 overhead |
| Primary shards | (daily x retention x 1.15) / target_shard_size |
| Total shards | primary_shards x (1 + replicas) |
| Min data nodes (storage) | ceil(total_storage / storage_per_node) |
| Min data nodes (shards) | ceil(total_shards / (heap_gb x 20)) |
| Heap per node | min(RAM x 0.5, 31 GB) |

Storage Calculation

Elasticsearch on disk is larger than the raw input it ingests. Replication doubles (or triples) the bytes. Lucene segment files, doc values, translog, and the headroom required for merges add another 10-30%.

Formula:

total_storage = daily_data x retention_days x (1 + replicas) x 1.15

Worked example for a logging workload:

Daily data: 100 GB raw JSON
Retention: 30 days
Replicas: 1

total_storage = 100 x 30 x 2 x 1.15 = 6900 GB (~6.9 TB)

The 1.15 multiplier is a baseline; on text-heavy workloads with rich analysers it can climb to 1.3. Add another 20-30% headroom on top so you do not run the cluster near the 85% low watermark in steady state.

| Factor | Multiplier | What it covers |
| --- | --- | --- |
| Replication | 1 + replicas | Replica copies |
| Index overhead | 1.1 - 1.3 | Lucene structures, doc values |
| Translog | 1.05 | Write-ahead log |
| Headroom | 1.2 - 1.3 | Growth buffer below watermark |
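
The storage formula and the headroom multiplier can be sketched in a few lines of Python. This is a minimal illustration, not part of the calculator script further down; the function name `total_storage_gb` and the `headroom` parameter are assumptions introduced here, and the 1.25 headroom value is simply a point inside the 1.2-1.3 range from the table.

```python
def total_storage_gb(daily_gb, retention_days, replicas=1, overhead=1.15, headroom=1.0):
    """Disk footprint in GB: raw volume x copies x index overhead x headroom."""
    copies = 1 + replicas  # one primary plus its replicas
    return daily_gb * retention_days * copies * overhead * headroom


# Worked example from this section: 100 GB/day, 30 days, 1 replica
baseline = total_storage_gb(100, 30, replicas=1)
# Same workload with 25% headroom (assumed value within the 1.2-1.3 range)
provisioned = total_storage_gb(100, 30, replicas=1, headroom=1.25)

print(round(baseline), round(provisioned))  # 6900 8625
```

The second figure is the one to provision against if steady-state usage should stay below the low watermark.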

Shard Calculation

Target 20-40 GB per primary shard for most workloads. The current Elasticsearch guidance is to keep shards between 10 and 50 GB and, as a rule of thumb, under 200 million documents per shard. Smaller shards waste heap; larger shards slow recovery and rebalance.

primary_shards = (daily_data x retention_days x 1.15) / target_shard_size
total_shards   = primary_shards x (1 + replicas)

Worked example:

Total primary data after overhead: 100 x 30 x 1.15 = 3450 GB
Target shard size: 40 GB
Primary shards: ceil(3450 / 40) = 87
With 1 replica: 174 total shards
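
The same arithmetic as a short Python sketch. The function name `shard_counts` is an assumption for illustration; the rounding step is there only to keep floating-point noise from the 1.15 multiplier out of the ceiling calculation.

```python
import math


def shard_counts(daily_gb, retention_days, replicas=1, target_shard_gb=40, overhead=1.15):
    """Return (primary_shards, total_shards) for a fixed target shard size."""
    # Round away float noise before taking the ceiling
    primary_gb = round(daily_gb * retention_days * overhead, 6)
    primary = max(1, math.ceil(primary_gb / target_shard_gb))
    return primary, primary * (1 + replicas)


print(shard_counts(100, 30))  # (87, 174)
```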

For time-series workloads, do not fix the shard count up front. Instead, let size-based rollover drive it by setting max_primary_shard_size: 40gb in the ILM rollover action. That way shard size stays bounded as ingest fluctuates.
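
As a rough sketch, the rollover setting above could sit in an ILM policy body like the one built below. The 30-day delete phase mirrors the retention in the worked example and is an assumption, as is the policy name `logs-30d` mentioned in the comment; the snippet only builds and prints the JSON and does not talk to a cluster.

```python
import json

# Request body for PUT _ilm/policy/logs-30d (policy name is hypothetical)
ilm_policy = {
    "policy": {
        "phases": {
            "hot": {
                "actions": {
                    # Roll over once the largest primary shard reaches 40 GB
                    "rollover": {"max_primary_shard_size": "40gb"}
                }
            },
            "delete": {
                # Assumed retention: delete indices 30 days after rollover
                "min_age": "30d",
                "actions": {"delete": {}},
            },
        }
    }
}

print(json.dumps(ilm_policy, indent=2))
```

In practice the policy is attached through an index template, so each new backing index inherits it.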

Node Count Calculation

Two constraints govern data-node count: storage and shards.

nodes_by_storage = ceil(total_storage / storage_per_node)
nodes_by_shards  = ceil(total_shards / (heap_gb x 20))
data_nodes       = max(3, nodes_by_storage, nodes_by_shards)

The shard-per-heap rule (20 shards per GB of heap) usually binds before the `cluster.max_shards_per_node` default of 1000 shards per node. With a 31 GB heap, the recommended ceiling is 620 shards per node, not 1000.

Worked example:

total_storage = 6900 GB; storage_per_node = 1000 GB; total_shards = 174
heap = 31 GB; shard ceiling per node = 31 x 20 = 620

nodes_by_storage = ceil(6900 / 1000) = 7
nodes_by_shards  = ceil(174 / 620) = 1
data_nodes       = max(3, 7, 1) = 7

Storage binds in this example. For oversharded clusters the shard-per-heap rule binds instead, and the answer is "more, smaller nodes" or "consolidate indices".
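
To see which constraint binds for a given plan, the node formulas can be sketched as follows. The function name `data_nodes` and the returned label are assumptions for illustration; the second call uses made-up numbers for an oversharded cluster.

```python
import math


def data_nodes(total_storage_gb, total_shards, storage_per_node_gb=1000, heap_gb=31, min_nodes=3):
    """Return (node_count, binding_constraint) for the two sizing rules."""
    candidates = {
        "minimum": min_nodes,
        "storage": math.ceil(total_storage_gb / storage_per_node_gb),
        "shards": math.ceil(total_shards / (heap_gb * 20)),  # 20 shards per GB of heap
    }
    # The largest requirement wins; on a tie the earlier entry is reported
    binding = max(candidates, key=candidates.get)
    return candidates[binding], binding


print(data_nodes(6900, 174))   # (7, 'storage')
print(data_nodes(2000, 5000))  # (9, 'shards')
```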

Heap Sizing

| RAM per node | Heap | Filesystem cache |
| --- | --- | --- |
| 16 GB | 8 GB | ~7 GB |
| 32 GB | 16 GB | ~15 GB |
| 64 GB | 31 GB (capped) | ~31 GB |
| 128 GB | 31 GB (capped) | ~95 GB |

Two rules:

  1. JVM heap = min(50% of physical RAM, 31 GB). Above 31 GB, the JVM stops using compressed object pointers (Compressed Oops) and effective heap utilisation drops.
  2. Leave the other 50% for the filesystem cache. Lucene relies on the OS page cache for query performance; starving it hurts more than a small heap.

Going above 64 GB RAM per node is fine for the filesystem cache; the JVM heap stays at 31 GB regardless.
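
A small sketch of the heap rule, assuming everything outside the heap is available to the operating system and filesystem cache. The function name `heap_and_remaining_gb` is an assumption; the table's cache column is slightly lower than these figures because the OS and JVM off-heap memory take a share.

```python
def heap_and_remaining_gb(ram_gb, heap_cap_gb=31):
    """Return (heap_gb, ram_left_for_os_and_cache_gb)."""
    heap = min(ram_gb * 0.5, heap_cap_gb)  # 50% of RAM, capped at 31 GB
    return heap, ram_gb - heap


for ram in (16, 32, 64, 128):
    heap, rest = heap_and_remaining_gb(ram)
    print(f"{ram} GB RAM -> heap {heap:g} GB, {rest:g} GB left for OS and cache")
```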

Quick Reference Table

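| Daily Volume | Retention | Replicas | Storage | Primary Shards | Min Data Nodes |
| --- | --- | --- | --- | --- | --- |
| 10 GB | 30 d | 1 | 690 GB | 9 | 3 |
| 50 GB | 30 d | 1 | 3.45 TB | 44 | 4 |
| 100 GB | 30 d | 1 | 6.9 TB | 87 | 7 |
| 500 GB | 30 d | 1 | 34.5 TB | 432 | 35 |
| 1 TB | 30 d | 1 | 69 TB | 863 | 69 |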

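Numbers assume 40 GB target shards, 1 TB usable per node, replicas = 1, 64 GB RAM per node, and decimal units (1 TB = 1000 GB). Node counts are formula minimums before the 20-30% headroom recommended earlier. Adjust to your hardware.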

Calculator Script

import math


def elasticsearch_sizing(
    daily_data_gb: float,
    retention_days: int,
    replicas: int = 1,
    target_shard_size_gb: float = 40,
    storage_per_node_gb: float = 1000,
    ram_per_node_gb: float = 64,
) -> dict:
    """Estimate storage, shard counts, data nodes, and heap for a cluster."""
    overhead = 1.15
    copies = 1 + replicas
    # Primary data after Lucene + translog overhead; rounding removes
    # floating-point noise introduced by the 1.15 multiplier
    primary_gb = round(daily_data_gb * retention_days * overhead, 6)
    # Total storage including replicas
    total_storage_gb = primary_gb * copies
    # Shard counts
    primary_shards = max(1, math.ceil(primary_gb / target_shard_size_gb))
    total_shards = primary_shards * copies
    # Node math: heap is capped at 31 GB, 20 shards per GB of heap
    heap_gb = min(ram_per_node_gb * 0.5, 31)
    shard_ceiling_per_node = heap_gb * 20
    nodes_by_storage = max(1, math.ceil(total_storage_gb / storage_per_node_gb))
    nodes_by_shards = max(1, math.ceil(total_shards / shard_ceiling_per_node))
    data_nodes = max(3, nodes_by_storage, nodes_by_shards)
    return {
        # Decimal TB (1 TB = 1000 GB), matching the worked examples
        "total_storage_tb": round(total_storage_gb / 1000, 2),
        "primary_shards": primary_shards,
        "total_shards": total_shards,
        "recommended_data_nodes": data_nodes,
        "heap_per_node_gb": int(heap_gb),
        # Average data held per node, not the per-node capacity input
        "storage_per_node_gb": int(total_storage_gb / data_nodes),
    }


print(elasticsearch_sizing(daily_data_gb=100, retention_days=30))

Workload Adjustments

| Workload | Shard size | Replicas | RAM bias | Notes |
| --- | --- | --- | --- | --- |
| Logging / time-series | 40-50 GB | 1 | Standard | Use ILM with rollover and delete phases; consider hot-warm |
| Full-text search | 10-30 GB | 2 | More heap | Smaller shards keep query latency low; replicas spread read load |
| Analytics / aggs | 30-40 GB | 1 | More RAM for filesystem cache | Filesystem cache size dominates aggregation speed |
| Multi-tenant search | 20-30 GB | 1-2 | Standard | Use filtered aliases or routing, not one index per tenant |

Master Node Sizing

Production clusters should have three dedicated master-eligible nodes. They do not store user data and have lower hardware requirements.

| Cluster Size | Master Nodes | Heap | CPU |
| --- | --- | --- | --- |
| 3-5 data | 3 | 4 GB | 2 cores |
| 5-20 data | 3 | 4-8 GB | 4 cores |
| 20-50 data | 3 | 8-16 GB | 4-8 cores |
| 50+ data | 3 | 16 GB | 8 cores |

`cluster.initial_master_nodes` is set on every master-eligible node at first bootstrap and removed afterwards.

Validation Checklist

  • Storage includes 20-30% headroom so steady-state disk usage stays below the 85% low watermark.
  • Heap per node is at most 31 GB.
  • Shards per node are within 20 per GB of heap.
  • Primary shard size is 10-50 GB at steady state.
  • At least 3 dedicated master-eligible nodes.
  • At least 3 data nodes for any production workload with replicas.
  • ILM in place for time-series data, with rollover and delete phases.
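
Most of the checklist is mechanical and can be sketched as a simple check function. The function name `validate_plan` and its parameters are assumptions for illustration; the ILM item is left out because it depends on cluster configuration rather than on sizing numbers.

```python
def validate_plan(heap_gb, shards_per_node, primary_shard_gb,
                  master_nodes, data_nodes, disk_used_fraction):
    """Return a list of checklist violations; an empty list means all checks pass."""
    issues = []
    if disk_used_fraction >= 0.85:
        issues.append("disk usage at or above the 85% low watermark")
    if heap_gb > 31:
        issues.append("heap above 31 GB")
    if shards_per_node > heap_gb * 20:
        issues.append("more than 20 shards per GB of heap")
    if not 10 <= primary_shard_gb <= 50:
        issues.append("primary shard size outside 10-50 GB")
    if master_nodes < 3:
        issues.append("fewer than 3 dedicated master-eligible nodes")
    if data_nodes < 3:
        issues.append("fewer than 3 data nodes")
    return issues


# A plan with headroom passes; a deliberately bad plan trips every check
ok = validate_plan(31, 25, 40, 3, 9, 0.77)
bad = validate_plan(32, 700, 5, 2, 2, 0.90)
print(ok)
print(bad)
```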

How Pulse Helps With Sizing

Sizing is not a one-shot exercise. Ingest grows, query patterns shift, retention rules change. Pulse continuously inspects your Elasticsearch and OpenSearch clusters against the sizing rules above: it surfaces shards drifting outside the 10-50 GB band, nodes approaching the shard-per-heap ceiling, indices that should have been rolled over or deleted, and templates that bake in stale number_of_shards defaults. The output is a concrete, prioritised set of changes rather than abstract dashboards.

Frequently Asked Questions

Q: How much storage does Elasticsearch need per GB of raw data?
A: Roughly 2.3 GB per 1 GB of raw input with 1 replica: two copies of the data (primary plus one replica) times ~15% overhead for Lucene structures and translog (2 x 1.15 = 2.3). Add 20-30% headroom so the cluster stays below the 85% low watermark.

Q: How many shards per node is too many in Elasticsearch?
A: Elastic recommends staying under 20 shards per GB of JVM heap. On a 31 GB heap that is 620 shards per node. The cluster.max_shards_per_node setting defaults to 1000 (since 7.0) and can be raised, but the heap-based rule binds first.

Q: What is the optimal shard size in Elasticsearch?
A: 10-50 GB per primary shard, with 20-40 GB the sweet spot. Logging and analytics tolerate larger shards (40-50 GB); search-heavy workloads benefit from 10-30 GB. Avoid shards under 1 GB in production.

Q: Should JVM heap ever exceed 32 GB on an Elasticsearch node?
A: No. Above ~31 GB the JVM stops using Compressed Oops and effective heap utilisation drops. Use 31 GB as the ceiling and put any extra RAM into the filesystem cache, which is where Lucene gets its query speed.

Q: How many master nodes does an Elasticsearch cluster need?
A: Three dedicated master-eligible nodes for any production cluster. Three lets the cluster lose one node and still elect a master. Two cannot reliably tolerate the loss of either node, and on pre-7.0 versions a misconfigured quorum risked split brain; five is appropriate for very large clusters.

Q: How do I know if my cluster is undersized?
A: Symptoms include sustained high CPU (especially on master), search-thread-pool queueing, frequent GC pauses, indices in yellow or red status during routine load, and disk crossing the high watermark regularly. Run the calculator above and compare to actual capacity.
