NEW

Pulse 2025 Product Roundup: From Monitoring to AI-Native Control Plane

Elasticsearch Shard Sizing Best Practices: 20-40 GB Target, 200M Docs Max

The shard sizing rules that matter in production Elasticsearch are simple: target 20-40 GB per primary shard, keep document count under 200 million per shard, and keep total shards per node under 20 per GB of JVM heap. Shards under 1 GB waste heap; shards over 50 GB make recovery and rebalance painful. The decision is set at index creation through number_of_shards, which cannot be changed in place. For time-series workloads, manage shard size through size-based rollover with max_primary_shard_size: 40gb. For static indices, use shrink or split to adjust.

Target Ranges

Workload Primary shard size Notes
Search-focused, low write rate 10-30 GB Smaller shards keep search latency low; replicas spread read load
Mixed workload 20-40 GB The sweet spot for most production clusters
Logging / time-series 30-50 GB Larger shards reduce overhead on high-volume ingest
Write-heavy bulk 40-50 GB Maximises sequential write efficiency

The current Elasticsearch guidance is to keep shards between 10 GB and 50 GB, with a separate rule of thumb of no more than ~200 million documents per shard (the Lucene segment limit is ~2 billion, but well before that, query latency degrades).

Why the lower bound matters: each open shard consumes 1-2 MB of JVM heap for cluster state plus heap for caches and segment metadata. A thousand 100 MB shards burn more heap than a few hundred 30 GB shards while holding less data.

Why the upper bound matters: a 100 GB shard takes hours to recover during a node restart or rebalance. Tail latency on queries that hit a single large shard is dominated by that shard.

Calculating Shard Count for New Indices

For a static index where you know the expected size:

primary_shards = ceil(expected_size_gb / target_shard_size_gb)

Example: 200 GB index, target 40 GB shards: 5 primary shards.

For time-series indices, do not pick a fixed shard count - drive it via rollover:

PUT /_ilm/policy/logs-30d
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_primary_shard_size": "40gb",
            "max_age": "7d"
          }
        }
      },
      "delete": {
        "min_age": "30d",
        "actions": { "delete": {} }
      }
    }
  }
}

This rolls over when any primary shard reaches 40 GB, or after 7 days, whichever comes first. Each rolled-over index starts with the `number_of_shards` from the matching index template, so set that to a reasonable starting point (often 1).

Check current shard sizes:

GET /_cat/shards?v&h=index,shard,prirep,store&s=store:desc

Strategies by Pattern

Time-series with ILM and size rollover

The canonical pattern for logs, metrics, and event streams:

PUT /_index_template/logs-template
{
  "index_patterns": ["logs-*"],
  "template": {
    "settings": {
      "number_of_shards": 1,
      "number_of_replicas": 1,
      "index.lifecycle.name": "logs-30d",
      "index.lifecycle.rollover_alias": "logs"
    }
  }
}

Single primary shard per index; ILM keeps shard size bounded. For high-volume streams that hit 40 GB faster than refresh can keep up, raise number_of_shards to 2 or 3.

Fixed shard count for static indices

For indices with predictable, bounded data:

PUT /products-catalog
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}

Choose number_of_shards so each primary ends up in the 20-40 GB range at steady state.

Single shard for small indices

If an index will never exceed 50 GB total, use one primary shard:

PUT /small-config-index
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  }
}

This is the default in Elasticsearch 7.0+ (the 6.x default of 5 shards is a common source of oversharding when old templates linger).

Spread across nodes for HA

To force shard placement across more nodes:

PUT /ha-index
{
  "settings": {
    "number_of_shards": 6,
    "number_of_replicas": 2,
    "index.routing.allocation.total_shards_per_node": 2
  }
}

total_shards_per_node: 2 caps how many shards of this index can land on any one node, forcing spread. Use carefully - if the cluster cannot satisfy the constraint, shards stay unassigned and the cluster goes yellow.

Adjusting Existing Indices

Shrink an oversharded index

Use shrink when you have far more primaries than the data warrants. The target shard count must be a divisor of the original:

# Step 1: prepare - mark read-only and concentrate on one node
PUT /old-index/_settings
{
  "settings": {
    "index.blocks.write": true,
    "index.routing.allocation.require._name": "node-1"
  }
}

# Step 2: shrink (10 shards down to 2)
POST /old-index/_shrink/old-index-shrunk
{
  "settings": {
    "index.number_of_shards": 2,
    "index.number_of_replicas": 1,
    "index.blocks.write": null,
    "index.routing.allocation.require._name": null
  }
}

Split an undersharded index

Use split to increase primary count. The target must be a multiple of the original:

PUT /old-index/_settings
{ "index.blocks.write": true }

POST /old-index/_split/old-index-split
{
  "settings": { "index.number_of_shards": 4 }
}

For more complex transformations (mapping changes, different shard layout that is not a multiple/divisor), use reindex.

Monitoring Shard Health

# Top 20 largest shards
GET /_cat/shards?v&h=index,shard,prirep,store&s=store:desc

# Per-node shard distribution
GET /_cat/allocation?v

# Hot or unbalanced shards
GET /_cat/shards?v&h=index,shard,prirep,store,node&s=node

Alerting thresholds worth setting:

Condition Suggested action
Any primary shard > 50 GB Investigate, consider rollover or split
Any primary shard < 1 GB and stable Consider consolidation
Average shard size variance > 50% across nodes Investigate hot index or routing imbalance
Total shards approaching max_shards_per_node x nodes Audit templates and ILM coverage

Common Mistakes

  1. One index per document type. Old advice from pre-6.x Elasticsearch. Consolidate into a single index with a type field instead.
  2. Fixed daily shard count regardless of volume. A 5-shard daily index for 1 GB/day data produces tiny shards. Use size-based rollover.
  3. Ignoring the replica multiplier. Total shards = primaries x (1 + replicas). 5 primaries with 2 replicas is 15 shards, not 5.
  4. Default number_of_shards: 5 from old templates. Audit templates inherited from 6.x deployments.
  5. Many small per-tenant indices. For multi-tenant systems prefer a shared index with filtered aliases over an index per tenant.

Decision Tree

What is the steady-state data size?
|
├── < 50 GB total
|   └── 1 primary shard
|
├── 50 GB - 500 GB total, static
|   └── 2-10 primary shards (target 30-50 GB each)
|
├── > 500 GB total, static
|   └── ceil(total / 40 GB) primary shards
|
└── Time-series (any volume)
    └── ILM rollover with max_primary_shard_size: 40gb
        Single primary shard in the template, ILM controls cumulative size

How Pulse Helps With Shard Sizing

Shard sizing decisions made on day one rarely stay correct. Ingest grows, query patterns shift, retention policy changes, and templates inherited from older versions silently push new indices into bad shapes. Pulse continuously inspects shard size distribution, document counts, segment counts, template defaults, and ILM coverage. It surfaces undersized and oversized shards by name, identifies templates that bake in stale shard counts, and flags indices that should have rolled over but did not. Connecting your cluster to Pulse proactive monitoring means these issues get caught before they manifest as performance regressions.

Frequently Asked Questions

Q: What is the optimal shard size in Elasticsearch?
A: 10-50 GB per primary shard, with 20-40 GB the recommended sweet spot for mixed workloads. Search-heavy workloads benefit from the lower end (10-30 GB); logging and analytics tolerate the upper end (40-50 GB). Avoid shards under 1 GB in production.

Q: How many documents can a single Elasticsearch shard hold?
A: The Lucene hard limit is ~2 billion documents per shard, but query and indexing performance degrade well before that. The Elastic-recommended rule of thumb is to stay under 200 million documents per shard.

Q: How many shards per node is too many in Elasticsearch?
A: Elastic recommends no more than 20 shards per GB of JVM heap. With a 31 GB heap, that is ~620 shards per node. The hard cluster cap is cluster.max_shards_per_node, default 1000 since 7.0, but the heap-based rule binds first.

Q: Can I change the number of primary shards on an existing Elasticsearch index?
A: Not directly. Use the Shrink API to reduce (the new count must divide the original), the Split API to increase (the new count must be a multiple), or reindex into a new index with the desired shard count.

Q: How does shard size affect recovery and rebalance?
A: Recovery time scales with shard size. A 100 GB shard can take an hour or more to rebuild over a 100 MB/s network with default throttling. Keeping shards under 50 GB keeps recovery time manageable, which keeps the cluster's window of reduced redundancy small.

Q: Should I use the Shrink API or the Split API?
A: Shrink reduces primary count (oversharded indices); Split increases it (undersharded indices). Both produce a new index and require the source to be read-only. For arbitrary transformations not expressible as multiplies or divides of the original, reindex into a fresh index.

Subscribe to the Pulse Newsletter

Get early access to new Pulse features, insightful blogs & exclusive events , webinars, and workshops.

We use cookies to provide an optimized user experience and understand our traffic. To learn more, read our use of cookies; otherwise, please choose 'Accept Cookies' to continue using our website.