Exceeded maximum shard size for an index is the human-facing description of a state where one or more primary shards of an index have grown past the size at which Elasticsearch can serve and recover them efficiently. Elastic's guidance is to keep shard sizes between 10 GB and 50 GB for most workloads. Elasticsearch core has no hard runtime cap that throws this exact exception; the condition is surfaced instead by ILM rollover policies, custom client-side checks, and monitoring built around Elastic's shard size guidance. The durable fix is to roll over or reindex into a new index with more primary shards.
What This Error Means
Elasticsearch routes documents to shards by hashing the routing key modulo the number of primary shards. Once an index is created, number_of_shards cannot be changed; the only way to add primary capacity is to write documents into a new index. Shards larger than ~50 GB are slow to recover after node failures (Lucene segment files must transit the network), slow to move during rebalances, and reduce search parallelism, because a query runs on a single thread per shard and a few large shards give the cluster less work to spread across cores.
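To make the routing rule concrete, here is a minimal Python sketch of hash-modulo routing. It is an illustration only: zlib.crc32 stands in for Elasticsearch's own Murmur3-based hash, and the helper names shard_for and moved_fraction are hypothetical. It shows why the primary shard count is fixed at creation: changing the modulus sends most existing routing keys to a different shard.

```python
import zlib


def shard_for(routing_key: str, num_primary_shards: int) -> int:
    """Pick a shard as hash(routing key) modulo the primary shard count."""
    # crc32 is a stand-in hash; Elasticsearch uses its own Murmur3-based one.
    return zlib.crc32(routing_key.encode("utf-8")) % num_primary_shards


def moved_fraction(keys, old_shards: int, new_shards: int) -> float:
    """Fraction of keys that land on a different shard after a count change."""
    moved = sum(
        1 for k in keys if shard_for(k, old_shards) != shard_for(k, new_shards)
    )
    return moved / len(keys)


keys = [f"doc-{i}" for i in range(10_000)]
# Going from 3 to 4 primaries remaps most keys, so existing documents
# could no longer be found where the router expects them.
print(moved_fraction(keys, 3, 4))
```

With a plain modulo, only the keys whose hash gives the same remainder under both shard counts stay put, which is why adding capacity means a new index rather than a settings change.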
The condition is typically surfaced through an ILM policy whose rollover action has a condition like max_primary_shard_size: 50gb, or raised by application-level pre-flight checks. Elasticsearch itself does not silently refuse writes at any specific shard size.
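An application-level pre-flight check of the kind mentioned above might look like the following sketch. The exception class ShardSizeExceeded and the function preflight_check are hypothetical names, not part of Elasticsearch or its clients. The input is assumed to be the parsed JSON from GET _cat/shards/<index>?format=json&bytes=b&h=index,shard,prirep,store, where every value arrives as a string and store can be null for unassigned shards.

```python
class ShardSizeExceeded(Exception):
    """Hypothetical application-level error; not an Elasticsearch exception."""


def preflight_check(cat_shards_rows, max_bytes=50 * 1024**3):
    """Raise if any primary shard in the rows is larger than max_bytes.

    Assumes rows come from _cat/shards with format=json and bytes=b,
    so "store" is a byte count encoded as a string (or None).
    """
    for row in cat_shards_rows:
        if row["prirep"] != "p" or row.get("store") is None:
            continue  # skip replicas and unassigned shards
        size = int(row["store"])
        if size > max_bytes:
            raise ShardSizeExceeded(
                "Exceeded maximum shard size for an index: "
                f"{row['index']} shard {row['shard']} is {size} bytes"
            )
```

The default threshold mirrors the 50gb value used in the rollover condition; a real check would read it from the same configuration as the ILM policy so the two cannot drift apart.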
Common Causes
- Initial number_of_shards too low for ingest volume. How to confirm: GET <index>/_settings for number_of_shards; GET _cat/shards/<index>?v&h=index,shard,prirep,store for current sizes.
- ILM rollover policy never met its trigger. How to confirm: GET <index>/_ilm/explain shows the policy state and the unmet rollover conditions.
- Time-series index without rollover writing into the same index for years. How to confirm: GET _cat/indices?v shows one huge index instead of a .ds-* data stream backing index per epoch.
- Custom routing concentrating writes on one shard. How to confirm: GET <index>/_search?routing=<key>&size=0 and compare per-shard doc counts via _cat/shards.
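The first and last causes in this list can be told apart from shard sizes alone. The sketch below is one possible heuristic, not an Elastic-provided tool: classify_primaries is a hypothetical helper, the skew_ratio of 2.0 is an arbitrary assumption, and the rows are assumed to be the JSON from GET _cat/shards/<index>?format=json&bytes=b&h=index,shard,prirep,store.

```python
from statistics import median


def classify_primaries(rows, max_bytes=50 * 1024**3, skew_ratio=2.0):
    """Label an index as ok, skewed, or under-provisioned by primary size.

    "skewed" means one primary is far larger than the typical primary,
    which points at custom routing; "under-provisioned" means the
    primaries are uniformly too large, which points at too few shards.
    """
    sizes = [
        int(r["store"])
        for r in rows
        if r["prirep"] == "p" and r.get("store") is not None
    ]
    if not sizes:
        return "no-primaries"
    largest, typical = max(sizes), median(sizes)
    if largest <= max_bytes:
        return "ok"
    if typical == 0 or largest / typical >= skew_ratio:
        return "skewed"
    return "under-provisioned"
```

A "skewed" result suggests reviewing routing keys before adding shards, since more primaries do not help if one key still dominates.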
How to Fix Exceeded Maximum Shard Size
Inspect current primary shard sizes:
    GET _cat/shards/<index>?v&h=index,shard,prirep,store
Roll the index over to a fresh index/backing index. With a data stream:
    POST /<data-stream>/_rollover
With an alias-managed index:
    POST /<write-alias>/_rollover
    { "conditions": { "max_primary_shard_size": "50gb" } }
Apply or update an ILM policy so future rollover happens automatically:
    PUT _ilm/policy/logs-policy
    { "policy": { "phases": { "hot": { "actions": { "rollover": { "max_primary_shard_size": "50gb", "max_age": "30d" } } } } } }
For oversized indices that cannot roll over (no alias/data stream), create a new index with more primary shards and reindex:
    PUT /my-index-v2
    { "settings": { "number_of_shards": 6, "number_of_replicas": 1 } }
    POST _reindex
    { "source": { "index": "my-index" }, "dest": { "index": "my-index-v2" } }
Swap traffic via an alias once reindex completes.
Or use the Split API to increase shard count on an existing index (the target shard count must be a multiple of the source shard count, and number_of_routing_shards must be a multiple of the target):
    POST /my-index/_split/my-index-split
    { "settings": { "index.number_of_shards": 6 } }
The source index must be read-only (index.blocks.write: true) first.
Use the Shrink API in the opposite direction when shards are too small.
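Choosing the shard count for the reindex or split target is simple arithmetic, sketched below under stated assumptions. Both function names are hypothetical. primaries_needed divides total primary data by a per-shard target that the caller picks (it should sit below the rollover threshold to leave headroom). valid_split_targets applies the Split API rule that a target must be a multiple of the source shard count and a factor of number_of_routing_shards.

```python
def primaries_needed(total_primary_bytes: int, target_shard_bytes: int) -> int:
    """Smallest primary count that keeps each shard at or below the target."""
    # Integer ceiling division avoids float rounding on large byte counts.
    return max(1, -(-total_primary_bytes // target_shard_bytes))


def valid_split_targets(source_shards: int, routing_shards: int) -> list:
    """Shard counts the Split API can produce from the source index.

    A target must be a multiple of source_shards and must divide
    routing_shards (the index's number_of_routing_shards) evenly.
    """
    return [
        n
        for n in range(source_shards * 2, routing_shards + 1, source_shards)
        if routing_shards % n == 0
    ]


GIB = 1024**3
print(primaries_needed(300 * GIB, 50 * GIB))
print(valid_split_targets(5, 30))
```

If the count from primaries_needed is not among the valid split targets, either round up to the next valid target or fall back to reindex, which accepts any shard count.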
Resolve Oversized Shards Automatically with Pulse
Pulse is an AI DBA for Elasticsearch and OpenSearch. When primary shards exceed the 10-50 GB band recommended by Elastic and Exceeded maximum shard size for an index surfaces from ILM or a custom check, Pulse:
- Reads per-shard sizes from _cat/shards/<index>?v&h=index,shard,prirep,store, the number_of_shards setting, ILM state from <index>/_ilm/explain, and routing distribution to distinguish a single skewed shard from an under-provisioned index
- Identifies which of the four causes applies: initial number_of_shards too low for ingest volume, ILM rollover condition (e.g., max_primary_shard_size: 50gb) that never fired, time-series index without a data stream, or custom routing concentrating writes on one shard
- Generates the exact remediation: the POST /<data-stream>/_rollover call, the PUT _ilm/policy/<name> payload with max_primary_shard_size: 50gb and max_age: 30d, the PUT /my-index-v2 { number_of_shards: N } plus POST _reindex plan, or the POST /<index>/_split/<new> plan with the read-only block step
- Applies ILM policy updates and triggers rollover automatically with operator approval; leaves reindex and split workflows as one-click PRs because they need traffic cutover coordination
Pulse alerts when any primary approaches the configured rollover threshold, preventing the slow-recovery and rebalance-stall failure modes oversized shards cause.
Start a free trial to connect your cluster.
Frequently Asked Questions
Q: What is the maximum recommended shard size in Elasticsearch?
A: Elastic recommends keeping primary shards between 10 GB and 50 GB for most workloads; the upper bound reflects recovery time rather than a hard cap. Documents per shard should also stay well below roughly 2.1 billion, which is Lucene's limit on documents per index (each Elasticsearch shard is a single Lucene index).
Q: Can I change the number of shards for an existing index?
A: Not directly. Use the Split API to increase or the Shrink API to decrease, both of which produce a new index. The number_of_shards setting itself is immutable after index creation.
Q: How do I avoid this error in the first place?
A: Use data streams + ILM with a rollover action keyed to max_primary_shard_size (50 GB) and max_age (e.g., 30 days). New backing indices are created automatically once either condition is met.
Q: Will oversized shards cause data loss?
A: Not directly. They cause longer recovery times after node restarts, slower cluster rebalancing, and degraded query latency. The risk of data loss only rises during a recovery that takes long enough to expose the cluster to additional failures.
Q: Does increasing the max shard size help?
A: There is no setting that magically makes large shards perform better; the recommendation is based on recovery and rebalance behavior. Raising any per-policy threshold above 50 GB just defers the same problem.
Q: Should I use Split or Reindex to fix oversized shards?
A: Split is faster (it hard-links segment files into the new index where the filesystem supports it) but requires the source to be read-only and the new shard count to be a multiple of the source shard count and a factor of number_of_routing_shards. Reindex works in all cases but is slower. For oversized active write indices, prefer rollover so the new backing index has the right shard count from the start.
Q: What's the fastest way to diagnose oversized shards in production?
A: Pulse, the AI DBA for Elasticsearch and OpenSearch, reads per-shard sizes, number_of_shards, ILM state, and routing distribution in one view, then names whether the cause is missing rollover, undersized initial shard count, or routing skew. It applies ILM policy updates and triggers rollover with approval and proposes the reindex or split plan when an existing index has to be reshaped.
Related Reading
- Elasticsearch reindex data guide: for the standard reindex path.
- Elasticsearch rollover index: manual and automated rollover.
- Elasticsearch cluster max shards per node: cluster-wide shard cap.
- Elasticsearch create index with mapping: defining number_of_shards correctly upfront.
- Elasticsearch monitoring: tracking shard sizes proactively.