Having too many shards in an Elasticsearch cluster (oversharding) causes significant performance degradation and resource waste. This guide helps you identify oversharding problems and implement effective solutions.
Understanding the Problem
Why Too Many Shards Hurts Performance
- Memory overhead: Each shard consumes heap memory for metadata, segments, and caches
- Coordination cost: Every shard a query touches adds a shard-level request that the coordinating node must fan out and merge
- Cluster state size: More shards = larger cluster state to manage
- Recovery time: Many shards slow down node recovery
- File handles: Each shard opens multiple file handles
How Many Shards Is Too Many?
There's no absolute number, but consider these guidelines:
- Per node: Avoid exceeding 1,000 shards per node
- Per GB heap: The old "20 shards per GB of heap" rule is no longer part of the official sizing guidance but remains directionally useful
- Per index: Question whether you really need more than a few shards per index
- Shard size: Shards under 1 GB are often too small
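Elasticsearch also enforces a soft cap of its own through the cluster.max_shards_per_node setting (1,000 open shards per data node by default). A quick way to see the limit your cluster is actually running with:
GET /_cluster/settings?include_defaults=true&filter_path=*.cluster.max_shards_per_node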
Identifying Oversharding
Check Total Shard Count
GET /_cluster/stats?filter_path=indices.shards.total
# Piping through wc requires curl rather than the Kibana console:
curl -s 'localhost:9200/_cat/shards' | wc -l
Shards Per Node
GET /_cat/allocation?v
Average Shard Size
# Get shard sizes
GET /_cat/shards?v&h=index,shard,prirep,store&s=store:asc
If many shards are under 1 GB, you likely have oversharding.
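For a quick count instead of scanning the list by hand, a rough one-liner (run with curl, since the Kibana console can't pipe; localhost:9200 is a placeholder for your cluster address):
curl -s 'localhost:9200/_cat/shards?h=store&bytes=b' | awk '$1 ~ /^[0-9]+$/ && $1 < 1073741824 {n++} END {print n+0 " shards under 1 GB"}'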
Indices with Many Small Shards
GET /_cat/indices?v&h=index,pri,rep,store.size,pri.store.size&s=pri:desc
Symptoms of Oversharding
Performance Symptoms
- Slow search queries despite low data volume
- High memory usage with low data volume
- Long cluster recovery times
- Master node CPU spikes during state updates
Observable Metrics
| Metric | Healthy | Oversharded |
|---|---|---|
| Shards per node | < 500 | > 1000 |
| Average shard size | 10-50 GB | < 1 GB |
| Cluster state size | < 100 MB | > 500 MB |
| Heap headroom | Ample | Persistent pressure |
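Elasticsearch doesn't expose a single "cluster state size" metric, so a rough proxy (an approximation, since the master holds a compressed in-memory form) is the serialized size of the state itself:
curl -s 'localhost:9200/_cluster/state' | wc -c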
Solutions
Solution 1: Use Index Lifecycle Management (ILM)
Configure ILM to control shard growth:
PUT _ilm/policy/controlled_shards
{
"policy": {
"phases": {
"hot": {
"actions": {
"rollover": {
"max_primary_shard_size": "40gb",
"max_age": "7d"
}
}
},
"warm": {
"min_age": "30d",
"actions": {
"shrink": {
"number_of_shards": 1
}
}
},
"delete": {
"min_age": "90d",
"actions": {
"delete": {}
}
}
}
}
}
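Once indices pick up the policy, the ILM explain API shows which phase and action each index is in, which helps confirm that rollover and shrink are actually firing (logs-* is a placeholder pattern here):
GET /logs-*/_ilm/explain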
Solution 2: Shrink Existing Indices
Reduce the shard count of an existing index with the shrink API. The target shard count must be a factor of the source's, so shrinking to 1 always works:
# Step 1: Block writes and move a copy of every shard to one node ("shrink-node" is a placeholder node name)
PUT /my-index/_settings
{
"settings": {
"index.blocks.write": true,
"index.routing.allocation.require._name": "shrink-node"
}
}
# Step 2: Wait for relocation to complete
GET /_cat/recovery/my-index?v
# Step 3: Shrink
POST /my-index/_shrink/my-index-shrunk
{
"settings": {
"index.number_of_shards": 1,
"index.number_of_replicas": 1,
"index.blocks.write": null,
"index.routing.allocation.require._name": null
}
}
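As a suggested follow-up (not part of the shrink API itself), wait for the shrunken index to reach green before removing the original:
GET /_cluster/health/my-index-shrunk?wait_for_status=green&timeout=60s
DELETE /my-index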
Solution 3: Reindex with Fewer Shards
For major restructuring, for example collapsing 30 primaries down to 3:
# Create a new index with fewer shards
PUT /my-index-reindexed
{
"settings": {
"number_of_shards": 3, // Instead of 30
"number_of_replicas": 1
}
}
# Reindex data
POST /_reindex
{
"source": {
"index": "my-index"
},
"dest": {
"index": "my-index-reindexed"
}
}
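For large indices, the reindex can outlive an HTTP timeout; a common pattern is to run it asynchronously and poll the task instead:
POST /_reindex?wait_for_completion=false
{
  "source": { "index": "my-index" },
  "dest": { "index": "my-index-reindexed" }
}
# Poll progress using the task ID from the response
GET /_tasks?detailed=true&actions=*reindex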
Solution 4: Delete Unnecessary Indices
Remove old, unused indices. Note that in Elasticsearch 8.x wildcard deletes are blocked by default (action.destructive_requires_name defaults to true), so either list the indices explicitly or relax that setting first:
GET /_cat/indices?v&h=index,docs.count,store.size&s=docs.count:asc
DELETE /old-unused-index-*
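If the wildcard route is blocked, or you want to double-check before deleting, the resolve API previews exactly which concrete indices a pattern matches:
GET /_resolve/index/old-unused-index-*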
Solution 5: Use Index Templates with Proper Defaults
Prevent future oversharding:
PUT _index_template/optimized_template
{
"index_patterns": ["logs-*"],
"template": {
"settings": {
"number_of_shards": 1,
"number_of_replicas": 1
}
},
"priority": 200
}
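To confirm that a new index name would actually pick up this template (and that its priority wins over competing templates), the simulate API shows the settings it would receive; logs-2024.05.01 is just an example name:
POST /_index_template/_simulate_index/logs-2024.05.01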
Solution 6: Use Data Streams
For time-series data, use data streams with ILM:
PUT _index_template/logs_template
{
"index_patterns": ["logs-*"],
"data_stream": {},
"template": {
"settings": {
"number_of_shards": 1
}
}
}
PUT _data_stream/logs-app
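Once the data stream exists, inspecting it shows the current generation and the single-shard backing indices it writes to:
GET /_data_stream/logs-app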
Calculating Ideal Shard Count
Formula for Time-Series Data
Daily data size / Target shard size = Shards per day
Example:
100 GB/day / 40 GB target = 2-3 shards per day
Formula for Static Data
Total data size / Target shard size = Total primary shards
Example:
500 GB total / 40 GB target = ~13 primary shards
Monitoring Shard Health
Regular Checks
# Total shards
GET /_cluster/health?filter_path=active_shards
# Shards per node
GET /_cat/allocation?v
# Small shards (potential oversharding); the pipe requires curl
curl -s 'localhost:9200/_cat/shards?v&h=index,shard,store&s=store:asc' | head -50
Alert Conditions
- Total shard count > (node_count * 500)
- Average shard size < 5 GB
- Cluster state size growing rapidly
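A minimal cron-able sketch of the first alert condition, assuming shell access with curl and treating every node as a data node (adjust the budget if you run dedicated masters):
#!/bin/sh
# Alert when active shards exceed a 500-per-node budget
shards=$(curl -s 'localhost:9200/_cluster/health?filter_path=active_shards' | tr -dc '0-9')
nodes=$(curl -s 'localhost:9200/_cat/nodes?h=name' | wc -l)
if [ "$shards" -gt $((nodes * 500)) ]; then
  echo "ALERT: $shards shards across $nodes nodes exceeds the budget"
fi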
Prevention Checklist
- Index templates specify appropriate shard counts
- ILM policies control shard growth with rollover
- Data streams used for time-series data
- Regular review of shard distribution
- Deletion policies for old indices
- Target shard size: 10-50 GB
Related Topics
- Elasticsearch Shard Sizing Best Practices
- Elasticsearch Shard Imbalance Performance Issues
- Elasticsearch JVM Heap Pressure High
- Elasticsearch Capacity Planning Guide
Automating Detection and Resolution with Pulse
Oversharding problems often develop gradually, making them easy to miss until performance visibly degrades. Pulse automatically detects oversharding conditions through continuous monitoring of shard counts, shard sizes, cluster state size, and per-node resource utilization. When oversharding is identified—whether from excessive shard-per-node ratios, many undersized shards, or rapidly growing cluster state—Pulse's agentic SRE technology performs automated root-cause analysis, determining which indices are contributing most to shard proliferation, whether index templates are misconfigured, or if ILM policies are missing or ineffective.
Pulse can suggest specific remediation actions—such as shrinking specific indices, adjusting index template shard counts, configuring ILM rollover policies, or identifying unused indices safe for deletion—and in some cases apply these optimizations automatically. Connecting your cluster to Pulse's proactive monitoring helps prevent oversharding before it impacts query latency, increases recovery times, or causes heap memory pressure across your nodes.