Having too many shards in an Elasticsearch cluster (oversharding) causes significant performance degradation and resource waste. This guide helps you identify oversharding problems and implement effective solutions.
Understanding the Problem
Why Too Many Shards Hurts Performance
- Memory overhead: Each shard consumes heap memory for metadata, segments, and caches
- Coordination cost: Every search fans out to each shard it targets, and the coordinating node must gather and merge the per-shard results
- Cluster state size: More shards = larger cluster state to manage
- Recovery time: Many shards slow down node recovery
- File handles: Each shard opens multiple file handles
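The last point is easy to observe directly from node stats; open_file_descriptors is a standard field of the process stats section:
GET /_nodes/stats/process?filter_path=nodes.*.name,nodes.*.process.open_file_descriptors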
How Many Shards Is Too Many?
There's no absolute number, but consider these guidelines:
- Per node: Avoid exceeding 1,000 shards per node, which is also the default cluster.max_shards_per_node limit (see the check below)
- Per GB heap: The old "20 shards per GB of heap" rule is deprecated but still directionally useful
- Per index: Question whether an index really needs more than a few primary shards
- Shard size: Shards under 1 GB are often too small; aim for roughly 10-50 GB
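To confirm the shard-count limit your cluster actually enforces, dump the settings with defaults included and look for cluster.max_shards_per_node in the output:
GET /_cluster/settings?include_defaults=true&flat_settings=true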
Identifying Oversharding
Check Total Shard Count
GET /_cluster/stats?filter_path=indices.shards.total
# Counting _cat output lines needs a shell pipe, so run it with curl rather than Kibana Dev Tools (adjust the host as needed)
curl -s 'localhost:9200/_cat/shards' | wc -l
Shards Per Node
GET /_cat/allocation?v
Average Shard Size
# Get shard sizes
GET /_cat/shards?v&h=index,shard,prirep,store&s=store:asc
If many shards are under 1 GB, you likely have oversharding.
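To make that 1 GB threshold easy to scan, the cat APIs accept a bytes parameter that reports sizes in a fixed unit:
GET /_cat/shards?v&h=index,shard,prirep,store&s=store:asc&bytes=mb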
Indices with Many Small Shards
GET /_cat/indices?v&h=index,pri,rep,store.size,pri.store.size&s=pri:desc
Symptoms of Oversharding
Performance Symptoms
- Slow search queries despite low data volume
- High memory usage with low data volume
- Long cluster recovery times
- Master node CPU spikes during state updates
Observable Metrics
| Metric | Healthy | Oversharded |
|---|---|---|
| Shards per node | < 500 | > 1000 |
| Average shard size | 10-50 GB | < 1 GB |
| Cluster state size | < 100 MB | > 500 MB |
| JVM heap pressure | Stable, comfortably below the limit | Sustained high usage, frequent GC |
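A quick way to gauge the cluster state size row is to download the state and measure it; this approximates the uncompressed size (run with curl, adjusting the host as needed):
curl -s 'localhost:9200/_cluster/state' | wc -c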
Solutions
Solution 1: Use Index Lifecycle Management (ILM)
Configure ILM so indices roll over at a sensible size, shrink as they age, and are deleted when no longer needed:
PUT _ilm/policy/controlled_shards
{
"policy": {
"phases": {
"hot": {
"actions": {
"rollover": {
"max_primary_shard_size": "40gb",
"max_age": "7d"
}
}
},
"warm": {
"min_age": "30d",
"actions": {
"shrink": {
"number_of_shards": 1
}
}
},
"delete": {
"min_age": "90d",
"actions": {
"delete": {}
}
}
}
}
}
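The policy only takes effect once indices reference it, typically via an index template; the template name, index pattern, and rollover alias below are illustrative, and the first backing index must then be bootstrapped with that alias as its write index:
PUT _index_template/controlled_shards_template
{
  "index_patterns": ["app-logs-*"],
  "template": {
    "settings": {
      "index.lifecycle.name": "controlled_shards",
      "index.lifecycle.rollover_alias": "app-logs"
    }
  }
}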
Solution 2: Shrink Existing Indices
Reduce the shard count of an existing index with the shrink API. The target shard count must be a factor of the source's primary shard count, and the index must first be made read-only with a copy of every shard on one node:
// Step 1: Block writes and relocate to single node
PUT /my-index/_settings
{
"settings": {
"index.blocks.write": true,
"index.routing.allocation.require._name": "shrink-node"
}
}
// Step 2: Wait for relocation
GET /_cat/recovery/my-index
// Step 3: Shrink
POST /my-index/_shrink/my-index-shrunk
{
"settings": {
"index.number_of_shards": 1,
"index.number_of_replicas": 1,
"index.blocks.write": null,
"index.routing.allocation.require._name": null
}
}
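Once the shrink completes, confirm the new index is healthy and has the expected layout before removing the original:
// Step 4: Verify the shrunken index
GET /_cluster/health/my-index-shrunk?wait_for_status=green&timeout=60s
GET /_cat/shards/my-index-shrunk?v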
Solution 3: Reindex with Fewer Shards
For major restructuring:
// Create new index with fewer shards
PUT /my-index-reindexed
{
"settings": {
"number_of_shards": 3, // Instead of 30
"number_of_replicas": 1
}
}
// Reindex data
POST /_reindex
{
"source": {
"index": "my-index"
},
"dest": {
"index": "my-index-reindexed"
}
}
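When the reindex finishes, compare document counts before retiring the source. One option for the cutover, sketched below with the standard _aliases actions, atomically deletes the old index and reuses its name as an alias; this is irreversible, so verify the counts first:
// Verify counts match
GET /my-index/_count
GET /my-index-reindexed/_count
// Swap: delete the old index and keep its name as an alias
POST /_aliases
{
  "actions": [
    { "remove_index": { "index": "my-index" } },
    { "add": { "index": "my-index-reindexed", "alias": "my-index" } }
  ]
}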
Solution 4: Delete Unnecessary Indices
Remove old, unused indices:
GET /_cat/indices?v&h=index,docs.count,store.size&s=docs.count:asc
DELETE /old-unused-index-*
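Note that recent Elasticsearch versions block wildcard deletes by default through the action.destructive_requires_name setting; check its effective value before relying on patterns like the one above:
GET /_cluster/settings?include_defaults=true&filter_path=**.destructive_requires_name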
Solution 5: Use Index Templates with Proper Defaults
Prevent future oversharding:
PUT _index_template/optimized_template
{
"index_patterns": ["logs-*"],
"template": {
"settings": {
"number_of_shards": 1,
"number_of_replicas": 1
}
},
"priority": 200
}
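You can confirm which template (and therefore which shard settings) a new index would pick up with the simulate index API; the index name here is only an example:
POST /_index_template/_simulate_index/logs-2024.05.01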
Solution 6: Use Data Streams
For time-series data, use data streams with ILM:
PUT _index_template/logs_template
{
"index_patterns": ["logs-*"],
"data_stream": {},
"template": {
"settings": {
"number_of_shards": 1
}
}
}
PUT _data_stream/logs-app
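Attach an ILM policy by adding index.lifecycle.name to the template settings (data streams do not need a rollover alias), then verify the stream and its backing indices:
GET /_data_stream/logs-app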
Calculating Ideal Shard Count
Formula for Time-Series Data
Daily data size / Target shard size = Shards per day
Example:
100 GB/day / 40 GB target = 2-3 shards per day
Formula for Static Data
Total data size / Target shard size = Total primary shards
Example:
500 GB total / 40 GB target = ~13 primary shards
Monitoring Shard Health
Regular Checks
# Total shards
GET /_cluster/health?filter_path=active_shards
# Shards per node
GET /_cat/allocation?v
# Small shards (potential oversharding); shell pipes require curl rather than Kibana Dev Tools
curl -s 'localhost:9200/_cat/shards?v&h=index,shard,store&s=store:asc' | head -50
Alert Conditions
- Total shard count > (node_count * 500) (see the health query after this list)
- Average shard size < 5 GB
- Cluster state size growing rapidly
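Both inputs for the first alert condition come from a single cluster health call:
GET /_cluster/health?filter_path=active_shards,number_of_data_nodes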
Prevention Checklist
- Index templates specify appropriate shard counts
- ILM policies control shard growth with rollover
- Data streams used for time-series data
- Regular review of shard distribution
- Deletion policies for old indices
- Target shard size: 10-50 GB