Elasticsearch Too Many Shards Performance Issues

Having too many shards in an Elasticsearch cluster (oversharding) causes significant performance degradation and resource waste. This guide helps you identify oversharding problems and implement effective solutions.

Understanding the Problem

Why Too Many Shards Hurts Performance

  1. Memory overhead: Each shard consumes heap memory for metadata, segments, and caches
  2. Coordination cost: Queries must coordinate across all shards
  3. Cluster state size: More shards = larger cluster state to manage
  4. Recovery time: Many shards slow down node recovery
  5. File handles: Each shard opens multiple file handles

How Many Shards Is Too Many?

There's no absolute number, but consider these guidelines:

  • Per node: Avoid exceeding 1,000 shards per node
  • Per GB heap: The old "20 shards per GB heap" rule is deprecated but still directionally useful
  • Per index: Question if you need more than a few shards per index
  • Shard size: Shards under 1 GB are often too small
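Taken together, these guidelines lend themselves to a quick scripted check. A minimal sketch in Python (the function name and thresholds mirror the guidelines above; this is a heuristic, not an official Elasticsearch rule):

```python
def oversharding_warnings(total_shards, node_count, avg_shard_size_gb):
    """Flag common oversharding signals based on the guidelines above."""
    warnings = []
    if node_count > 0 and total_shards / node_count > 1000:
        warnings.append("over 1,000 shards per node")
    if avg_shard_size_gb < 1:
        warnings.append("average shard size under 1 GB")
    return warnings
```

For example, a 3-node cluster holding 5,000 shards averaging 0.5 GB each triggers both warnings.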

Identifying Oversharding

Check Total Shard Count

GET /_cluster/stats?filter_path=indices.shards.total
# Count lines from the cat API (shell):
curl -s "localhost:9200/_cat/shards" | wc -l

Shards Per Node

GET /_cat/allocation?v

Average Shard Size

# Get shard sizes
GET /_cat/shards?v&h=index,shard,prirep,store&s=store:asc

If many shards are under 1 GB, you likely have oversharding.
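When eyeballing the output is impractical, the store column can be parsed programmatically. A minimal sketch in Python, assuming lines in the index,shard,prirep,store column order requested above (the helper names are illustrative):

```python
import re

_UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4}

def store_to_bytes(value):
    """Convert a _cat/shards 'store' value like '750.4mb' to bytes."""
    m = re.fullmatch(r"([\d.]+)([a-z]+)", value.strip())
    if not m:
        raise ValueError(f"unrecognized store value: {value!r}")
    number, unit = m.groups()
    return float(number) * _UNITS[unit]

def small_shards(cat_shards_lines, limit_gb=1):
    """Return (index, shard) pairs whose store size is under limit_gb."""
    out = []
    for line in cat_shards_lines:
        index, shard, prirep, store = line.split()
        if store_to_bytes(store) < limit_gb * 1024**3:
            out.append((index, shard))
    return out
```

Feeding it the header-less output of the _cat/shards call above yields the undersized shards directly.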

Indices with Many Small Shards

GET /_cat/indices?v&h=index,pri,rep,store.size,pri.store.size&s=pri:desc

Symptoms of Oversharding

Performance Symptoms

  • Slow search queries despite low data volume
  • High memory usage with low data volume
  • Long cluster recovery times
  • Master node CPU spikes during state updates

Observable Metrics

Metric               Healthy              Oversharded
Shards per node      < 500                > 1,000
Average shard size   10-50 GB             < 1 GB
Cluster state size   < 100 MB             > 500 MB
Heap per shard       Sufficient headroom  High pressure

Solutions

Solution 1: Use Index Lifecycle Management (ILM)

Configure ILM to control shard growth:

PUT _ilm/policy/controlled_shards
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_primary_shard_size": "40gb",
            "max_age": "7d"
          }
        }
      },
      "warm": {
        "min_age": "30d",
        "actions": {
          "shrink": {
            "number_of_shards": 1
          }
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}

Solution 2: Shrink Existing Indices

Reduce shard count on existing indices:

// Step 1: Block writes and relocate to single node
PUT /my-index/_settings
{
  "settings": {
    "index.blocks.write": true,
    "index.routing.allocation.require._name": "shrink-node"
  }
}

// Step 2: Wait for relocation
GET /_cat/recovery/my-index

// Step 3: Shrink
POST /my-index/_shrink/my-index-shrunk
{
  "settings": {
    "index.number_of_shards": 1,
    "index.number_of_replicas": 1,
    "index.blocks.write": null,
    "index.routing.allocation.require._name": null
  }
}

Solution 3: Reindex with Fewer Shards

For major restructuring:

// Create new index with fewer shards
PUT /my-index-reindexed
{
  "settings": {
    "number_of_shards": 3,  // Instead of 30
    "number_of_replicas": 1
  }
}

// Reindex data
POST /_reindex
{
  "source": {
    "index": "my-index"
  },
  "dest": {
    "index": "my-index-reindexed"
  }
}

Solution 4: Delete Unnecessary Indices

Remove old, unused indices:

GET /_cat/indices?v&h=index,docs.count,store.size&s=docs.count:asc

DELETE /old-unused-index-*

Note that recent Elasticsearch versions block wildcard deletes by default (action.destructive_requires_name defaults to true); if the delete is rejected, list the indices explicitly.

Solution 5: Use Index Templates with Proper Defaults

Prevent future oversharding:

PUT _index_template/optimized_template
{
  "index_patterns": ["logs-*"],
  "template": {
    "settings": {
      "number_of_shards": 1,
      "number_of_replicas": 1
    }
  },
  "priority": 200
}

Solution 6: Use Data Streams

For time-series data, use data streams with ILM:

PUT _index_template/logs_template
{
  "index_patterns": ["logs-*"],
  "data_stream": {},
  "template": {
    "settings": {
      "number_of_shards": 1
    }
  }
}

PUT _data_stream/logs-app

Calculating Ideal Shard Count

Formula for Time-Series Data

Daily data size / Target shard size = Shards per day

Example:
100 GB/day / 40 GB target = 2-3 shards per day

Formula for Static Data

Total data size / Target shard size = Total primary shards

Example:
500 GB total / 40 GB target = ~13 primary shards
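Both formulas can be scripted directly. A minimal sketch in Python (the 40 GB default target is the example value used above):

```python
import math

def shards_per_day(daily_gb, target_shard_gb=40):
    """Shard count for each daily index in a time-series setup."""
    return max(1, math.ceil(daily_gb / target_shard_gb))

def total_primary_shards(total_gb, target_shard_gb=40):
    """Primary shard count for a static dataset."""
    return max(1, math.ceil(total_gb / target_shard_gb))
```

shards_per_day(100) returns 3 and total_primary_shards(500) returns 13, matching the worked examples above.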

Monitoring Shard Health

Regular Checks

# Total shards
GET /_cluster/health?filter_path=active_shards

# Shards per node
GET /_cat/allocation?v

# Small shards (potential oversharding)
curl -s "localhost:9200/_cat/shards?v&h=index,shard,store&s=store:asc" | head -50

Alert Conditions

  • Total shard count > (node_count * 500)
  • Average shard size < 5 GB
  • Cluster state size growing rapidly
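These conditions are straightforward to encode in an alerting script. A minimal sketch in Python (the 20% growth-between-samples threshold for cluster state is an illustrative choice, not an Elasticsearch default):

```python
def shard_alerts(total_shards, node_count, avg_shard_size_gb,
                 state_mb_now, state_mb_prev):
    """Return which of the alert conditions above are triggered."""
    alerts = []
    if total_shards > node_count * 500:
        alerts.append("total shard count exceeds node_count * 500")
    if avg_shard_size_gb < 5:
        alerts.append("average shard size below 5 GB")
    if state_mb_prev > 0 and state_mb_now / state_mb_prev > 1.2:
        alerts.append("cluster state growing rapidly")
    return alerts
```

Run it on each monitoring interval and page when the returned list is non-empty.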

Prevention Checklist

  • Index templates specify appropriate shard counts
  • ILM policies control shard growth with rollover
  • Data streams used for time-series data
  • Regular review of shard distribution
  • Deletion policies for old indices
  • Target shard size: 10-50 GB

Automating Detection and Resolution with Pulse

Oversharding problems often develop gradually, making them easy to miss until performance visibly degrades. Pulse automatically detects oversharding through continuous monitoring of shard counts, shard sizes, cluster state size, and per-node resource utilization. When oversharding is identified, whether from excessive shard-per-node ratios, many undersized shards, or rapidly growing cluster state, Pulse's agentic SRE technology performs automated root-cause analysis: it determines which indices contribute most to shard proliferation, whether index templates are misconfigured, and whether ILM policies are missing or ineffective.

Pulse can suggest specific remediation actions, such as shrinking oversharded indices, adjusting index template shard counts, configuring ILM rollover policies, or identifying unused indices that are safe to delete, and in some cases apply these optimizations automatically. Connecting your cluster to Pulse's proactive monitoring helps prevent oversharding before it increases query latency and recovery times or causes heap memory pressure across your nodes.
