Elasticsearch Too Many Shards Performance Issues

Having too many shards in an Elasticsearch cluster (oversharding) causes significant performance degradation and resource waste. This guide helps you identify oversharding problems and implement effective solutions.

Understanding the Problem

Why Too Many Shards Hurts Performance

  1. Memory overhead: Each shard consumes heap memory for metadata, segments, and caches
  2. Coordination cost: Every search fans out to each shard it targets, and the coordinating node must merge all the per-shard results
  3. Cluster state size: More shards = larger cluster state to manage
  4. Recovery time: Many shards slow down node recovery
  5. File handles: Each shard opens multiple file handles
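
Much of this overhead is visible per node. A quick way to gauge it is the cat nodes API; this is a minimal sketch, and the columns shown (heap, file descriptors, segment counts) are standard cat nodes columns you can trim or extend:

# Per-node heap pressure, open file descriptors, and segment counts
GET /_cat/nodes?v&h=name,heap.percent,heap.max,file_desc.current,segments.count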

How Many Shards Is Too Many?

There's no absolute number, but consider these guidelines:

  • Per node: Avoid exceeding 1,000 shards per node
  • Per GB heap: The old "20 shards per GB heap" rule is deprecated but still directionally useful
  • Per index: Question if you need more than a few shards per index
  • Shard size: Shards under 1 GB are often too small
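
Elasticsearch also enforces a related limit itself: cluster.max_shards_per_node defaults to 1,000 open shards per non-frozen data node and blocks new index creation once the cluster-wide budget is exhausted. One way to check the effective value (look for cluster.max_shards_per_node in the output):

# Show cluster settings including defaults; check cluster.max_shards_per_node
GET /_cluster/settings?include_defaults=true&flat_settings=true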

Identifying Oversharding

Check Total Shard Count

GET /_cluster/stats?filter_path=indices.shards.total

# Or count shard rows from the command line (shell pipes don't work in Kibana Dev Tools)
curl -s "http://localhost:9200/_cat/shards" | wc -l

Shards Per Node

GET /_cat/allocation?v

Average Shard Size

# Get shard sizes
GET /_cat/shards?v&h=index,shard,prirep,store&s=store:asc

If many shards are under 1 GB, you likely have oversharding.

Indices with Many Small Shards

GET /_cat/indices?v&h=index,pri,rep,store.size,pri.store.size&s=pri:desc

Symptoms of Oversharding

Performance Symptoms

  • Slow search queries despite low data volume
  • High memory usage with low data volume
  • Long cluster recovery times
  • Master node CPU spikes during state updates
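
Two quick checks that surface several of these symptoms at once; the endpoints are standard, and the thresholds you alert on are up to you:

# Heap pressure per node
GET /_nodes/stats/jvm?filter_path=nodes.*.name,nodes.*.jvm.mem.heap_used_percent

# Cluster state updates backing up on the master
GET /_cluster/pending_tasks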

Observable Metrics

Metric               Healthy       Oversharded
Shards per node      < 500         > 1,000
Average shard size   10-50 GB      < 1 GB
Cluster state size   < 100 MB      > 500 MB
Heap per shard       Sufficient    High pressure
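
The average shard size is easy to derive from cluster stats: divide the total store size by the total shard count (primaries and replicas). A minimal sketch:

# Total data size and total shard count; average shard size = size / shards
GET /_cluster/stats?filter_path=indices.store.size_in_bytes,indices.shards.total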

Solutions

Solution 1: Use Index Lifecycle Management (ILM)

Configure ILM to control shard growth:

PUT _ilm/policy/controlled_shards
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_primary_shard_size": "40gb",
            "max_age": "7d"
          }
        }
      },
      "warm": {
        "min_age": "30d",
        "actions": {
          "shrink": {
            "number_of_shards": 1
          }
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
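
The policy only takes effect once new indices reference it. With data streams, that is a single setting in the index template; the template name and the logs-* pattern below are placeholders, a sketch of the wiring rather than a drop-in config:

PUT _index_template/controlled_shards_template
{
  "index_patterns": ["logs-*"],
  "data_stream": {},
  "template": {
    "settings": {
      "index.lifecycle.name": "controlled_shards"
    }
  }
}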

Solution 2: Shrink Existing Indices

Reduce shard count on existing indices:

# Step 1: Block writes and relocate all shards to a single node
PUT /my-index/_settings
{
  "settings": {
    "index.blocks.write": true,
    "index.routing.allocation.require._name": "shrink-node"
  }
}

# Step 2: Wait for relocation to finish
GET /_cat/recovery/my-index

# Step 3: Shrink into a new index and clear the temporary settings
POST /my-index/_shrink/my-index-shrunk
{
  "settings": {
    "index.number_of_shards": 1,
    "index.number_of_replicas": 1,
    "index.blocks.write": null,
    "index.routing.allocation.require._name": null
  }
}
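
Note that the target's shard count must be a factor of the source's primary shard count (for example, 8 can shrink to 4, 2, or 1). Once the shrink completes, verify the new index and clean up the original; the index names below simply carry over from the example above:

# Confirm the shrunken index looks right
GET /_cat/shards/my-index-shrunk?v&h=index,shard,prirep,state,store

# Once verified, remove the original oversharded index
DELETE /my-index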

Solution 3: Reindex with Fewer Shards

For major restructuring:

# Create a new index with fewer primary shards (for example, 3 instead of 30)
PUT /my-index-reindexed
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}

# Copy the data into the new index
POST /_reindex
{
  "source": {
    "index": "my-index"
  },
  "dest": {
    "index": "my-index-reindexed"
  }
}
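
For large indices the reindex can run for hours, so one option is to run it asynchronously and poll the task API, then cut clients over with an alias. The alias name below is hypothetical; adapt it to how your applications address the index:

# Run the reindex in the background; the response contains a task ID
POST /_reindex?wait_for_completion=false
{
  "source": { "index": "my-index" },
  "dest": { "index": "my-index-reindexed" }
}

# Check progress with the task ID from the previous response
GET /_tasks/<task_id>

# Point traffic at the new index, then delete the old one separately
POST /_aliases
{
  "actions": [
    { "add": { "index": "my-index-reindexed", "alias": "my-data" } }
  ]
}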

Solution 4: Delete Unnecessary Indices

Remove old, unused indices:

GET /_cat/indices?v&h=index,docs.count,store.size&s=docs.count:asc

DELETE /old-unused-index-*
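
On Elasticsearch 8.x a wildcard delete like the one above is rejected by default because action.destructive_requires_name is true; either name the indices explicitly or loosen the safeguard temporarily, for example:

# Allow wildcard index deletion (re-enable the safeguard afterwards)
PUT /_cluster/settings
{
  "persistent": {
    "action.destructive_requires_name": false
  }
}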

Solution 5: Use Index Templates with Proper Defaults

Prevent future oversharding:

PUT _index_template/optimized_template
{
  "index_patterns": ["logs-*"],
  "template": {
    "settings": {
      "number_of_shards": 1,
      "number_of_replicas": 1
    }
  },
  "priority": 200
}
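
To confirm which settings a future index would actually receive (the template with the highest matching priority wins), the simulate index API is handy; logs-2024.06.01 below is just an example index name:

# Preview the settings and mappings that would apply to a new index
POST /_index_template/_simulate_index/logs-2024.06.01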

Solution 6: Use Data Streams

For time-series data, use data streams with ILM:

PUT _index_template/logs_template
{
  "index_patterns": ["logs-*"],
  "data_stream": {},
  "template": {
    "settings": {
      "number_of_shards": 1
    }
  }
}

PUT _data_stream/logs-app
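
Documents written to a data stream are appended to its backing indices and must include a @timestamp field; a minimal sketch with example data:

# Append a document to the data stream (backing indices are managed for you)
POST /logs-app/_doc
{
  "@timestamp": "2024-06-01T12:00:00Z",
  "message": "example log line"
}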

Calculating Ideal Shard Count

Formula for Time-Series Data

Daily data size / Target shard size = Shards per day

Example:
100 GB/day / 40 GB target = 2-3 shards per day

Formula for Static Data

Total data size / Target shard size = Total primary shards

Example:
500 GB total / 40 GB target = ~13 primary shards

Monitoring Shard Health

Regular Checks

# Total shards
GET /_cluster/health?filter_path=active_shards

# Shards per node
GET /_cat/allocation?v

# Smallest shards first (potential oversharding) - run from the command line
curl -s "http://localhost:9200/_cat/shards?h=index,shard,store&s=store:asc" | head -50

Alert Conditions

  • Total shard count > (node_count * 500)
  • Average shard size < 5 GB
  • Cluster state size growing rapidly
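
The raw numbers behind the first alert come straight from cluster health; a minimal sketch you can wire into whatever monitoring you already use:

# Alert when active_shards exceeds number_of_nodes * 500
GET /_cluster/health?filter_path=active_shards,number_of_nodes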

Prevention Checklist

  • Index templates specify appropriate shard counts
  • ILM policies control shard growth with rollover
  • Data streams used for time-series data
  • Regular review of shard distribution
  • Deletion policies for old indices
  • Target shard size: 10-50 GB