Having Too Many Elasticsearch Shards for an Index - Common Causes & Fixes

When an Elasticsearch index has too many shards, both performance and cost suffer. This typically happens when:

  1. Index templates create too many shards by default
  2. Improper index design leads to a high number of small indices
  3. Time-based indices accumulate without proper management
  4. Cluster scaling is not aligned with data growth
  5. Multi-tenancy is implemented incorrectly in the cluster
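
To see how quickly time-based indices add up, a rough back-of-the-envelope calculation helps. The sketch below is illustrative only: the function name and the example figures (one index per day, 90 days of retention, 1 replica) are assumptions, not values taken from a real cluster.

```python
def total_shards(index_count: int, primaries_per_index: int, replicas: int) -> int:
    """Return the total number of shard copies (primaries plus replicas)."""
    if index_count < 0 or primaries_per_index < 1 or replicas < 0:
        raise ValueError("counts must be non-negative and primaries at least 1")
    # Every primary shard has `replicas` additional copies.
    return index_count * primaries_per_index * (1 + replicas)


# Hypothetical example: one index per day, kept for 90 days, 1 replica each.
print(total_shards(90, primaries_per_index=5, replicas=1))  # 900
print(total_shards(90, primaries_per_index=1, replicas=1))  # 180
```

Under these assumed numbers, dropping from 5 primaries to 1 per daily index removes 720 shards without deleting any data.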

Impact

Having too many shards can significantly impact Elasticsearch performance:

  • Increased memory usage and CPU overhead
  • Slower query performance
  • Reduced cluster stability
  • Longer recovery times during node failures
  • Difficulty in managing and maintaining the cluster

Resolution and Best Practices

To resolve and prevent the "too many shards" problem:

  1. Optimize index templates:

    • Reduce the default number of shards per index
    • Use index aliases and rollover API for time-based data
  2. Consolidate small indices:

    • Merge small indices into larger ones
    • Use parent-child relationships or nested documents instead of separate indices
  3. Implement index lifecycle management:

    • Set up index lifecycle policies to manage time-based indices
    • Automate the process of moving older indices to warm or cold storage
    • Move to data streams or rollover indices instead of daily indices
  4. Right-size your cluster:

    • Plan for data growth and adjust the number of nodes accordingly
    • Use larger nodes with more resources instead of many small nodes
  5. Monitor shard distribution:

    • Use Pulse's shard heatmap to correctly plan index and shard distribution and avoid hotspots
    • Set up alerts in Pulse for when shard counts approach critical levels, or when indices are created with an incorrect number of shards
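
Steps 1 and 3 above can be sketched as two request bodies: an ILM policy that rolls over on shard size or age, and an index template that defaults to a single primary shard and backs a data stream. This is a minimal sketch, not a recommended production configuration. The names `myapp-logs-*` and `myapp-logs-policy`, the helper function names, and the thresholds (50gb, 30d, 90d) are all assumptions chosen for illustration; the bodies are meant for `PUT _ilm/policy/<name>` and `PUT _index_template/<name>` respectively.

```python
import json


def build_ilm_policy(max_shard_size: str = "50gb", max_age: str = "30d",
                     delete_after: str = "90d") -> dict:
    """Body for PUT _ilm/policy/<name>: roll over in hot, delete later."""
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        # Roll over when either condition is met first.
                        "rollover": {
                            "max_primary_shard_size": max_shard_size,
                            "max_age": max_age,
                        }
                    }
                },
                "delete": {
                    # min_age is measured from rollover for rolled-over indices.
                    "min_age": delete_after,
                    "actions": {"delete": {}},
                },
            }
        }
    }


def build_index_template(pattern: str, policy_name: str,
                         primaries: int = 1, replicas: int = 1) -> dict:
    """Body for PUT _index_template/<name> backing a data stream."""
    return {
        "index_patterns": [pattern],
        "data_stream": {},
        "template": {
            "settings": {
                "index.number_of_shards": primaries,
                "index.number_of_replicas": replicas,
                "index.lifecycle.name": policy_name,
            }
        },
    }


# Hypothetical names, used only for illustration.
policy = build_ilm_policy()
template = build_index_template("myapp-logs-*", "myapp-logs-policy")
print(json.dumps(policy, indent=2))
print(json.dumps(template, indent=2))
```

Rolling over on primary shard size rather than on a fixed daily schedule means a quiet day does not produce a new, nearly empty index.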

Additional Information

  • The ideal shard size is typically between 10GB and 50GB (but no one size fits all)
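  • Aim for no more than 1000-2000 shards per node, and note that by default Elasticsearch limits the cluster to 1,000 open shards per non-frozen data node (the cluster.max_shards_per_node setting)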
  • Consider using custom routing to reduce the number of shards queried
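
The size and per-node guidelines above can be checked against the output of `GET _cat/shards?format=json&bytes=b`, which returns one record per shard copy with string fields such as `index`, `shard`, `prirep`, `state`, `store` and `node`. The sketch below works on a hand-written sample rather than a live cluster; the sample rows, the function names, and the 10 GiB threshold are assumptions for illustration.

```python
from collections import Counter

GIB = 1024 ** 3


def shards_per_node(rows: list) -> Counter:
    """Count started shard copies (primaries and replicas) per node."""
    return Counter(
        r["node"] for r in rows
        if r.get("state") == "STARTED" and r.get("node")
    )


def undersized_primaries(rows: list, min_bytes: int = 10 * GIB) -> list:
    """Return (index, shard) pairs for primaries smaller than min_bytes."""
    small = []
    for r in rows:
        # Unassigned copies report no store size, so skip them.
        if r.get("prirep") == "p" and r.get("store") is not None:
            if int(r["store"]) < min_bytes:
                small.append((r["index"], r["shard"]))
    return small


# Hand-written sample shaped like _cat/shards JSON output (values are strings).
sample_rows = [
    {"index": "logs-000001", "shard": "0", "prirep": "p",
     "state": "STARTED", "store": str(30 * GIB), "node": "node-1"},
    {"index": "logs-000001", "shard": "0", "prirep": "r",
     "state": "STARTED", "store": str(30 * GIB), "node": "node-2"},
    {"index": "tiny-2024.01.01", "shard": "0", "prirep": "p",
     "state": "STARTED", "store": str(5 * 1024 ** 2), "node": "node-1"},
    {"index": "tiny-2024.01.01", "shard": "0", "prirep": "r",
     "state": "UNASSIGNED", "store": None, "node": None},
]

print(dict(shards_per_node(sample_rows)))
print(undersized_primaries(sample_rows))
```

Indices that show up repeatedly in the undersized list are natural candidates for consolidation or for a lower shard count in their template.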

Frequently Asked Questions

Q: How many shards should an Elasticsearch index have?
A: The optimal number of shards depends on your data size and use case. As a general rule, aim for shard sizes between 10GB and 50GB. For most use cases, starting with 1 or 3 primary shards is a good baseline.
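
As a rough illustration of that rule of thumb, the primary shard count can be estimated by dividing the expected index size by a target shard size. The function name and the 30 GB default target (a midpoint of the 10GB to 50GB range) are assumptions; treat the result as a starting point to validate with real workloads, not a definitive answer.

```python
import math


def suggested_primary_shards(expected_size_gb: float,
                             target_shard_gb: float = 30.0) -> int:
    """Estimate primaries so each shard lands near target_shard_gb."""
    if expected_size_gb < 0 or target_shard_gb <= 0:
        raise ValueError("sizes must be positive")
    # Always keep at least one primary, even for very small indices.
    return max(1, math.ceil(expected_size_gb / target_shard_gb))


for size_gb in (5, 90, 400):
    print(size_gb, "GB ->", suggested_primary_shards(size_gb), "primary shard(s)")
```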

Q: Can I change the number of shards in an existing index?
A: You cannot directly change the number of primary shards in an existing index. However, you can use the Split or Shrink APIs to create a new index with a different number of shards based on an existing index.
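
Both APIs restrict which target shard counts are allowed. For Shrink, the target count must be a factor of the source count; for Split, the target must be a multiple of the source count and is further bounded by the index's `index.number_of_routing_shards` setting. The helper functions below are a sketch of those constraints as described here (the function names are hypothetical), so check the documentation for your Elasticsearch version before relying on them.

```python
def valid_shrink_targets(source_shards: int) -> list:
    """Shard counts smaller than the source that divide it evenly."""
    return [n for n in range(1, source_shards) if source_shards % n == 0]


def valid_split_targets(source_shards: int, routing_shards: int) -> list:
    """Multiples of the source count that also divide routing_shards."""
    return [
        n for n in range(source_shards + 1, routing_shards + 1)
        if n % source_shards == 0 and routing_shards % n == 0
    ]


print(valid_shrink_targets(8))     # [1, 2, 4]
print(valid_split_targets(5, 30))  # [10, 15, 30]
```

Note that an index with a prime number of primary shards can only be shrunk to a single shard.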

Q: How do I reduce the number of shards in my cluster?
A: To reduce the number of shards, you can:

  1. Merge small indices
  2. Use the Shrink API to reduce shards in large indices
  3. Adjust index templates for future indices
  4. Implement index lifecycle management to remove or archive old indices

Q: What's the difference between primary and replica shards?
A: Primary shards are the main shards that hold your data. Replica shards are copies of primary shards, used for redundancy and to improve read performance. While adding replicas can improve read performance, it doesn't solve the "too many shards" problem: every replica is itself a shard, so adding replicas increases the total shard count.

Q: How does the number of shards affect Elasticsearch performance?
A: Each shard requires memory and CPU resources. Too many shards can lead to increased overhead, slower query performance, and longer cluster state updates. Conversely, too few shards can limit parallelization and scalability. Finding the right balance is key to optimal Elasticsearch performance.
