The index.number_of_shards setting in Elasticsearch controls the number of primary shards for an index. This setting is crucial for determining how data is distributed and scaled across the cluster.
- Default Value: 1 (since Elasticsearch 7.0; earlier versions defaulted to 5)
- Possible Values: Any positive integer, up to a default limit of 1024 shards per index
- Recommendations:
  - For most use cases, start with 1 shard and increase only if necessary
  - Consider your data volume, query patterns, and cluster size when choosing the number of shards
  - Avoid over-sharding, as it can lead to performance issues
Example
To create an index with 3 primary shards:
PUT /my_index
{
  "settings": {
    "index": {
      "number_of_shards": 3
    }
  }
}
Reasons for changing the default:
- To distribute large indices across multiple nodes
- To increase write throughput by allowing parallel indexing
Effects of the change:
- Improved write performance for large datasets
- Better distribution of data across the cluster
- Potential for faster search on large indices when querying multiple shards in parallel
Common Issues and Misuses
- Over-sharding: Creating too many shards for small indices, leading to resource waste
- Under-sharding: Not enough shards for large indices, causing poor performance and scalability issues
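- Trying to change the number of primary shards on an existing index: the setting is fixed at creation, so a different shard count means reindexing into a new index (or using the split or shrink APIs, which also write to a new index)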
Do's and Don'ts
Do's:
- Plan your shard strategy based on expected data growth
- Monitor shard size and adjust for new indices if necessary
- Use a single shard for small indices (< 50GB)
Don'ts:
- Don't create an excessive number of shards for small datasets
- Avoid designs that require changing the shard count often, since every change means building a new index
- Don't rely on the default without checking that it suits your data volume
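Monitoring shard size can be partly automated. The following is a minimal sketch, not an official tool: it assumes its input rows look like the JSON returned by GET _cat/shards?format=json&bytes=b (fields index, prirep, store), and the function name, verdict strings, and 10GB/50GB thresholds are illustrative choices taken from the guidance in this article.

```python
from collections import defaultdict

GB = 1024 ** 3  # bytes per gibibyte


def review_shard_sizes(rows, min_gb=10.0, max_gb=50.0):
    """Classify each index by the size of its primary shards.

    rows: list of dicts assumed to resemble _cat/shards JSON output
          requested with bytes=b, so "store" is a byte count as a string.
    """
    sizes = defaultdict(list)
    for row in rows:
        # Skip replicas and unassigned shards (no store size reported).
        if row.get("prirep") != "p" or row.get("store") is None:
            continue
        sizes[row["index"]].append(int(row["store"]) / GB)

    verdicts = {}
    for index, primaries in sizes.items():
        largest = max(primaries)
        if largest > max_gb:
            verdicts[index] = "possibly under-sharded"
        elif len(primaries) > 1 and largest < min_gb:
            verdicts[index] = "possibly over-sharded"
        else:
            verdicts[index] = "ok"
    return verdicts
```

A single small shard is reported as "ok" because one shard is the recommended layout for small indices; only indices that split a small amount of data across several primaries are flagged. The verdicts are hints for sizing the next index, not a reason to act on their own.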
Frequently Asked Questions
Q: Can I change the number of shards after creating an index?
A: No, the number of primary shards cannot be changed in place after index creation. You would need to reindex your data into a new index with the desired number of shards, or use the split or shrink APIs, which likewise produce a new index.
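The reindexing route can be sketched as two requests: create the destination index with the new shard count, then copy the documents with the _reindex API. The helper below only builds the request descriptions and sends nothing; the function name and the index name my_index_v2 are hypothetical, chosen for illustration.

```python
import json


def build_reshard_requests(source_index, dest_index, number_of_shards):
    """Return (method, path, body) tuples for moving data to a new shard count."""
    if number_of_shards < 1:
        raise ValueError("number_of_shards must be a positive integer")

    # Step 1: create the destination index with the desired primary shard count.
    create = (
        "PUT",
        f"/{dest_index}",
        {"settings": {"index": {"number_of_shards": number_of_shards}}},
    )
    # Step 2: copy every document from the old index into the new one.
    reindex = (
        "POST",
        "/_reindex",
        {"source": {"index": source_index}, "dest": {"index": dest_index}},
    )
    return [create, reindex]


if __name__ == "__main__":
    for method, path, body in build_reshard_requests("my_index", "my_index_v2", 3):
        print(method, path)
        print(json.dumps(body, indent=2))
```

Mappings and other settings are not carried over by _reindex, so in practice the create request would also need to include them.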
Q: How do I determine the optimal number of shards for my index?
A: Consider factors such as the size of your data, expected growth, query patterns, and available hardware. A general rule is to aim for shard sizes between 10GB and 50GB.
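As a rough illustration of the sizing rule, the shard count can be estimated by dividing the expected index size by a target shard size within the 10GB to 50GB range. This is a back-of-the-envelope sketch: the function name and the 30GB default target are assumptions, and the result is a starting point to validate against real query and indexing load.

```python
import math


def estimate_primary_shards(expected_index_gb, target_shard_gb=30.0):
    """Estimate a primary shard count that keeps shards near the target size."""
    if expected_index_gb < 0:
        raise ValueError("expected_index_gb must not be negative")
    if target_shard_gb <= 0:
        raise ValueError("target_shard_gb must be positive")
    # Round up so no shard exceeds the target; never go below one shard.
    return max(1, math.ceil(expected_index_gb / target_shard_gb))
```

For example, estimate_primary_shards(20) returns 1, while estimate_primary_shards(300) returns 10. Use the expected size after growth, not the current size, because the count is fixed once the index exists.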
Q: Does increasing the number of shards always improve performance?
A: Not necessarily. While more shards can improve write performance and allow for better distribution, too many shards can lead to overhead and decreased performance, especially for search operations.
Q: How does the number of shards affect cluster stability?
A: Having too many shards can strain cluster resources, particularly memory, and may lead to slower cluster state updates. It's important to balance the number of shards with your cluster's capacity.
Q: Is there a maximum limit to the number of shards I can have?
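A: There is no fixed cluster-wide maximum, but the cluster.max_shards_per_node setting (default 1000) caps the number of open shards, primaries and replicas combined, at that value multiplied by the number of data nodes. Treat this as a ceiling rather than a target: the total number of shards in the cluster should be considered in relation to available resources and performance requirements.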
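The figure of 1000 shards per node can be turned into a quick capacity check. The sketch below counts each primary and each of its replica copies as one shard and compares the total against a per-node budget; the function names are hypothetical, and the calculation assumes all listed indices are open and all nodes hold data.

```python
def total_shards(primaries, replicas):
    """Shards an index contributes: each primary plus its replica copies."""
    return primaries * (1 + replicas)


def fits_shard_budget(indices, data_nodes, max_shards_per_node=1000):
    """Check a planned set of indices against the cluster's shard budget.

    indices: iterable of (number_of_shards, number_of_replicas) pairs.
    Returns (shards_used, shard_budget, within_budget).
    """
    used = sum(total_shards(p, r) for p, r in indices)
    budget = data_nodes * max_shards_per_node
    return used, budget, used <= budget
```

For example, fits_shard_budget([(3, 1), (1, 1)], data_nodes=3) returns (8, 3000, True). Fitting within the budget only means the plan is allowed, not that it is efficient; shard size still matters.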