The refresh_interval parameter in Elasticsearch index settings (index.refresh_interval) controls how frequently the index is refreshed, which is what makes newly indexed documents available for search. This parameter directly determines the trade-off between indexing throughput and near real-time search visibility.
Default value: “1s” (1 second)
Possible values: Time value (e.g., “1s”, “30s”, “5m”) or “-1” to disable automatic refreshing
Recommendations: Adjust based on your specific use case and requirements for search latency vs. indexing performance
The refresh operation in Elasticsearch makes recent changes to the index visible to search. A shorter refresh interval gives more up-to-date search results, but every refresh has a cost, so refreshing frequently reduces indexing throughput.
Example:
To change the refresh interval to 30 seconds for an index named “my_index”:
PUT /my_index/_settings
{
  "index": {
    "refresh_interval": "30s"
  }
}
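The same request can be sent from application code. This is a minimal sketch, assuming the official Python client (elasticsearch-py 8.x), where indices.put_settings(index=..., settings=...) calls the Update Index Settings API. The helper name set_refresh_interval is hypothetical, and the function only needs an object exposing that method, so it can be exercised without a cluster.

```python
from typing import Any, Mapping


def set_refresh_interval(client: Any, index: str, interval: str) -> Mapping[str, Any]:
    """Set index.refresh_interval on one index (hypothetical helper).

    `client` is assumed to be an elasticsearch-py 8.x Elasticsearch instance;
    the settings body matches the console request above.
    """
    settings = {"index": {"refresh_interval": interval}}
    # Sends an Update Index Settings request for `index` with this body.
    return client.indices.put_settings(index=index, settings=settings)
```

Against a live cluster it would be called as set_refresh_interval(Elasticsearch("http://localhost:9200"), "my_index", "30s"). Passing the client in, rather than creating it inside the helper, keeps connection configuration in one place.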
Note: During bulk operations you can temporarily increase the refresh interval, or disable automatic refreshing with “-1”, to speed up indexing, because frequent refreshes are resource intensive and can degrade cluster performance.
A longer interval is desirable when you prioritize indexing throughput over immediate search visibility, such as during bulk indexing or log ingestion. Newly indexed documents are first held in an in-memory buffer; a refresh writes the buffered documents into a new segment and opens it for search. Because each refresh that has pending changes creates a new segment, fewer refreshes mean fewer, larger segments and less merge work later. A refresh makes the new segment searchable; it does not by itself persist the segment to disk.
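The buffer-then-segment behavior can be illustrated with a toy model. This is a deliberately simplified sketch, not Elasticsearch or Lucene internals: the class ToyShard and its methods are hypothetical, searches see only refreshed segments, and durability (translog, flush) is not modeled.

```python
from typing import List


class ToyShard:
    """Toy model of refresh visibility (hypothetical, not Lucene's design)."""

    def __init__(self) -> None:
        self.buffer: List[str] = []          # indexed but not yet searchable
        self.segments: List[List[str]] = []  # searchable, immutable batches

    def index(self, doc: str) -> None:
        # New documents only land in the in-memory buffer.
        self.buffer.append(doc)

    def refresh(self) -> None:
        # Move buffered docs into a new segment; an empty buffer adds nothing.
        if self.buffer:
            self.segments.append(self.buffer)
            self.buffer = []

    def search(self, term: str) -> List[str]:
        # Only documents that are already in segments are visible.
        return [doc for seg in self.segments for doc in seg if term in doc]


shard = ToyShard()
shard.index("error: disk full")
print(shard.search("error"))   # [] because no refresh has happened yet
shard.refresh()
print(shard.search("error"))   # ['error: disk full']
shard.refresh()                # nothing buffered, so no new segment
print(len(shard.segments))     # 1
```

Each call to refresh() that finds buffered documents adds one segment, which is why refreshing very often tends to produce many small segments.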
You can trigger a refresh manually with the refresh API by sending a POST request to the index's _refresh endpoint (for example, POST /my_index/_refresh). Once the request completes, recent changes are searchable.
Note: Frequent manual refreshes can negatively impact cluster performance.
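A common bulk-loading pattern is to disable automatic refresh, load the data, re-enable refresh, and trigger one manual refresh. The sketch below wraps that pattern so the interval is restored even if the load fails. It assumes an elasticsearch-py 8.x client (indices.put_settings and indices.refresh); the helper name refresh_disabled and its restore_to parameter are hypothetical. Leaving restore_to as None sends JSON null, which resets index.refresh_interval to its default.

```python
from contextlib import contextmanager
from typing import Any, Iterator, Optional


@contextmanager
def refresh_disabled(client: Any, index: str,
                     restore_to: Optional[str] = None) -> Iterator[None]:
    """Disable automatic refresh while a bulk load runs (hypothetical helper).

    `client` is assumed to be an elasticsearch-py 8.x Elasticsearch instance.
    """
    client.indices.put_settings(
        index=index, settings={"index": {"refresh_interval": "-1"}})
    try:
        yield
    finally:
        # Always re-enable automatic refresh; None is serialized as JSON null,
        # which resets the setting to the index default.
        client.indices.put_settings(
            index=index, settings={"index": {"refresh_interval": restore_to}})
        # One explicit refresh makes everything indexed so far searchable now.
        client.indices.refresh(index=index)
```

Typical usage would be `with refresh_disabled(es, "my_index", restore_to="1s"): helpers.bulk(es, actions)`, where helpers.bulk is elasticsearch-py's bulk helper and actions is your own iterable of documents. Putting the restore in a finally clause addresses the risk of forgetting to re-enable refreshing after an interrupted load.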
The Refresh Operation
The refresh operation is a fundamental process in both Elasticsearch and OpenSearch that ensures newly indexed data becomes visible to search requests. When documents are added or updated in an index, they are not immediately available for search; instead, they are held in an in-memory buffer until a refresh occurs. A refresh writes the buffered data into new segments and opens them for search, so recent changes are reflected in search results. These new segments are not necessarily persisted to disk yet: committing them to disk is the job of a flush, and the transaction log protects recent changes in the meantime.
By default, refreshes are triggered automatically at the interval defined by index.refresh_interval, so after each interval any data indexed since the last refresh becomes available for search. One exception in recent Elasticsearch versions: if refresh_interval has not been set explicitly, shards that have received no search requests for index.search.idle.after (30 seconds by default) skip scheduled refreshes until the next search arrives. Whether you are working with Elasticsearch or OpenSearch, tuning the refresh interval lets you balance up-to-date search results against indexing performance.
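The effect of the interval on visibility can be worked through with a simplified timeline. This sketch assumes refreshes fire at exactly one interval, two intervals, and so on, take no time, and are never skipped (search-idle behavior is ignored); the function name visible_at_ms is hypothetical. Times are integer milliseconds to avoid floating-point rounding.

```python
def visible_at_ms(indexed_at_ms: int, interval_ms: int) -> int:
    """Time at which a document becomes searchable under the simplified model.

    Refreshes happen at interval_ms, 2*interval_ms, ...; a document indexed at
    time t is picked up by the first refresh strictly after t.
    """
    if interval_ms <= 0:
        raise ValueError("interval_ms must be positive")
    return (indexed_at_ms // interval_ms + 1) * interval_ms


# With the default 1s interval, a document indexed at t=300ms appears at 1000ms.
print(visible_at_ms(300, 1_000))     # 1000
# With a 30s interval, the same document waits until t=30000ms.
print(visible_at_ms(300, 30_000))    # 30000
```

In this model the wait between indexing and visibility is always greater than zero and at most one interval, which is the intuition behind the near real-time behavior discussed in the FAQ.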
Common Issues and Misuses of the Refresh Operation
Setting too short an interval can lead to performance degradation during heavy indexing loads
Setting too long an interval can result in delayed visibility of new documents in search results
Disabling refreshes entirely (-1) without triggering refreshes manually means new documents never become visible in search results
Do:
Adjust the refresh interval based on your specific use case and performance requirements
Monitor the impact of changes to this setting on both indexing and search performance
Consider using longer intervals for bulk indexing operations
Test the impact of different refresh_interval values to ensure search results and latency meet your requirements
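One way to monitor refresh cost is the refresh section of the index stats API (GET /my_index/_stats/refresh), which reports cumulative refresh counts and time spent refreshing. The sketch below assumes the response path _all.primaries.refresh with total and total_time_in_millis fields; verify those field names against your version. The helper name is hypothetical, and the sample dictionary is made-up input, not real cluster output.

```python
from typing import Any, Mapping


def average_refresh_ms(stats: Mapping[str, Any]) -> float:
    """Average milliseconds per refresh on primary shards (hypothetical helper).

    `stats` is assumed to be the body of GET /<index>/_stats/refresh.
    """
    refresh = stats["_all"]["primaries"]["refresh"]
    count = refresh["total"]
    # Guard against a freshly created index that has not refreshed yet.
    return refresh["total_time_in_millis"] / count if count else 0.0


# Made-up input for illustration only.
sample = {"_all": {"primaries": {"refresh": {"total": 40, "total_time_in_millis": 1200}}}}
print(average_refresh_ms(sample))   # 30.0
```

With elasticsearch-py 8.x the input would come from es.indices.stats(index="my_index", metric="refresh"). Because the counters are cumulative, comparing two snapshots taken before and after a change to refresh_interval gives a clearer picture than a single reading.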
Don’t:
Set extremely short intervals (e.g., milliseconds) without careful consideration of the performance impact
Forget to re-enable automatic refreshing after bulk operations if you’ve disabled it
Ignore this setting when troubleshooting performance issues related to indexing or search latency
Change this setting without monitoring system performance afterward
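Two of the mistakes listed above, leaving refresh disabled and choosing a very short interval, are easy to check for before applying a setting. This is a minimal sketch: the function names are hypothetical, only integer quantities with the Elasticsearch time units d, h, m, s, ms, micros, and nanos are parsed, and the 1-second warning threshold is an assumption (it matches the default, not an official limit).

```python
import re
from typing import List, Optional

# Milliseconds per Elasticsearch time unit.
_UNIT_MS = {"d": 86_400_000, "h": 3_600_000, "m": 60_000, "s": 1_000,
            "ms": 1, "micros": 0.001, "nanos": 0.000_001}

# Assumed threshold for "suspiciously short"; equal to the 1s default.
_MIN_RECOMMENDED_MS = 1_000


def to_millis(value: str) -> Optional[float]:
    """Convert a refresh_interval value to milliseconds; None means disabled."""
    if value == "-1":
        return None
    match = re.fullmatch(r"(\d+)(nanos|micros|ms|d|h|m|s)", value)
    if match is None:
        raise ValueError(f"unsupported time value: {value!r}")
    return int(match.group(1)) * _UNIT_MS[match.group(2)]


def refresh_interval_warnings(value: str) -> List[str]:
    """Return warnings for risky refresh_interval values (empty if none)."""
    millis = to_millis(value)
    if millis is None:
        return ["automatic refresh is disabled; re-enable it after the bulk load"]
    if millis < _MIN_RECOMMENDED_MS:
        return [f"{value} is shorter than 1s; measure the indexing overhead first"]
    return []


print(refresh_interval_warnings("30s"))    # []
print(refresh_interval_warnings("200ms"))  # ['200ms is shorter than 1s; measure the indexing overhead first']
```

A check like this could run in a deployment script before the settings request is sent, so a leftover “-1” from a bulk load is caught early.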
Frequently Asked Questions
Q: How does index.refresh_interval affect indexing performance?
A: A longer refresh interval can improve indexing performance because refreshes run less often: more documents are indexed between refreshes, and fewer small segments are created that would later need to be merged.
Q: Can I change the refresh_interval for an existing index?
A: Yes, you can dynamically update the refresh_interval for an existing index using the Update Index Settings API.
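Because refresh_interval is a dynamic setting, it is worth reading back the value an index actually uses after changing it. With include_defaults=true, the Get Index Settings API returns explicitly set values under settings and unset ones under defaults. This is a minimal sketch, assuming the nested (non-flat) response format; the helper name effective_refresh_interval is hypothetical.

```python
from typing import Any, Mapping, Optional


def effective_refresh_interval(response: Mapping[str, Any], index: str) -> Optional[str]:
    """Pick the refresh_interval an index uses from a Get Index Settings body.

    Explicit settings win over defaults; returns None if neither section has it.
    """
    entry = response.get(index, {})
    for section in ("settings", "defaults"):
        value = entry.get(section, {}).get("index", {}).get("refresh_interval")
        if value is not None:
            return value
    return None
```

With elasticsearch-py 8.x the input could come from es.indices.get_settings(index="my_index", name="index.refresh_interval", include_defaults=True).body, passed together with "my_index" as the index name.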
Q: What happens if I set index.refresh_interval to -1?
A: Setting it to -1 disables automatic refreshing. New documents will not be visible in search results until a refresh is triggered manually, automatic refreshing is re-enabled, or the index is closed and reopened.
Q: How does index.refresh_interval relate to near real-time search?
A: Under normal scheduled refreshing, the refresh interval sets the upper bound on how long a newly indexed document waits before it becomes visible in search results, plus the time the refresh itself takes. This delay is what makes Elasticsearch search “near real-time” rather than real-time.
Q: Should I use different refresh intervals for different indices?
A: Yes. You can, and often should, set different refresh intervals for different indices based on their use cases, indexing rates, and search latency requirements. After changing the setting, monitor the system and test how different values affect indexing throughput, search latency, and how quickly new documents appear in results. For advanced configuration and troubleshooting, refer to the official documentation.