The index.lifecycle.step.wait_time_threshold setting is a dynamic, index-level Elasticsearch setting. It controls how long Index Lifecycle Management (ILM) waits for the cluster to resolve shard allocation issues during a shrink action before it steps in and retries.
- Default value: 12h (12 hours)
- Possible values: Time value with a documented minimum of 1h (e.g., 2h, 12h, 7d)
- Recommendation: Adjust based on your specific use case and expected index lifecycle durations
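To make the time-value format concrete, here is a minimal Python sketch that converts values such as 12h or 7d into seconds and checks them against a 1h minimum. The helper names are hypothetical, only the d, h, m, and s units are handled, and the 1h floor is an assumption taken from the ILM settings reference rather than something enforced by this code on a real cluster.

```python
import re

# Seconds per unit for the time units used in this article (d, h, m, s).
_UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}

# Assumed minimum for this setting (1h), per the ILM settings reference.
MIN_THRESHOLD_SECONDS = 3600


def parse_time_value(value: str) -> int:
    """Convert a whole-number time value such as '12h' into seconds."""
    match = re.fullmatch(r"(\d+)(d|h|m|s)", value.strip())
    if match is None:
        raise ValueError(f"unsupported time value: {value!r}")
    amount, unit = match.groups()
    return int(amount) * _UNIT_SECONDS[unit]


def is_valid_threshold(value: str) -> bool:
    """Return True if the value parses and meets the assumed 1h minimum."""
    try:
        return parse_time_value(value) >= MIN_THRESHOLD_SECONDS
    except ValueError:
        return False
```

A check like this can run in a deployment script before a value is sent to the cluster, so that a value such as 30m is caught early.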
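This setting keeps a shrink action from waiting indefinitely on shard allocation. If an allocation step has still not completed when the threshold expires, ILM retries rather than simply marking the step as failed. If the index has not been shrunk yet, ILM attempts to allocate the source index's primary shards to a different node. If the index has been shrunk but the new shards could not be allocated, ILM deletes the shrunken index and re-attempts the entire shrink action.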
This setting is not available in early 7.x releases. It is listed in the ILM settings reference for later 7.x versions and for 8.x, so check the reference for your version before relying on it.
Example
To change the wait time threshold to 48 hours for a single index using the update index settings API (my-index is a placeholder index name):
PUT my-index/_settings
{
  "index.lifecycle.step.wait_time_threshold": "48h"
}
You might increase this value if allocation problems in your cluster routinely take a long time to clear, for example while nodes are offline for maintenance or while very large shards are relocating. Conversely, you might decrease it (but not below 1h) so that ILM retries a stuck shrink sooner in time-sensitive environments.
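If the change needs to be scripted, the request can be assembled with the Python standard library. The sketch below only builds the HTTP request for the update index settings API and does not send it. The function name, base URL, and index name are hypothetical, and authentication and TLS handling are left out.

```python
import json
import urllib.request


def build_threshold_request(base_url: str, index: str, threshold: str) -> urllib.request.Request:
    """Build (but do not send) a PUT <index>/_settings request for the threshold."""
    body = json.dumps(
        {"index.lifecycle.step.wait_time_threshold": threshold}
    ).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{index}/_settings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Example with placeholder values; pass the result to urllib.request.urlopen() to send it.
request = build_threshold_request("http://localhost:9200", "my-index", "48h")
```

Keeping request construction separate from sending makes the payload easy to inspect or log before it reaches the cluster.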
Common Issues or Misuses
- Setting the threshold too low, causing ILM to retry operations that would have completed on their own
- Setting the threshold too high, leaving a genuinely stuck step waiting longer than necessary before ILM reacts
- Forgetting to account for this setting when troubleshooting ILM policy failures
Do's and Don'ts
Do's:
- Adjust the threshold based on your specific index lifecycle patterns and requirements
- Monitor ILM execution logs to identify if steps are frequently hitting this threshold
- Consider different thresholds for different index patterns if necessary
Don'ts:
- Don't set the threshold too low for large indices or time-consuming operations
- Don't ignore this setting when investigating ILM-related issues
- Don't set it to an extremely high value, as it may mask real problems in your ILM policies
Frequently Asked Questions
Q: How does this setting affect the overall ILM process?
A: This setting acts as a safeguard so that a shrink action does not wait indefinitely on shard allocation. If allocation is still stuck once the threshold has passed, ILM retries: it tries another node for the source index's primary shards or, if the shrunken index already exists, deletes it and restarts the shrink action.
Q: Can I set different thresholds for different indices?
A: Yes. index.lifecycle.step.wait_time_threshold is an index-level setting, so you can set it per index with the update index settings API or apply it to groups of indices through index templates. It is also dynamic, so you can change it on a live index, for example around a maintenance window.
Q: What happens if a step fails due to exceeding this threshold?
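A: ILM retries the shrink. If the index has not been shrunk yet, ILM tries to allocate the source index's primary shards to another node. If the shrunken index exists but its shards could not be allocated within the threshold, ILM deletes the shrunken index and re-attempts the entire shrink action. The ILM explain API shows which step the index is currently in, and the Elasticsearch logs may contain related ILM messages.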
Q: How can I monitor if steps are approaching or exceeding this threshold?
A: Use the ILM explain API (GET <index>/_ilm/explain) to see each index's current phase, action, and step, and when that step was entered. Monitoring Elasticsearch logs and setting up alerts for ILM-related events can also help you spot indices that are approaching the threshold.
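As one possible way to automate this check, the Python sketch below takes an already-parsed ILM explain response and returns the managed indices whose current step has lasted at least a given fraction of the threshold. The indices, managed, and step_time_millis fields follow the explain response format. The function name, the 0.8 fraction, and the sample data are assumptions made for illustration.

```python
import time


def steps_near_threshold(explain_response, threshold_seconds, fraction=0.8, now_millis=None):
    """Return {index_name: seconds_in_step} for managed indices whose current
    step has lasted at least fraction * threshold_seconds."""
    if now_millis is None:
        now_millis = int(time.time() * 1000)
    flagged = {}
    for name, info in explain_response.get("indices", {}).items():
        step_started = info.get("step_time_millis")
        # Skip indices that ILM does not manage or that report no step start time.
        if not info.get("managed") or step_started is None:
            continue
        seconds_in_step = (now_millis - step_started) / 1000
        if seconds_in_step >= fraction * threshold_seconds:
            flagged[name] = seconds_in_step
    return flagged


# Hypothetical sample: one index has been in its step for 11h, another for 1h.
sample = {
    "indices": {
        "logs-a": {"managed": True, "action": "shrink", "step_time_millis": 60_400_000},
        "logs-b": {"managed": True, "action": "shrink", "step_time_millis": 96_400_000},
        "logs-c": {"managed": False},
    }
}
near_limit = steps_near_threshold(sample, threshold_seconds=12 * 3600, now_millis=100_000_000)
```

With a 12h threshold and the default fraction, only logs-a is flagged here. To narrow the check to a particular action, filter on the action field as well.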
Q: Should I adjust this setting if I'm using snapshots in my ILM policy?
A: Usually not for the snapshot steps themselves. The ILM settings reference describes this threshold as the wait time for shard allocation during a shrink action, not as a timeout for snapshot creation or restoration. Consider raising it only if snapshot or restore activity slows shard relocation enough that shrink allocation regularly takes longer than the threshold.