Elasticsearch index.lifecycle.step.wait_time_threshold Setting

The index.lifecycle.step.wait_time_threshold setting is a dynamic, index-level Elasticsearch setting. It controls how long Index Lifecycle Management (ILM) waits for the cluster to resolve shard allocation issues during a shrink action before it stops waiting and retries.

Default value: 12h (12 hours)
Possible values: Time value (e.g., 30m, 12h, 7d), with a minimum of 1h
Recommendation: Adjust based on how long relocating shards typically takes in your cluster
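To check which value an index is actually using, you can request this one setting together with its default. This is a sketch: my-index is a placeholder index name. With include_defaults=true, the response also lists the built-in default in a separate defaults section when the index has no explicit value of its own.

```console
# Show the explicit value (if any) and the default for a single index
GET my-index/_settings/index.lifecycle.step.wait_time_threshold?include_defaults=true
```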

This setting keeps the shrink action from waiting indefinitely. Before shrinking an index, ILM relocates a copy of every shard to a single node. If that allocation has not completed within the threshold, ILM stops waiting and moves the index back to an earlier step of the shrink action. From there it can select a node and retry the allocation.

Availability depends on your version. The setting was added to ILM after the initial 7.0 release, so check the ILM settings reference for your Elasticsearch version before relying on it.

Example

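The setting is index-scoped, so you change it with the update index settings API. The cluster settings API does not accept index-level settings. To change the threshold to 48 hours on an existing index (my-index is a placeholder name):

PUT my-index/_settings
{
  "index.lifecycle.step.wait_time_threshold": "48h"
}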

You might increase this value if relocating shards legitimately takes longer than the default in your cluster, for example with very large shards or throttled recovery bandwidth. Conversely, in time-sensitive environments you might decrease it (to no less than 1h) so that ILM gives up on a stalled allocation and retries sooner.
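The value is stored on each index, so indices created later keep the default unless something sets the value when they are created. A minimal sketch using a composable index template follows. The template name large-logs-template, the pattern large-logs-*, and the 24h value are illustrative placeholders.

```console
# New indices matching large-logs-* are created with a 24h threshold
PUT _index_template/large-logs-template
{
  "index_patterns": ["large-logs-*"],
  "template": {
    "settings": {
      "index.lifecycle.step.wait_time_threshold": "24h"
    }
  }
}
```

Only one composable index template applies to a new index: the matching template with the highest priority. If another template already matches these indices, the setting may need to go into that template instead.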

Common Issues or Misuses

Setting the threshold too low, so ILM gives up on allocations that were progressing normally and restarts them
Setting the threshold too high, so an index whose allocation is stuck waits a long time before ILM retries
Forgetting to account for this setting when troubleshooting shrink actions that seem to restart or stall

Do's and Don'ts

Do's:

Adjust the threshold based on how long shard relocation takes for your largest indices
Use the ILM explain API and the Elasticsearch logs to check whether shrink steps frequently hit this threshold
Consider different thresholds for different index patterns if necessary

Don'ts:

Don't set the threshold too low for large indices, whose shards take longer to relocate
Don't ignore this setting when investigating ILM-related issues
Don't set it to an extremely high value, as that delays ILM's retry when an allocation is genuinely stuck

Frequently Asked Questions

Q: How does this setting affect the overall ILM process?
A: It acts as a safeguard for the shrink action. Suppose relocating the index's shards to the shrink node takes longer than the threshold. ILM then stops waiting and moves the index back to an earlier step so the allocation can be retried, instead of leaving the index stuck in the same step indefinitely.

Q: Can I set different thresholds for different indices?
A: Yes. index.lifecycle.step.wait_time_threshold is an index-level, dynamic setting. You can set a different value on each index, update a group of indices at once with a wildcard pattern, or include it in an index template so new indices inherit it.
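For indices that already exist, the update index settings API accepts a wildcard pattern. Setting the value to null returns the indices to the default. The pattern and value below are placeholders for illustration.

```console
# Raise the threshold on every existing index that matches the pattern
PUT large-logs-*/_settings
{
  "index.lifecycle.step.wait_time_threshold": "24h"
}

# Remove the override so these indices fall back to the default again
PUT large-logs-*/_settings
{
  "index.lifecycle.step.wait_time_threshold": null
}
```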

Q: What happens if a step fails due to exceeding this threshold?
A: ILM does not leave the index in the same step. For the shrink action, it moves the index back to an earlier step of the action so that it can select a node and retry the allocation. The ILM explain API shows which step the index is currently on.
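A threshold breach in the shrink action leads to a retry rather than an error. Steps can also fail for other reasons, though, and the index can then end up in the ERROR step. Here is a sketch of finding such indices and resuming one after the cause is fixed (my-index is a placeholder):

```console
# List only managed indices that are currently in an ILM error state
GET */_ilm/explain?only_errors=true

# After fixing the underlying problem, ask ILM to retry the failed step
POST my-index/_ilm/retry
```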

Q: How can I monitor if steps are approaching or exceeding this threshold?
A: Use the ILM explain API (GET <index>/_ilm/explain). It reports each managed index's current phase, action, and step, and when the index entered that step. Monitoring the Elasticsearch logs and alerting on ILM-related events can also help you spot indices that repeatedly hit the threshold.
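The request below is a minimal sketch of that check for one index. my-index is a placeholder, and filter_path trims the response to the fields relevant here. In the response, step_time_millis is the time, in epoch milliseconds, at which the index entered its current step.

```console
# Show where the index is in its lifecycle and when it entered the current step
GET my-index/_ilm/explain?filter_path=indices.*.phase,indices.*.action,indices.*.step,indices.*.step_time_millis
```

Suppose the index is in the shrink action and has been on the same allocation step for close to the configured threshold. In that case ILM is about to stop waiting and retry the allocation.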

Q: Should I adjust this setting if I'm using snapshots in my ILM policy?
A: Probably not. This setting is documented for the shard allocation wait in the shrink action, so raising it is unlikely to help with slow snapshot-related steps. Before changing it, use the explain API to confirm which action and step the index is actually waiting on.
