Elasticsearch indices.fielddata.cache.size Setting

The indices.fielddata.cache.size setting in Elasticsearch controls the maximum amount of heap memory that can be used for the fielddata cache. This cache stores field values in memory for fast access during certain types of queries, particularly for aggregations and sorting on text fields.

  • Default value: Unbounded (not set)
  • Possible values: Percentage of heap size (e.g., "30%") or absolute value (e.g., "2gb")
  • Recommendations: Set to a percentage of heap, typically between 20% and 40%, based on your use case and available memory

Fielddata is built the first time a query needs it and is then kept in memory. When no limit is set, the cache can grow until it consumes all available heap, which may lead to out-of-memory errors. Setting a limit caps this usage: once the cache is full, the least recently used entries are evicted to make room for new ones.
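Before picking a limit, it helps to see what fielddata is actually consuming. The cat fielddata API reports per-node, per-field heap usage (shown here in Elasticsearch's console request syntax):

```
GET _cat/fielddata?v
```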

Example

To set the fielddata cache size to 30% of the heap:

indices.fielddata.cache.size: 30%
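The setting also accepts an absolute value, e.g. in elasticsearch.yml:

```
indices.fielddata.cache.size: 2gb
```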

You might want to change this setting if you're experiencing out-of-memory errors or if you need to allocate more memory for other operations. Increasing the size can improve performance for queries that heavily rely on fielddata, while decreasing it can free up memory for other operations.

Common Issues and Misuses

  • Setting the value too high can lead to excessive memory usage and potential out-of-memory errors
  • Setting the value too low can result in frequent evictions and reloading of fielddata, impacting query performance
  • Not monitoring fielddata usage and adjusting the setting accordingly
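If fielddata has already grown too large, it can be dropped without a restart; the clear-cache API takes a fielddata flag (my-index is a placeholder index name):

```
POST /my-index/_cache/clear?fielddata=true
```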

Do's and Don'ts

Do's:

  • Monitor fielddata usage using Elasticsearch monitoring tools
  • Adjust the setting based on your specific use case and available resources
  • Consider using doc values instead of fielddata for supported field types
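As a monitoring sketch, the helper below (hypothetical, not part of any Elasticsearch client) flags nodes whose fielddata usage is approaching a configured cache limit. It assumes a stats dict with the shape of the response from GET _nodes/stats/indices/fielddata; fetching that JSON with an HTTP client is left to the caller.

```python
# Sketch: flag nodes whose fielddata heap usage is close to a configured
# cache limit. `stats` mirrors the JSON shape of the response from
# GET _nodes/stats/indices/fielddata.

def fielddata_pressure(stats, heap_bytes, limit_pct=30.0, warn_ratio=0.8):
    """Return names of nodes whose fielddata usage exceeds warn_ratio of the limit."""
    limit_bytes = heap_bytes * limit_pct / 100.0
    hot = []
    for node in stats["nodes"].values():
        used = node["indices"]["fielddata"]["memory_size_in_bytes"]
        if used >= warn_ratio * limit_bytes:
            hot.append(node["name"])
    return hot

# Minimal sample response (node IDs, names, and numbers are made up).
sample = {
    "nodes": {
        "abc": {"name": "node-1",
                "indices": {"fielddata": {"memory_size_in_bytes": 1_000_000_000,
                                          "evictions": 120}}},
        "xyz": {"name": "node-2",
                "indices": {"fielddata": {"memory_size_in_bytes": 100_000_000,
                                          "evictions": 0}}},
    }
}

# With a 4 GB heap and a 30% limit (~1.2 GB), node-1 is above 80% of the limit.
print(fielddata_pressure(sample, heap_bytes=4_000_000_000))  # ['node-1']
```

A rising evictions counter in the same stats response is another useful signal that the limit is set too low for the workload.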

Don'ts:

  • Don't leave the setting unbounded in production environments
  • Don't set the value too high without considering the impact on other heap consumers
  • Don't ignore fielddata circuit breaker errors; they often indicate that this setting needs adjustment

Frequently Asked Questions

Q: How does indices.fielddata.cache.size affect query performance?
A: A larger cache size can improve performance for queries that rely heavily on fielddata, such as aggregations and sorting on text fields. However, setting it too high can lead to longer garbage collection pauses and potential out-of-memory errors.

Q: Can I change indices.fielddata.cache.size dynamically?
A: No. This is a static setting: it must be configured in elasticsearch.yml on each data node and takes effect only after a node restart. It cannot be changed through the cluster update settings API.

Q: How do I know if I need to adjust the fielddata cache size?
A: Monitor your cluster's memory usage and watch for fielddata circuit breaker errors. If you're seeing frequent evictions or out-of-memory errors, you may need to adjust this setting.

Q: What's the relationship between indices.fielddata.cache.size and the fielddata circuit breaker?
A: The fielddata circuit breaker (indices.breaker.fielddata.limit, 40% of the heap by default) estimates the memory a fielddata load would require and rejects the request up front if it would exceed the limit. indices.fielddata.cache.size, by contrast, caps what the cache retains, evicting entries once the limit is reached. Keep the cache size below the breaker limit so that entries are evicted before the breaker trips.
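Unlike the cache size, the breaker limit is a dynamic setting and can be adjusted at runtime:

```
PUT _cluster/settings
{
  "persistent": {
    "indices.breaker.fielddata.limit": "40%"
  }
}
```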

Q: Should I use fielddata or doc values for my use case?
A: For most use cases, doc values are preferred as they're more memory-efficient and don't require loading data into memory. Use fielddata only for text fields that require sorting or aggregations and cannot use doc values.
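As an illustration of the doc-values route, a text field can carry a keyword sub-field, and aggregations can target that sub-field instead of enabling fielddata (index and field names here are hypothetical):

```
PUT /my-index
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "fields": {
          "raw": { "type": "keyword" }
        }
      }
    }
  }
}
```

Aggregations and sorts then use title.raw, which is backed by on-disk doc values rather than heap-resident fielddata.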
