Field Data Cache in Elasticsearch

Pulse - Elasticsearch Operations Done Right


What is field data cache?

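Field data cache is an in-heap memory structure in Elasticsearch that holds each document's values for a field so they can be read quickly during sorting, aggregations, and scripting. It is used mainly for text fields, which have no doc values: Elasticsearch builds field data by reading the field's inverted index and "un-inverting" it, loading the values for every document in each segment into the JVM heap. This makes those operations possible and fast on text fields, but it can also consume a large share of heap memory.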

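Field data is loaded lazily: the first query that needs it for a field builds it for every segment it touches, and once loaded it stays in the heap for the lifetime of the segment. The cache size (indices.fielddata.cache.size) is unbounded by default, so nothing is evicted unless you set a limit. As a result, heap usage can spike suddenly when a query touches a field whose field data has not been loaded before. The field data circuit breaker stops such loads from causing out-of-memory errors, but monitoring and careful configuration are still essential.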
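The lifecycle described above can be modeled with a small Python sketch. The FielddataCache class and its loader callback are hypothetical; the model only assumes what the paragraph states: entries are built on first use per segment and field, are not evicted when no size limit is set, and disappear when their segment does.

```python
from __future__ import annotations

from typing import Callable


class FielddataCache:
    """Toy model of per-(segment, field) field data with no size limit."""

    def __init__(self, loader: Callable[[str, str], int]) -> None:
        self._loader = loader  # hypothetical: returns the entry's size in bytes
        self._entries: dict[tuple[str, str], int] = {}
        self.loads = 0

    def get(self, segment: str, field: str) -> int:
        key = (segment, field)
        if key not in self._entries:
            # Only the first query on this segment/field pays the load cost.
            self._entries[key] = self._loader(segment, field)
            self.loads += 1
        return self._entries[key]

    def drop_segment(self, segment: str) -> None:
        # Models a segment going away (for example after a merge):
        # its field data is released along with it.
        for key in [k for k in self._entries if k[0] == segment]:
            del self._entries[key]

    @property
    def used_bytes(self) -> int:
        return sum(self._entries.values())


cache = FielddataCache(loader=lambda segment, field: 1_000)
cache.get("seg_0", "title")  # loads
cache.get("seg_0", "title")  # served from memory
cache.get("seg_1", "title")  # loads
peak = cache.used_bytes
cache.drop_segment("seg_0")
print(cache.loads, peak, cache.used_bytes)
```

Nothing in the model ever shrinks the cache except a segment disappearing, which mirrors the default unbounded behavior.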

Best practices

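  1. Avoid using field data on text fields whenever possible. Since Elasticsearch 5.0 it is disabled by default on text fields; sort and aggregate on a keyword field instead, often a keyword sub-field of the text field.
  2. Keep the field data circuit breaker limit (indices.breaker.fielddata.limit) at a level that leaves enough heap for everything else, to prevent out-of-memory errors.
  3. Monitor field data usage and, if needed, cap the cache with indices.fielddata.cache.size.
  4. If you must enable field data on a text field, use the fielddata_frequency_filter mapping parameter to load only terms whose document frequency falls within a chosen range.
  5. Rely on doc values for fields that need sorting or aggregations; they are enabled by default on keyword, numeric, date, and most other non-text field types.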
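The mapping-level practices above can be sketched as Python dicts ready to be sent as JSON request bodies. This is a minimal sketch, assuming a field named "title" with a keyword sub-field called "raw" (both hypothetical names) and illustrative frequency-filter values, not recommended ones. It shows the two options side by side: a keyword sub-field with doc values (preferred) and a text field with field data enabled but filtered by fielddata_frequency_filter.

```python
import json


def text_with_keyword_mapping(field: str) -> dict:
    """Text field for full-text search plus a keyword sub-field for sort/aggs."""
    return {
        "mappings": {
            "properties": {
                field: {
                    "type": "text",
                    # Keyword sub-field: doc values are on by default, no field data needed.
                    "fields": {"raw": {"type": "keyword"}},
                }
            }
        }
    }


def text_with_filtered_fielddata(
    field: str, min_freq: float, max_freq: float, min_segment_size: int
) -> dict:
    """Text field with field data enabled, restricted by document frequency."""
    return {
        "mappings": {
            "properties": {
                field: {
                    "type": "text",
                    "fielddata": True,
                    "fielddata_frequency_filter": {
                        "min": min_freq,
                        "max": max_freq,
                        "min_segment_size": min_segment_size,
                    },
                }
            }
        }
    }


# A search body that sorts and aggregates on the keyword sub-field.
search_body = {
    "sort": [{"title.raw": "asc"}],
    "aggs": {"top_titles": {"terms": {"field": "title.raw"}}},
}

print(json.dumps(text_with_keyword_mapping("title"), indent=2))
print(json.dumps(text_with_filtered_fielddata("body", 0.001, 0.1, 500), indent=2))
```

The first mapping is usually the better choice; the second is only worth it when you genuinely need aggregations over analyzed tokens.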

Common issues or misuses

  1. Enabling field data on large or high-cardinality text fields without considering the cost, leading to excessive heap usage.
  2. Setting circuit breaker limits too high for the available heap, risking cluster stability.
  3. Overusing field data when alternatives like doc values or keyword fields are more suitable.
  4. Neglecting to monitor and optimize field data usage, resulting in heap pressure, long garbage collection pauses, and circuit breaker exceptions.

Frequently Asked Questions

Q: How does field data cache differ from doc values?
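A: Field data is built at query time by un-inverting the inverted index and lives on the JVM heap. Doc values are a columnar structure written to disk at index time and read through the operating system's file system cache, so only the parts that are needed are brought into memory and heap usage stays low. Doc values are the default for sorting and aggregations on keyword, numeric, and most other non-text fields.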

Q: Can I disable field data cache completely?
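A: There is no single switch that removes the field data cache, but you rarely need one: field data is disabled by default on text fields and is only built when a field's mapping sets fielddata: true. Leaving it at false (the default) and using doc values, for example through a keyword sub-field, is generally the better option.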

Q: How can I monitor field data usage in my Elasticsearch cluster?
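A: Use the node stats API (for example, GET _nodes/stats/indices/fielddata?fields=*) to see field data memory usage and evictions per node and per field, or the cat API (GET _cat/fielddata) for a compact per-node, per-field view. Monitoring tools like Kibana can visualize these metrics over time.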
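A short Python sketch of summarizing a node stats response follows. It assumes the response has been parsed into a dict with the nodes → <node id> → indices → fielddata layout returned by the node stats API when the fielddata metric is requested with fields=*. The sample_stats dict is hand-written placeholder input, not real cluster output, and the helper names are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def fielddata_by_node(stats: dict) -> dict[str, int]:
    """Field data heap usage in bytes, keyed by node name."""
    return {
        node["name"]: node["indices"]["fielddata"]["memory_size_in_bytes"]
        for node in stats["nodes"].values()
    }


def fielddata_by_field(stats: dict) -> dict[str, int]:
    """Field data heap usage summed across nodes, keyed by field name."""
    totals: dict[str, int] = defaultdict(int)
    for node in stats["nodes"].values():
        per_field = node["indices"]["fielddata"].get("fields", {})
        for field, info in per_field.items():
            totals[field] += info["memory_size_in_bytes"]
    return dict(totals)


def total_evictions(stats: dict) -> int:
    """Evictions across all nodes; non-zero only if a cache size limit is set."""
    return sum(n["indices"]["fielddata"]["evictions"] for n in stats["nodes"].values())


# Hand-written placeholder shaped like a parsed node stats response.
sample_stats = {
    "nodes": {
        "node-id-a": {"name": "es-node-1", "indices": {"fielddata": {
            "memory_size_in_bytes": 2048, "evictions": 0,
            "fields": {"title": {"memory_size_in_bytes": 2048}}}}},
        "node-id-b": {"name": "es-node-2", "indices": {"fielddata": {
            "memory_size_in_bytes": 1024, "evictions": 3,
            "fields": {"title": {"memory_size_in_bytes": 512},
                       "body": {"memory_size_in_bytes": 512}}}}},
    }
}

print(fielddata_by_node(sample_stats))
print(fielddata_by_field(sample_stats))
print(total_evictions(sample_stats))
```

Per-field totals are usually the most actionable view, because they point directly at the mapping that needs a keyword sub-field instead of field data.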

Q: What's the relationship between field data and the circuit breaker?
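A: The field data circuit breaker estimates how much heap loading a field's field data will need and rejects the request with an exception if the total would exceed its limit (indices.breaker.fielddata.limit, 40% of the JVM heap by default). The estimate is multiplied by a constant, indices.breaker.fielddata.overhead (1.03 by default), to account for inaccuracy. This prevents a single query from pushing the node into an out-of-memory error.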
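The breaker check can be approximated with a few lines of Python arithmetic. This is a simplified model, assuming the documented defaults (a 40% limit and a 1.03 overhead) and ignoring the parent breaker and other accounting the real implementation performs; would_trip is a hypothetical helper, not an Elasticsearch API.

```python
GiB = 1024 ** 3
MiB = 1024 ** 2


def fielddata_limit_bytes(heap_bytes: int, limit_fraction: float = 0.40) -> int:
    """Breaker limit as a fraction of the JVM heap (default 40%)."""
    return int(heap_bytes * limit_fraction)


def would_trip(
    current_bytes: int,
    estimate_bytes: int,
    heap_bytes: int,
    limit_fraction: float = 0.40,
    overhead: float = 1.03,
) -> bool:
    """True if loading `estimate_bytes` more field data would exceed the limit."""
    limit = fielddata_limit_bytes(heap_bytes, limit_fraction)
    projected = current_bytes + estimate_bytes * overhead  # estimate padded by overhead
    return projected > limit


# 8 GiB heap, 3 GiB of field data already loaded:
print(would_trip(3 * GiB, 200 * MiB, 8 * GiB))  # a 200 MiB load crosses the ~3.2 GiB limit
print(would_trip(3 * GiB, 100 * MiB, 8 * GiB))  # a 100 MiB load still fits
```

The example shows how little headroom remains once field data already occupies most of its budget: a load that looks modest on its own can still be rejected.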

Q: Is field data cache cleared when documents are updated or deleted?
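A: No. Field data is built per segment, and updates and deletes do not modify existing segments: an update writes the new version to a new segment and marks the old one as deleted. A segment's field data is released when the segment itself is removed, typically after a merge, or when its index is closed or deleted. It is evicted under memory pressure only if you set indices.fielddata.cache.size, and you can clear it manually with POST /_cache/clear?fielddata=true.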
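A minimal Python sketch of the manual clearing request, built with the standard library but not sent. It assumes a cluster reachable at a base URL you supply; the helper name clear_fielddata_request, the index name, and the field name in the usage line are hypothetical. The fields parameter limits clearing to specific fields.

```python
from __future__ import annotations

from urllib.parse import urlencode
from urllib.request import Request


def clear_fielddata_request(
    base_url: str, index: str | None = None, fields: list[str] | None = None
) -> Request:
    """Build (but do not send) a clear-cache request limited to field data."""
    path = f"/{index}/_cache/clear" if index else "/_cache/clear"
    params = {"fielddata": "true"}
    if fields:
        params["fields"] = ",".join(fields)  # restrict clearing to these fields
    url = f"{base_url.rstrip('/')}{path}?{urlencode(params)}"
    return Request(url, method="POST")


req = clear_fielddata_request("http://localhost:9200/", index="my-index", fields=["title"])
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req), plus authentication if the cluster needs it.
```

Clearing is a stopgap: the field data is rebuilt the next time a query needs it, so the lasting fix is still a mapping change.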
