What is Elasticsearch Fielddata?

What is fielddata?

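Fielddata is an in-memory data structure in Elasticsearch that gives fast access to the terms of a field for each document. Text fields need it for sorting, aggregations, and scripting. The inverted index maps terms to documents, so the first time a text field is used in one of these operations, Elasticsearch builds fielddata by reading the inverted index of each segment and inverting that term-to-document relationship. The result is stored on the JVM heap and remains there for the lifetime of the segment. Other field types, such as keyword and numeric fields, use doc values for the same purposes, so fielddata matters mainly for text fields.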
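To make the inversion concrete, here is a toy Python model of what building fielddata means for one segment. It is an illustration, not Elasticsearch code: `inverted_index`, `build_fielddata`, and `terms_agg` are hypothetical names. The inverted index maps each term to the documents that contain it, and the sketch reverses that into a per-document list of terms, which is the shape sorting and aggregations need.

```python
from collections import Counter

# Toy inverted index for one segment of a hypothetical analyzed text field:
# term -> list of doc IDs that contain the term.
inverted_index = {
    "brown": [0, 2],
    "fox": [0, 1, 2],
    "quick": [0, 1],
}


def build_fielddata(index, num_docs):
    """Uninvert term -> docs into doc -> terms (a toy fielddata)."""
    fielddata = [[] for _ in range(num_docs)]
    for term in sorted(index):  # visit every term in the segment, in order
        for doc_id in index[term]:
            fielddata[doc_id].append(term)
    return fielddata


def terms_agg(fielddata, doc_ids):
    """Count terms across matching docs, roughly what a terms aggregation does."""
    counts = Counter()
    for doc_id in doc_ids:
        counts.update(fielddata[doc_id])
    return counts


fd = build_fielddata(inverted_index, num_docs=3)
print(fd)  # [['brown', 'fox', 'quick'], ['fox', 'quick'], ['brown', 'fox']]
print(terms_agg(fd, [0, 1]).most_common())
```

Building the per-document view requires walking every term in the segment, which is why the real structure is expensive for large, high-cardinality text fields and why Elasticsearch keeps the result on the heap instead of rebuilding it per query.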

Best practices

Avoid using fielddata on large text fields whenever possible.
Rely on doc values, which are enabled by default, for keyword and numeric fields; they do not need fielddata.
Monitor fielddata usage and set appropriate circuit breaker limits (indices.breaker.fielddata.limit).
Consider using the fielddata_frequency_filter mapping parameter to load only terms whose document frequency falls within a chosen range, for example skipping very rare terms.
Use multi-fields (the fields mapping parameter) to add a keyword sub-field for sorting and aggregations instead of enabling fielddata on the main text field.
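The fielddata_frequency_filter parameter takes min and max thresholds (plus min_segment_size). The Python sketch below is a simplified model of the min/max part only, not Elasticsearch's implementation. It assumes that frequency is counted per segment against the documents that have a value for the field, that values above 1.0 are absolute document counts, and that values at or below 1.0 are fractions. Exact rounding is not modeled and min_segment_size is left out; the field name `tag` and the thresholds are hypothetical.

```python
# Mapping that applies the filter; the field name "tag" and the thresholds
# are hypothetical values chosen for illustration.
TAG_MAPPING = {
    "properties": {
        "tag": {
            "type": "text",
            "fielddata": True,
            "fielddata_frequency_filter": {"min": 0.01, "max": 0.1},
        }
    }
}


def resolve_threshold(value, docs_with_field):
    """Assumed rule: values above 1.0 are absolute doc counts, others are fractions."""
    return value if value > 1.0 else value * docs_with_field


def terms_to_load(doc_freqs, docs_with_field, min_freq, max_freq):
    """Simplified per-segment model: keep terms whose doc frequency is in [min, max]."""
    low = resolve_threshold(min_freq, docs_with_field)
    high = resolve_threshold(max_freq, docs_with_field)
    return {term for term, df in doc_freqs.items() if low <= df <= high}


# One hypothetical segment in which 1,000 docs have a value for "tag".
segment_doc_freqs = {"common": 900, "mid": 50, "rare": 3}
settings = TAG_MAPPING["properties"]["tag"]["fielddata_frequency_filter"]
print(terms_to_load(segment_doc_freqs, 1000, settings["min"], settings["max"]))
# {'mid'}: "rare" is below 10 docs (1%), "common" is above 100 docs (10%)
```

Terms outside the range are simply not loaded, so they cost no heap but also cannot appear in sorts or aggregations on that field.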

Common issues or misuses

Enabling fielddata on large text fields can lead to significant memory usage and potential out-of-memory errors.
Heavy use of fielddata can slow queries, because a query that needs fielddata for a newly created or merged segment must wait while it is loaded into memory.
Failing to set appropriate circuit breaker limits can lead to cluster instability.
Enabling fielddata on fields that don't need it wastes heap memory.
Using fielddata for operations that could be more efficiently handled by doc values or other optimizations.
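One way to catch fielddata enabled where it isn't needed is to audit mappings for text fields that have it switched on. The Python sketch below walks a mapping dictionary shaped like the JSON body returned by GET my-index/_mapping and lists those fields by dotted path. The function name and sample mapping are hypothetical, and fetching the mapping from a cluster is left out.

```python
def find_fielddata_fields(properties, prefix=""):
    """Return dotted paths of text fields whose mapping sets fielddata: true."""
    found = []
    for name, spec in properties.items():
        path = prefix + name
        if spec.get("type") == "text" and spec.get("fielddata") is True:
            found.append(path)
        # Recurse into object sub-properties and multi-fields ("fields").
        for key in ("properties", "fields"):
            if key in spec:
                found.extend(find_fielddata_fields(spec[key], path + "."))
    return found


# Hypothetical response body from GET my-index/_mapping
sample_response = {
    "my-index": {
        "mappings": {
            "properties": {
                "title": {
                    "type": "text",
                    "fielddata": True,
                    "fields": {"raw": {"type": "keyword"}},
                },
                "body": {"type": "text"},
                "author": {
                    "properties": {"name": {"type": "text", "fielddata": True}}
                },
            }
        }
    }
}

for index_name, index_body in sample_response.items():
    print(index_name, find_fielddata_fields(index_body["mappings"]["properties"]))
# my-index ['title', 'author.name']
```

In this sample, `title` already has a keyword sub-field (`title.raw`), so its fielddata flag is a likely candidate for removal.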

Additional relevant information

Fielddata is disabled by default for text fields due to its potential for high memory usage.
Fielddata can be enabled on a per-field basis in the mapping.
Elasticsearch provides circuit breakers to prevent fielddata from consuming too much memory and causing out-of-memory errors.
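Fielddata is segment-based: it is built separately for each segment of an index. When segments are merged, the fielddata of the old segments is released, and fielddata for the new segment is built the next time it is needed.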
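The per-segment lifecycle can be modeled with a small Python cache keyed by segment ID. This is a hypothetical illustration of the behavior described above, not Elasticsearch's internal code. The real fielddata cache can also evict entries when indices.fielddata.cache.size is set, which this sketch ignores.

```python
class SegmentFielddataCache:
    """Toy model of per-segment fielddata (hypothetical class, not the ES API)."""

    def __init__(self, build_fn):
        self._build = build_fn  # builds fielddata for a single segment
        self._entries = {}      # segment ID -> fielddata for that segment
        self.builds = 0         # how many times fielddata had to be built

    def get(self, segment_id):
        # Built lazily on first use, then reused while the segment exists.
        if segment_id not in self._entries:
            self._entries[segment_id] = self._build(segment_id)
            self.builds += 1
        return self._entries[segment_id]

    def release(self, segment_ids):
        # Merged-away segments drop their fielddata; nothing is prebuilt
        # for the new segment, so its first use pays the loading cost.
        for segment_id in segment_ids:
            self._entries.pop(segment_id, None)


segments = {"s1": ["fox", "brown"], "s2": ["quick"]}
cache = SegmentFielddataCache(lambda seg: sorted(segments[seg]))
cache.get("s1")
cache.get("s2")
cache.get("s1")                 # served from the cache
print(cache.builds)             # 2

# Merge s1 and s2 into s3, then query again.
segments["s3"] = segments.pop("s1") + segments.pop("s2")
cache.release(["s1", "s2"])
print(cache.get("s3"), cache.builds)  # ['brown', 'fox', 'quick'] 3
```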

Frequently Asked Questions

Q: How do I enable fielddata for a text field in Elasticsearch?
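A: Set fielddata to true on the field with the update mapping API. Because fielddata is built from the inverted index at query time, this works on an existing text field without reindexing: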

PUT my-index/_mapping
{
  "properties": {
    "my_field": {
      "type": "text",
      "fielddata": true
    }
  }
}

Q: What's the difference between fielddata and doc values?
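A: Fielddata is built at query time and held on the JVM heap, and it is the only way to sort and aggregate on text fields, which do not support doc values. Doc values are built at index time, stored on disk in a column-oriented format, and enabled by default for most other field types, including keyword, numeric, and date fields. Because doc values are read through the operating system's file system cache rather than the heap, they are generally more efficient and avoid the memory pressure that fielddata can cause.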

Q: How can I monitor fielddata usage in my Elasticsearch cluster?
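A: Use the nodes stats API (GET _nodes/stats/indices/fielddata?fields=*) to see how much heap fielddata uses on each node, broken down by field, along with eviction counts. The cat fielddata API (GET _cat/fielddata?v) gives a compact per-node, per-field view of the same memory usage.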
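Cat APIs can return JSON (format=json) with raw byte counts (bytes=b), which is easier to process than the tabular output. The Python sketch below sums fielddata heap per field across nodes from such a response. The payload is made up for illustration; the column names (id, host, ip, node, field, size) follow what the cat fielddata API reports, and the assumption that size arrives as a string of bytes is handled by converting it with int().

```python
import json

# Hypothetical response body from GET _cat/fielddata?format=json&bytes=b
SAMPLE = """[
  {"id": "n1", "host": "127.0.0.1", "ip": "127.0.0.1", "node": "node-1", "field": "title", "size": "5242880"},
  {"id": "n2", "host": "127.0.0.1", "ip": "127.0.0.1", "node": "node-2", "field": "title", "size": "3145728"},
  {"id": "n1", "host": "127.0.0.1", "ip": "127.0.0.1", "node": "node-1", "field": "tags", "size": "1048576"}
]"""


def fielddata_bytes_per_field(cat_json):
    """Sum fielddata heap usage per field across all nodes, largest first."""
    totals = {}
    for row in json.loads(cat_json):
        totals[row["field"]] = totals.get(row["field"], 0) + int(row["size"])
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


for field, size in fielddata_bytes_per_field(SAMPLE).items():
    print(f"{field}: {size / 1024**2:.1f} MiB")
# title: 8.0 MiB
# tags: 1.0 MiB
```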

Q: What is the fielddata circuit breaker, and how does it work?
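A: The fielddata circuit breaker estimates how much heap a fielddata load will need before the load happens, and rejects the request with a circuit breaking exception if it would push total fielddata memory over the configured limit. The limit is set with indices.breaker.fielddata.limit (40% of the JVM heap by default), and estimates are multiplied by indices.breaker.fielddata.overhead (1.03 by default). Rejecting the request up front protects the node from out-of-memory errors.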
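The breaker's arithmetic can be sketched in Python. This is a simplified model that assumes the default limit (40% of the heap) and overhead (1.03) and applies the overhead to the running total before comparing it with the limit. The class and exception names are hypothetical, and the real breaker also reports to a parent breaker, which this sketch ignores.

```python
from dataclasses import dataclass


class CircuitBreakingError(Exception):
    """Stands in for Elasticsearch's circuit_breaking_exception."""


@dataclass
class FielddataBreaker:
    heap_bytes: int
    limit_ratio: float = 0.40  # default of indices.breaker.fielddata.limit
    overhead: float = 1.03     # default of indices.breaker.fielddata.overhead
    used_bytes: int = 0

    @property
    def limit_bytes(self) -> int:
        return int(self.heap_bytes * self.limit_ratio)

    def add_estimate(self, estimate_bytes: int) -> None:
        """Reserve memory for a fielddata load, or trip before loading."""
        new_used = self.used_bytes + estimate_bytes
        if new_used * self.overhead > self.limit_bytes:
            raise CircuitBreakingError(
                f"[fielddata] would use {new_used * self.overhead:.0f} bytes, "
                f"limit is {self.limit_bytes} bytes"
            )
        self.used_bytes = new_used


breaker = FielddataBreaker(heap_bytes=1024**3)  # hypothetical 1 GiB heap
breaker.add_estimate(300 * 1024**2)             # fits under the ~409.6 MiB limit
try:
    breaker.add_estimate(100 * 1024**2)         # exceeds it once overhead is applied
except CircuitBreakingError as err:
    print("tripped:", err)
```

Because the check runs before any data is loaded, a rejected request leaves the reserved total unchanged.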

Q: Are there alternatives to using fielddata for sorting and aggregations on text fields?
A: Yes, alternatives include:

Using a keyword sub-field, defined with the fields mapping parameter (multi-fields), for exact matching, sorting, and aggregations.
Using the search_as_you_type field type when the real need is prefix matching; it serves as-you-type queries without fielddata, though it does not support sorting or aggregations itself.
Considering whether the field needs to be analyzed at all and, if not, mapping it as a keyword field instead.
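A keyword sub-field keeps full-text search on the analyzed field while sorting and aggregations use doc values on the sub-field. The Python sketch below builds the request bodies as dictionaries and prints them in console form. The sub-field name `keyword`, the `ignore_above` value, the query text, and the aggregation name `top_values` are illustrative choices, not requirements.

```python
import json

# Mapping with a keyword multi-field under the analyzed text field.
mapping = {
    "properties": {
        "my_field": {
            "type": "text",
            "fields": {
                "keyword": {"type": "keyword", "ignore_above": 256}
            },
        }
    }
}

# Full-text matching uses the text field; sorting and the terms
# aggregation use the keyword sub-field, which is backed by doc values.
search = {
    "query": {"match": {"my_field": "quick fox"}},
    "sort": [{"my_field.keyword": "asc"}],
    "aggs": {"top_values": {"terms": {"field": "my_field.keyword", "size": 10}}},
}

print("PUT my-index\n" + json.dumps({"mappings": mapping}, indent=2))
print("GET my-index/_search\n" + json.dumps(search, indent=2))
```

If my-index already exists, the same properties can be sent to PUT my-index/_mapping, but documents indexed earlier only get the sub-field populated after they are reindexed, for example with _update_by_query.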