The Elasticsearch fuzzy query matches documents containing terms within a configurable Levenshtein edit distance of the input. Edit distance counts character insertions, deletions, substitutions, and (when transpositions is true) swaps of two adjacent characters. The query takes a single token and runs against keyword and text fields; it does not analyze the input. Use fuzzy when typo tolerance is needed on a specific field and the full match machinery is overkill.
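To make the edit counting concrete, here is a minimal Python sketch of the distance using the textbook dynamic-programming formulation (optimal string alignment). It is only an illustration: the function name `edit_distance` is hypothetical, and Elasticsearch does not compare terms pairwise like this but walks the term dictionary with a Levenshtein automaton.

```python
def edit_distance(a: str, b: str, transpositions: bool = True) -> int:
    """Count insertions, deletions, substitutions and, optionally,
    swaps of two adjacent characters needed to turn a into b."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete every character of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert every character of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            if (transpositions and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                # adjacent swap counts as a single edit
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


print(edit_distance("labtop", "laptop"))                       # 1 (substitution)
print(edit_distance("ahmad", "ahmda"))                         # 1 (adjacent swap)
print(edit_distance("ahmad", "ahmda", transpositions=False))   # 2
```

With transpositions disabled, the swapped pair has to be expressed as two separate edits, which is why the last call returns 2.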
Syntax
{
  "query": {
    "fuzzy": {
      "<field>": {
        "value": "<term>",
        "fuzziness": "AUTO",
        "max_expansions": 50,
        "prefix_length": 0,
        "transpositions": true,
        "rewrite": "constant_score"
      }
    }
  }
}
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| value | string | - | Term to match (single token). Required. |
| fuzziness | AUTO, AUTO:lo,hi, 0, 1, or 2 | AUTO | Maximum Levenshtein edit distance. |
| max_expansions | int | 50 | Max number of distinct indexed terms enumerated. |
| prefix_length | int | 0 | Leading characters that must match exactly. |
| transpositions | bool | true | If true, an adjacent character swap counts as one edit. |
| rewrite | string | constant_score | Multi-term rewrite strategy. |
Fuzziness AUTO
AUTO chooses an edit distance based on term length:
| Term length (chars) | AUTO fuzziness |
|---|---|
| 0-2 | 0 (exact match) |
| 3-5 | 1 |
| 6+ | 2 |
AUTO:lo,hi overrides the boundaries. The default is equivalent to AUTO:3,6: lengths 0-2 → 0 edits, 3-5 → 1, 6+ → 2.
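The length-to-distance mapping can be sketched in a few lines of Python. This is a paraphrase of the table above, not Elasticsearch's own code; the function name `auto_fuzziness` and its parameter names are hypothetical.

```python
def auto_fuzziness(term: str, low: int = 3, high: int = 6) -> int:
    """Return the edit distance AUTO:low,high would allow for this term."""
    length = len(term)  # counted in characters
    if length < low:
        return 0  # too short: exact match only
    if length < high:
        return 1
    return 2


for term in ("at", "cat", "hello", "laptop"):
    print(term, auto_fuzziness(term))

# Stricter boundaries, corresponding to "AUTO:4,8"
print(auto_fuzziness("laptop", low=4, high=8))
```

Raising the boundaries, as in the last call, keeps mid-length terms at one edit, which is one way to reduce the candidate set without giving up fuzziness entirely.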
Examples
Default AUTO fuzziness:
GET /products/_search
{
  "query": {
    "fuzzy": {
      "product_name": { "value": "labtop", "fuzziness": "AUTO" }
    }
  }
}
Bound the cost: 2-character exact prefix, max 10 expansions:
GET /articles/_search
{
  "query": {
    "fuzzy": {
      "title": {
        "value": "elastcsearch",
        "fuzziness": "AUTO",
        "prefix_length": 2,
        "max_expansions": 10
      }
    }
  }
}
Disable transpositions for stricter matching:
GET /names/_search
{
  "query": {
    "fuzzy": {
      "last_name": {
        "value": "ahmad",
        "fuzziness": 1,
        "transpositions": false
      }
    }
  }
}
Combine with other clauses:
GET /products/_search
{
  "query": {
    "bool": {
      "must": [
        { "fuzzy": { "name": { "value": "labtop", "fuzziness": "AUTO" } } },
        { "term": { "category": "electronics" } }
      ]
    }
  }
}
Performance and Use Notes
The fuzzy query enumerates indexed terms within edit distance of the input using a Levenshtein automaton, then OR-combines them. Cost depends on the number of expansions enumerated. prefix_length is the most effective lever: requiring a 2-character exact prefix eliminates the vast majority of unrelated terms because the automaton can prune entire branches of the term dictionary. max_expansions caps the absolute number of variants considered (default 50).
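A toy Python model can show how the two levers shrink the candidate set. Everything here is an assumption made for illustration: the ten-term dictionary, the `expand` helper, and in particular the truncation rule (keeping the first max_expansions matches in sorted order), which is a simplification and not how Lucene selects terms.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance where an adjacent swap counts as one edit."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


# Hypothetical term dictionary for one field
TERMS = ["at", "bat", "cart", "cat", "chat", "coat", "cot", "cut", "hat", "rat"]


def expand(value, terms, fuzziness, prefix_length=0, max_expansions=50):
    """Return the indexed terms a fuzzy query on value would OR together."""
    prefix = value[:prefix_length]
    matches = [
        t for t in sorted(terms)
        if t.startswith(prefix) and edit_distance(value, t) <= fuzziness
    ]
    return matches[:max_expansions]  # simplified cap, see lead-in


for prefix_length in (0, 1, 2):
    print(prefix_length, expand("cat", TERMS, fuzziness=1, prefix_length=prefix_length))
print(expand("cat", TERMS, fuzziness=1, max_expansions=3))
```

In this toy dictionary every term is within one edit of "cat", so prefix_length: 0 expands to all ten terms, while a 1-character prefix leaves six and a 2-character prefix leaves two.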
fuzziness: 2 on short fields is dangerous - a 4-character term with two edits matches an enormous fraction of the term dictionary. For user-facing search, prefer fuzziness: AUTO with prefix_length: 1 or 2. On a field that needs typo tolerance regularly, consider an n-gram analyzer at index time instead of fuzzy queries at search time; the trade-off is index size for query speed.
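As a rough sketch of the index-time alternative, the Python snippet below builds an index body that uses the built-in ngram tokenizer and shows why a misspelling still overlaps with the intended term. The names `typo_ngrams` and `typo_ngram_analyzer`, the gram sizes, and the choice of the product_name field are assumptions for illustration, not recommended values.

```python
import json


def ngrams(term: str, min_gram: int = 2, max_gram: int = 3) -> set:
    """All character n-grams of term with lengths min_gram..max_gram."""
    return {
        term[i:i + n]
        for n in range(min_gram, max_gram + 1)
        for i in range(len(term) - n + 1)
    }


# Body for index creation (e.g. PUT /products); names are hypothetical.
index_body = {
    "settings": {
        "analysis": {
            "tokenizer": {
                "typo_ngrams": {
                    "type": "ngram",
                    "min_gram": 2,
                    "max_gram": 3,
                    "token_chars": ["letter", "digit"],
                }
            },
            "analyzer": {
                "typo_ngram_analyzer": {
                    "type": "custom",
                    "tokenizer": "typo_ngrams",
                    "filter": ["lowercase"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            "product_name": {"type": "text", "analyzer": "typo_ngram_analyzer"}
        }
    },
}

print(json.dumps(index_body, indent=2))
# Grams shared by the typo and the intended word
print(sorted(ngrams("labtop") & ngrams("laptop")))
```

The shared grams are what let a plain match query find "laptop" for the input "labtop" without any query-time expansion; the cost is that every indexed term is stored as many small tokens.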
Unbounded fuzzy queries are a recurring source of cluster CPU spikes. Manually scanning slow logs for high-expansion fuzzy patterns and deciding which to bound vs. move to index-time n-grams is exactly the optimization loop Pulse runs continuously.
Common Mistakes
- Using fuzziness: "AUTO" on terms ≤ 2 characters and being surprised that fuzzy matching does nothing - AUTO maps short terms to 0 edits.
- Leaving prefix_length: 0 on a public search endpoint; users typing a 1-character query enumerate massive term lists.
- Running fuzzy against an analyzed text field with stemming - the indexed terms are stems, so edit distance is calculated on stems, not on the words the user typed.
- Assuming fuzziness: 2 is needed to match "color" from "colour" - the two differ by a single-character deletion (or insertion, in the other direction), so fuzziness: 1 is sufficient and far cheaper.
- Pairing fuzzy with phrase semantics - fuzzy is single-term; use the match query with fuzziness for multi-token input.
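For the last point, a minimal Python sketch of a match query body with fuzziness might look like the following. The helper name `fuzzy_match_body` and the default values are assumptions; the parameters themselves (query, fuzziness, prefix_length, max_expansions, operator) are standard match query options.

```python
import json


def fuzzy_match_body(field, text, fuzziness="AUTO", prefix_length=1, max_expansions=25):
    """Build a match query that analyzes text and applies fuzziness per token."""
    return {
        "query": {
            "match": {
                field: {
                    "query": text,
                    "fuzziness": fuzziness,
                    "prefix_length": prefix_length,
                    "max_expansions": max_expansions,
                    "operator": "and",  # every token must match (possibly fuzzily)
                }
            }
        }
    }


body = fuzzy_match_body("title", "elastcsearch fuzy query")
print(json.dumps(body, indent=2))
```

Unlike fuzzy, the match query runs the input through the field's analyzer first, so each of the three tokens above is expanded separately.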
Find Slow Fuzzy Queries with Pulse
Pulse is an AI DBA for Elasticsearch and OpenSearch that continuously profiles production query traffic. For fuzzy queries specifically, Pulse:
- Identifies fuzzy queries running with prefix_length: 0 and high max_expansions against large keyword fields - the combination that lets Levenshtein automatons enumerate huge slices of the term dictionary
- Flags fuzziness: 2 on short indexed terms where the candidate set explodes, and fuzziness: AUTO on fields with stems where edit distance is being measured against the wrong tokens
- Traces each slow fuzzy query back to the calling service via slow-log and APM correlation
- Recommends concrete fixes: raise prefix_length to 1 or 2, lower max_expansions to 10-25, or replace query-time fuzziness with an index-time n-gram or shingle analyzer
- Tracks latency and CPU improvement after the change ships
This turns the manual slow-log plus DSL-debugging loop into a continuous optimization workflow.
Frequently Asked Questions
Q: How does fuzziness AUTO work in an Elasticsearch fuzzy query?
A: AUTO chooses the max Levenshtein edit distance based on term length: 0 for terms ≤ 2 characters, 1 for 3-5 characters, 2 for 6+ characters. The boundaries are configurable via AUTO:lo,hi (default AUTO:3,6).
Q: What is the difference between fuzzy query and wildcard query in Elasticsearch?
A: The fuzzy query matches terms within an edit-distance threshold (typos, missing letters). The wildcard query matches a pattern with * and ? wildcards. Use fuzzy for spelling tolerance, wildcard for known partial patterns.
Q: How does prefix_length affect fuzzy query performance?
A: prefix_length requires the first N characters to match exactly. Each additional prefix character lets the automaton skip every term that does not share that prefix, so far fewer terms are visited. On a large index, going from prefix_length: 0 to prefix_length: 2 can cut query cost substantially, often by an order of magnitude, depending on how terms are distributed.
Q: Does the fuzzy query analyze its input?
A: No. The fuzzy query treats value as a single, unanalyzed term. For multi-word fuzzy matching, use the match query with the fuzziness parameter, which analyzes the input into tokens and applies fuzziness per token.
Q: Can the fuzzy query be used on numeric or date fields?
A: No, not usefully. The fuzzy query targets keyword and text fields, and recent Elasticsearch versions reject it on numeric and date fields. Levenshtein distance on numeric tokens is rarely what you want in any case. Use the range query for numeric or date tolerance.
Q: How do I bound the cost of fuzzy queries in production?
A: Set prefix_length to at least 1 (preferably 2), cap max_expansions (10-25 is typical), and avoid fuzziness: 2 on short indexed terms. For aggressive typo tolerance, index n-grams or shingles instead of relying on query-time fuzziness.
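One way to enforce these bounds is to build fuzzy queries through a small guard function rather than by hand. The Python sketch below is an assumption-laden example: `bounded_fuzzy`, the cap of 25, and the 6-character threshold for allowing fuzziness: 2 (borrowed from the default AUTO boundary) are illustrative choices, not Elasticsearch behavior.

```python
import json


def bounded_fuzzy(field, value, fuzziness="AUTO", prefix_length=2,
                  max_expansions=25, expansion_cap=25):
    """Build a fuzzy query body, refusing settings that are known to be costly."""
    if len(value.split()) != 1:
        raise ValueError("fuzzy expects a single token; use match for multi-word input")
    if prefix_length < 1:
        raise ValueError("prefix_length must be at least 1")
    if fuzziness == 2 and len(value) < 6:
        raise ValueError("fuzziness 2 is not allowed for terms shorter than 6 characters")
    return {
        "query": {
            "fuzzy": {
                field: {
                    "value": value,
                    "fuzziness": fuzziness,
                    "prefix_length": prefix_length,
                    # never exceed the configured cap, whatever the caller asks for
                    "max_expansions": min(max_expansions, expansion_cap),
                }
            }
        }
    }


print(json.dumps(bounded_fuzzy("title", "elastcsearch", max_expansions=100), indent=2))
```

Routing application code through a helper like this keeps unbounded fuzzy queries from reaching the cluster in the first place.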
Q: What is the best tool to find slow fuzzy queries in production Elasticsearch?
A: Pulse profiles Elasticsearch and OpenSearch slow logs, identifies fuzzy queries with prefix_length: 0 and high max_expansions, attributes each to the calling service, and recommends a bounded rewrite or an index-time n-gram alternative.
Related Reading
- Elasticsearch Match Query: multi-token full-text query with built-in fuzziness.
- Elasticsearch Wildcard Query: pattern matching with * and ?.
- Elasticsearch Prefix Query: prefix-only term match.
- Elasticsearch Term Query: exact, non-analyzed match.
- Elasticsearch Simple Query String Query: exposes fuzziness through the ~ operator.