Elasticsearch Fuzzy Query: Levenshtein Edit-Distance Matching - Syntax, Example, and Tips

The Elasticsearch fuzzy query matches documents containing terms within a configurable Levenshtein edit distance of the input. Edit distance counts character insertions, deletions, and substitutions, plus swaps of two adjacent characters when transpositions is true. The query matches a single term against keyword and text fields and does not analyze the input, so the value is compared as-is with the indexed terms. Use fuzzy when you need typo tolerance on a specific field and the full match query machinery is overkill.
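For checking by hand whether a candidate term is within reach, the distance for one pair of strings is easy to compute. The following is a minimal Python sketch of classic Levenshtein distance (insertions, deletions, substitutions only, no transpositions). It is only an illustration of the metric: Elasticsearch does not compare terms pair by pair like this, it intersects a Levenshtein automaton with the term dictionary.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic Levenshtein distance: insertions, deletions, substitutions."""
    # prev[j] holds the distance between the previous prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca -> cb (free if equal)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("labtop", "laptop"))               # 1 (b -> p)
print(levenshtein("elastcsearch", "elasticsearch"))  # 1 (insert "i")
```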

Syntax

{
  "query": {
    "fuzzy": {
      "<field>": {
        "value":          "<term>",
        "fuzziness":      "AUTO",
        "max_expansions": 50,
        "prefix_length":  0,
        "transpositions": true,
        "rewrite":        "constant_score"
      }
    }
  }
}

Parameters

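| Parameter | Type | Default | Description |
|---|---|---|---|
| value | string | (required) | Term to match, as a single unanalyzed token. |
| fuzziness | string or int | AUTO | Maximum edit distance: AUTO, AUTO:lo,hi, or an explicit 0, 1, or 2. |
| max_expansions | int | 50 | Maximum number of matching terms the query expands into. |
| prefix_length | int | 0 | Number of leading characters that must match exactly. |
| transpositions | bool | true | If true, swapping two adjacent characters (ab → ba) counts as one edit instead of two. |
| rewrite | string | (not set) | Multi-term rewrite method, such as constant_score or top_terms_blended_freqs_N. When unset, Lucene's fuzzy default keeps only the top max_expansions terms. |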
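If you build fuzzy queries from application code, a small helper keeps the body consistent with the parameter table above. This is a hypothetical Python sketch, not part of any Elasticsearch client: `fuzzy_query` is an illustrative name, and its fuzziness check is deliberately stricter than Elasticsearch itself (it accepts only AUTO, AUTO:lo,hi, or 0, 1, 2).

```python
import json
import re

# Hypothetical validation: AUTO, AUTO:lo,hi, or an explicit 0, 1 or 2.
_FUZZINESS = re.compile(r"AUTO(:\d+,\d+)?|[012]")


def fuzzy_query(field, value, fuzziness="AUTO", prefix_length=0,
                max_expansions=50, transpositions=True):
    """Build a fuzzy query body; the defaults mirror the parameter table."""
    if isinstance(fuzziness, bool) or not _FUZZINESS.fullmatch(str(fuzziness)):
        raise ValueError(f"unsupported fuzziness: {fuzziness!r}")
    return {
        "query": {
            "fuzzy": {
                field: {
                    "value": value,
                    "fuzziness": fuzziness,
                    "prefix_length": prefix_length,
                    "max_expansions": max_expansions,
                    "transpositions": transpositions,
                }
            }
        }
    }


body = fuzzy_query("title", "elastcsearch", prefix_length=2, max_expansions=10)
print(json.dumps(body, indent=2))
```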

Fuzziness AUTO

AUTO chooses an edit distance based on term length:

| Term length (chars) | AUTO fuzziness |
|---|---|
| 0-2 | 0 (exact match) |
| 3-5 | 1 |
| 6+ | 2 |

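AUTO:lo,hi sets the length boundaries explicitly: terms shorter than lo characters must match exactly, terms of lo to hi-1 characters allow one edit, and terms of hi or more characters allow two. Plain AUTO is equivalent to AUTO:3,6, which produces the table above; AUTO:4,7, for example, requires an exact match for terms of up to 3 characters.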
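The length rule fits in a few lines of Python. This sketch mirrors the table above; measuring length in Unicode code points is an assumption about how edge cases such as combining characters are counted.

```python
def auto_fuzziness(term: str, low: int = 3, high: int = 6) -> int:
    """Max edits that AUTO:low,high allows for term (plain AUTO means AUTO:3,6)."""
    length = len(term)  # Python counts code points, not bytes
    if length < low:
        return 0  # short terms must match exactly
    if length < high:
        return 1
    return 2


for word in ["ok", "lap", "labtop", "elastcsearch"]:
    print(word, auto_fuzziness(word))  # ok 0, lap 1, labtop 2, elastcsearch 2
```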

Examples

Default AUTO fuzziness:

GET /products/_search
{
  "query": {
    "fuzzy": {
      "product_name": { "value": "labtop", "fuzziness": "AUTO" }
    }
  }
}
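Console requests like this one are plain HTTP. The stdlib-only Python sketch below prepares the same search as POST /products/_search (Elasticsearch accepts POST as well as GET for _search). The localhost URL and the absence of authentication are assumptions for a local test cluster; in production the official Python client is the more usual choice.

```python
import json
import urllib.request

# Assumption: an unsecured local cluster; real clusters usually need TLS and auth.
ES_URL = "http://localhost:9200"


def build_search_request(index: str, body: dict) -> urllib.request.Request:
    """Prepare POST /<index>/_search with a JSON body (nothing is sent yet)."""
    return urllib.request.Request(
        url=f"{ES_URL}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


body = {
    "query": {
        "fuzzy": {
            "product_name": {"value": "labtop", "fuzziness": "AUTO"}
        }
    }
}
req = build_search_request("products", body)

# Uncomment to run against a live cluster:
# with urllib.request.urlopen(req) as resp:
#     hits = json.load(resp)["hits"]["hits"]
#     print([hit["_source"].get("product_name") for hit in hits])
```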

Bound the cost: 2-character exact prefix, max 10 expansions:

GET /articles/_search
{
  "query": {
    "fuzzy": {
      "title": {
        "value":          "elastcsearch",
        "fuzziness":      "AUTO",
        "prefix_length":  2,
        "max_expansions": 10
      }
    }
  }
}

Disable transpositions for stricter matching:

GET /names/_search
{
  "query": {
    "fuzzy": {
      "last_name": {
        "value":          "ahmad",
        "fuzziness":      1,
        "transpositions": false
      }
    }
  }
}
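To see why transpositions: false is stricter, compare both distance definitions on a swapped pair. This Python sketch assumes that Lucene's transposition handling matches the optimal string alignment variant of Damerau-Levenshtein. With transpositions, "ahamd" is 1 edit from "ahmad" and would match the query above at fuzziness: 1; without them, the swap costs 2 edits and it would not.

```python
def edit_distance(a: str, b: str, transpositions: bool = True) -> int:
    """Levenshtein distance; with transpositions=True an adjacent swap costs 1.

    The transposition variant is optimal string alignment (restricted
    Damerau-Levenshtein): no substring is edited more than once.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if (transpositions and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[-1][-1]


print(edit_distance("ahmad", "ahamd", transpositions=True))   # 1
print(edit_distance("ahmad", "ahamd", transpositions=False))  # 2
```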

Combine with other clauses, putting exact, non-scoring conditions in filter:

GET /products/_search
{
  "query": {
    "bool": {
      "must": [
        { "fuzzy": { "product_name": { "value": "labtop", "fuzziness": "AUTO" } } }
      ],
      "filter": [
        { "term": { "category": "electronics" } }
      ]
    }
  }
}

Performance and Use Notes

The fuzzy query uses a Levenshtein automaton to enumerate the indexed terms within the allowed edit distance of the input, then OR-combines the matching terms into a single query. Cost grows with the number of terms the automaton has to visit and expand. prefix_length is the most effective lever: requiring a 2-character exact prefix rules out most unrelated terms, because the automaton can skip every branch of the term dictionary that does not start with that prefix. max_expansions (default 50) caps how many of the matching terms are kept in the rewritten query.
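The pruning effect of prefix_length is easiest to see on a sorted term list. This toy Python sketch uses a made-up term dictionary and counts how many terms remain in the slice that would still need checking. It ignores edit distance entirely and only illustrates the prefix cut; Lucene applies the same idea to its on-disk term dictionary in each segment.

```python
import bisect

# Toy, made-up term dictionary; real shards hold millions of sorted terms.
TERMS = sorted([
    "blast", "eagle", "elastic", "elasticsearch", "elastics", "electric",
    "eleven", "exact", "example", "plastic", "search", "stack",
])


def prefix_slice(sorted_terms, prefix):
    """Contiguous run of sorted terms starting with prefix (found by binary search)."""
    start = bisect.bisect_left(sorted_terms, prefix)
    end = start
    while end < len(sorted_terms) and sorted_terms[end].startswith(prefix):
        end += 1
    return sorted_terms[start:end]


value = "elastcsearch"
# Prints 0 12, 1 8, 2 5, 3 3: each prefix character shrinks the slice to walk.
for prefix_length in range(4):
    print(prefix_length, len(prefix_slice(TERMS, value[:prefix_length])))
```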

fuzziness: 2 on short terms is expensive: a 4-character term with two edits can match a large share of the term dictionary. AUTO avoids this by allowing only one edit below 6 characters, so for user-facing search prefer fuzziness: AUTO with prefix_length: 1 or 2. On a field that regularly needs typo tolerance, consider an n-gram analyzer at index time instead of fuzzy queries at search time; the trade-off is a larger index for faster queries.
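To get a feel for the growth, count every string (not just indexed terms) within k edits of a short term. This brute-force Python sketch assumes a lowercase a-z alphabet and counts an adjacent swap as one edit. For the 4-character "lamp" the count goes from a few hundred strings at one edit to tens of thousands at two. A real term dictionary contains only some of these strings, but that jump is why fuzziness: 2 on short terms gets expensive.

```python
import string

ALPHABET = string.ascii_lowercase  # assumption: lowercase a-z terms only


def edits1(word: str) -> set:
    """All strings one insertion, deletion, substitution or adjacent swap away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {left + right[1:] for left, right in splits if right}
    swaps = {left + right[1] + right[0] + right[2:]
             for left, right in splits if len(right) > 1}
    subs = {left + c + right[1:] for left, right in splits if right for c in ALPHABET}
    inserts = {left + c + right for left, right in splits for c in ALPHABET}
    return deletes | swaps | subs | inserts


def within(word: str, max_edits: int) -> set:
    """All strings reachable in at most max_edits edits, including word itself."""
    reached = {word}
    frontier = {word}
    for _ in range(max_edits):
        frontier = {e for w in frontier for e in edits1(w)} - reached
        reached |= frontier
    return reached


for k in (0, 1, 2):
    print(k, len(within("lamp", k)))
```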

Unbounded fuzzy queries are a recurring source of cluster CPU spikes. Manually scanning slow logs for high-expansion fuzzy patterns and deciding which to bound vs. move to index-time n-grams is exactly the optimization loop Pulse runs continuously.

Common Mistakes

  1. Using fuzziness: "AUTO" on terms ≤ 2 characters and being surprised that fuzzy matching does nothing - AUTO maps short terms to 0 edits.
  2. Leaving prefix_length: 0 on a public search endpoint, especially with a fixed fuzziness of 1 or 2: short user input can then expand to huge lists of candidate terms.
  3. Running fuzzy against an analyzed text field with stemming - the indexed terms are stems, so edit distance is calculated on stems, not on the words the user typed.
  4. Reaching for fuzziness: 2 to find "color" from "colour" - dropping the "u" is a single deletion, so fuzziness: 1 is sufficient and far cheaper.
  5. Expecting phrase semantics from fuzzy - fuzzy matches a single term; use the match query with fuzziness for multi-token input.

Find Slow Fuzzy Queries with Pulse

Pulse is an AI DBA for Elasticsearch and OpenSearch that continuously profiles production query traffic. For fuzzy queries specifically, Pulse:

  • Identifies fuzzy queries running with prefix_length: 0 and high max_expansions against large keyword fields - the combination that lets Levenshtein automata enumerate huge slices of the term dictionary
  • Flags fuzziness: 2 on short indexed terms where the candidate set explodes, and fuzziness: AUTO on stemmed fields where edit distance is measured against the wrong tokens
  • Traces each slow fuzzy query back to the calling service via slow-log and APM correlation
  • Recommends concrete fixes: raise prefix_length to 1 or 2, lower max_expansions to 10-25, or replace query-time fuzziness with an index-time n-gram or shingle analyzer
  • Tracks latency and CPU improvement after the change ships

This turns the manual slow-log plus DSL-debugging loop into a continuous optimization workflow.


Frequently Asked Questions

Q: How does fuzziness AUTO work in an Elasticsearch fuzzy query?
A: AUTO chooses the max Levenshtein edit distance based on term length: 0 for terms ≤ 2 characters, 1 for 3-5 characters, 2 for 6+ characters. The boundaries are configurable via AUTO:lo,hi (default AUTO:3,6).

Q: What is the difference between fuzzy query and wildcard query in Elasticsearch?
A: The fuzzy query matches terms within an edit-distance threshold (typos, missing letters). The wildcard query matches a pattern with * and ? wildcards. Use fuzzy for spelling tolerance, wildcard for known partial patterns.

Q: How does prefix_length affect fuzzy query performance?
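A: prefix_length requires the first N characters to match exactly, so the automaton only has to walk the part of the term dictionary that starts with that prefix. Each additional prefix character narrows that walk further. On a large index, going from prefix_length: 0 to prefix_length: 2 can cut query cost dramatically, though the actual gain depends on how your terms are distributed across prefixes.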

Q: Does the fuzzy query analyze its input?
A: No. The fuzzy query treats value as a single, unanalyzed term. For multi-word fuzzy matching, use the match query with the fuzziness parameter, which analyzes the input into tokens and applies fuzziness per token.
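For comparison, here is a match query with fuzziness, written as the Python dict you would send as the request body (the field name and input are illustrative). With a standard analyzer the input becomes the tokens "wireles" and "labtop", each token gets its own AUTO edit distance, and operator "and" requires every token to find a match. Note that match names its transposition switch fuzzy_transpositions rather than transpositions.

```python
import json

# Illustrative field and input; prefix_length applies to each analyzed token.
body = {
    "query": {
        "match": {
            "product_name": {
                "query": "wireles labtop",
                "fuzziness": "AUTO",
                "prefix_length": 1,
                "operator": "and",
            }
        }
    }
}
print(json.dumps(body, indent=2))
```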

Q: Can the fuzzy query be used on numeric or date fields?
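A: No. In recent Elasticsearch versions the fuzzy query is only supported on string fields such as keyword and text; running it on a numeric or date field returns an error, and Levenshtein distance on numbers would rarely be meaningful anyway. Use the range query for numeric or date tolerance.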

Q: How do I bound the cost of fuzzy queries in production?
A: Set prefix_length to at least 1 (preferably 2), cap max_expansions (10-25 is typical), and avoid fuzziness: 2 on short indexed terms. For aggressive typo tolerance, index n-grams or shingles instead of relying on query-time fuzziness.
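The index-time alternative trades exact edit distance for token overlap. As a rough Python illustration, assume bigrams (min_gram and max_gram both 2 in an ngram tokenizer): a typo only breaks the grams that touch it, so "labtop" still shares 3 of the 5 bigrams of "laptop". A match query against an n-gram field can score on that overlap without expanding the term dictionary at query time.

```python
def ngrams(term: str, n: int = 2) -> list:
    """Character n-grams of a term, similar in spirit to an ngram tokenizer."""
    return [term[i:i + n] for i in range(len(term) - n + 1)]


query_grams = set(ngrams("labtop"))
indexed_grams = set(ngrams("laptop"))
shared = query_grams & indexed_grams
print(sorted(shared))                                  # ['la', 'op', 'to']
print(f"{len(shared)}/{len(indexed_grams)} bigrams shared")  # 3/5 bigrams shared
```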

Q: What is the best tool to find slow fuzzy queries in production Elasticsearch?
A: Pulse profiles Elasticsearch and OpenSearch slow logs, identifies fuzzy queries with prefix_length: 0 and high max_expansions, attributes each to the calling service, and recommends a bounded rewrite or an index-time n-gram alternative.
