Elasticsearch search_as_you_type Field Type: Autocomplete With Edge n-grams and Shingles

The search_as_you_type field type maps a text value into one analyzed root field plus several auto-generated shingle and edge n-gram subfields, optimized for matching partially typed input. It is the standard mapping for typeahead search boxes where each keystroke should retrieve documents that begin with the partial query or contain a phrase that begins with it. Compared to the completion suggester, search_as_you_type is queried like any other text field, so results are regular search hits that support BM25 scoring, filters, and highlighting.

How search_as_you_type Works

Declaring a field as search_as_you_type creates four indexed forms behind the scenes:

  • The root field, analyzed normally for full-text matching.
  • <field>._2gram, a shingle field producing 2-word shingles.
  • <field>._3gram, a shingle field producing 3-word shingles.
  • <field>._index_prefix, an edge n-gram field that indexes prefixes of the shingles for fast leading-token matches.

max_shingle_size (default 3, range 2-4) controls how many shingle subfields are generated. With max_shingle_size: 4, you also get ._4gram. The ._index_prefix field always indexes prefixes of the largest shingle size, which is what makes the trailing partial token match cheaply.

PUT products
{
  "mappings": {
    "properties": {
      "name": {
        "type": "search_as_you_type",
        "max_shingle_size": 3
      }
    }
  }
}
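
With the mapping in place, one way to inspect what a subfield actually indexes is the _analyze API, pointed at the subfield by name. This is a sketch rather than a guaranteed recipe: it assumes the products index and name field defined above, it assumes the generated subfield name resolves through the field parameter, and the sample text is made up. The tokens returned may differ between versions.

GET products/_analyze
{
  "field": "name._3gram",
  "text": "blue running shoes"
}

Swapping the field for name._2gram or name._index_prefix should show the 2-word shingles and the prefix tokens, respectively.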

GET products/_search
{
  "query": {
    "multi_match": {
      "query": "blue runn",
      "type": "bool_prefix",
      "fields": [
        "name",
        "name._2gram",
        "name._3gram"
      ]
    }
  }
}

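The multi_match with type: bool_prefix evaluates leading terms as exact matches and the final term as a prefix against ._index_prefix. Terms can match in any order, but documents that contain them in order in a shingle subfield score higher. This is the query the field is primarily tuned for; match_phrase_prefix on the root field is the alternative when the terms must match strictly in order. A plain match query also runs, but it skips the prefix index, so the partially typed final term is treated as a complete term and scoring falls back to ordinary full-text matching.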
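
When terms must appear strictly in the typed order, a match_phrase_prefix query against the root field is an alternative to bool_prefix. A minimal sketch against the same products index and name field:

GET products/_search
{
  "query": {
    "match_phrase_prefix": {
      "name": "blue runn"
    }
  }
}

This trades the any-order matching of bool_prefix for phrase semantics, so a product named "running shoes, blue" would not be expected to match here.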

search_as_you_type Configuration

Parameter | Default | Notes
max_shingle_size | 3 | Number of shingle subfields (2-4). Higher values catch longer phrase prefixes at the cost of index size.
analyzer | standard | Analyzer applied to the root and shingle subfields.
search_analyzer | inherits analyzer | Overrides the query-time analyzer.
index | true | Disable to make the field stored-only (rarely useful for autocomplete).
store | false | Stores the original text separately from _source.
similarity | BM25 | Scoring function.
norms | true | Field-length norms; disable if you don't need length normalization.
term_vector | no | Set to with_positions_offsets to enable the fast vector highlighter on the root field.

The subfields are not separately configurable - you cannot map different analyzers to ._2gram versus the root. If you need that level of control, model the field manually with text + a custom edge n-gram analyzer.
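
A manual model along those lines might look like the following sketch. The index name products_manual, the filter and analyzer names, the prefix subfield name, and the min_gram and max_gram values are all illustrative assumptions, not values taken from search_as_you_type itself.

PUT products_manual
{
  "settings": {
    "analysis": {
      "filter": {
        "name_edge_ngram": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 15
        }
      },
      "analyzer": {
        "name_prefix_index": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "name_edge_ngram"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "fields": {
          "prefix": {
            "type": "text",
            "analyzer": "name_prefix_index",
            "search_analyzer": "standard"
          }
        }
      }
    }
  }
}

Here the root field and name.prefix each carry their own analysis chain, which is the control the built-in type does not offer. The search_analyzer on the subfield keeps query text from being split into n-grams at search time.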

Common Pitfalls with search_as_you_type

  1. Using match instead of multi_match with type: bool_prefix. The shingle and prefix subfields don't get hit, defeating the purpose.
  2. Applying it to long-form text fields (descriptions, articles). Shingle and prefix subfields blow up index size 3-4x. Restrict it to short fields like titles, product names, and addresses.
  3. Setting max_shingle_size: 4 without a real query pattern that benefits from 4-word phrase prefixes. The extra subfield is pure overhead.
  4. Forgetting that the prefix subfield indexes edge n-grams starting at a single character (min_gram of 1 and max_gram of about 20 under the hood). Single-character queries match a large share of the index; consider enforcing a minimum input length on the client side.
  5. Expecting suggestions in the response shape of the completion suggester. search_as_you_type returns documents, not isolated terms - you have to extract the displayed suggestion yourself, usually with highlighting.

Operating search_as_you_type in Production

Autocomplete latency is one of the most visible signals to end users; p99 above ~150 ms is usually noticed. The shingle and prefix subfields enlarge segments and slow merges, so refresh interval and merge throttling settings matter more on these indices than on plain text indices. Pulse tracks autocomplete query latency percentiles separately from regular search traffic and flags refresh and merge configuration that is causing tail-latency spikes on typeahead workloads.

Frequently Asked Questions

Q: How is search_as_you_type different from the completion suggester?
A: completion is a separate suggester data structure (FST) optimized for very fast prefix lookups over short suggestions; it ranks by a static weight rather than BM25 and filters only through predefined contexts, not arbitrary queries. search_as_you_type is a regular text field with auto-generated shingle and prefix subfields, queried like any other field with scoring and filtering - slower than completion, but far more flexible.
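
As an illustration of the filtering side, the bool_prefix query can sit inside a bool query next to an ordinary filter clause. In this sketch the in_stock field is hypothetical and would need to exist in the mapping; the rest reuses the products index and name field from earlier.

GET products/_search
{
  "query": {
    "bool": {
      "must": {
        "multi_match": {
          "query": "blue runn",
          "type": "bool_prefix",
          "fields": ["name", "name._2gram", "name._3gram"]
        }
      },
      "filter": {
        "term": { "in_stock": true }
      }
    }
  }
}

The filter clause narrows the candidate set without contributing to the score, so ranking still comes from the typeahead match.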

Q: What query should I use against a search_as_you_type field?
A: multi_match with type: bool_prefix, targeting the root field plus all ._Ngram subfields. This matches leading tokens exactly and treats the trailing token as a prefix against the ._index_prefix subfield, which is what the field type is engineered for.

Q: How much does search_as_you_type increase index size?
A: Roughly 3-4x the size of the same data mapped as plain text, depending on max_shingle_size. The shingle subfields plus the edge n-gram prefix index account for most of the overhead.

Q: Does search_as_you_type work for non-Latin scripts like Chinese or Japanese?
A: It works mechanically, but the default standard analyzer doesn't tokenize CJK scripts well. Set a CJK-aware analyzer on the field before relying on it for autocomplete: the built-in cjk analyzer, icu_analyzer from the analysis-icu plugin, or kuromoji from the analysis-kuromoji plugin for Japanese.
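
A minimal sketch of such a mapping, using the built-in cjk analyzer so that no plugin is required. The index name products_ja is an assumption for illustration; with the analysis-kuromoji plugin installed, kuromoji could be substituted for cjk.

PUT products_ja
{
  "mappings": {
    "properties": {
      "name": {
        "type": "search_as_you_type",
        "analyzer": "cjk"
      }
    }
  }
}

Whichever analyzer is chosen, it is worth checking the resulting tokens on real data, since shingles built on top of CJK tokenization can behave differently from shingles over space-delimited words.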

Q: Can I configure the analyzer separately for the shingle subfields?
A: No. The shingle and prefix subfields inherit the root field's analyzer. If you need a different analysis chain for each, build the multi-field manually with text + a custom edge n-gram analyzer.

Q: Does search_as_you_type support highlighting?
A: Yes. Enable term_vector: with_positions_offsets to use the fast vector highlighter on the root field. Term vectors add to index size, so the default highlighter, which needs no extra mapping, is a reasonable starting point for typeahead-style highlighting.
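
A sketch of a highlighted typeahead request, assuming the products index and name field from earlier and the default highlighter settings. Highlighting is requested on the root field only, since that is the text shown to the user.

GET products/_search
{
  "query": {
    "multi_match": {
      "query": "blue runn",
      "type": "bool_prefix",
      "fields": ["name", "name._2gram", "name._3gram"]
    }
  },
  "highlight": {
    "fields": {
      "name": {}
    }
  }
}

Each hit should then carry a highlight section for name, which the client can render as the suggestion text.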
