Elasticsearch Match Phrase Query: Exact Phrase Search - Syntax, Example, and Tips

The Elasticsearch match_phrase query searches for an analyzed phrase where the resulting tokens occur in the same order and adjacent to each other in the indexed document. It works only against text fields whose mapping preserves positions (the default for text). Use match_phrase when "data warehouse" must not match a document that contains only "data" and "warehouse" several paragraphs apart.
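To make "same order and adjacent" concrete, here is a toy model in Python. It is not Elasticsearch or Lucene code. It assumes the text has already been analyzed into tokens, here just split on whitespace, and the function names are hypothetical. Each token is mapped to its positions, and a slop-0 phrase matches only when every phrase token sits exactly one position after the previous one.

```python
from collections import defaultdict


def positional_index(tokens):
    """Map each token to the list of positions where it occurs."""
    index = defaultdict(list)
    for pos, tok in enumerate(tokens):
        index[tok].append(pos)
    return index


def exact_phrase_match(doc_tokens, phrase_tokens):
    """True if phrase_tokens appear in order at consecutive positions (slop 0)."""
    if not phrase_tokens:
        return False
    index = positional_index(doc_tokens)
    # Each occurrence of the first token is a candidate start; the i-th
    # phrase token must then sit exactly i positions after it.
    for start in index.get(phrase_tokens[0], []):
        if all(start + i in index.get(tok, []) for i, tok in enumerate(phrase_tokens)):
            return True
    return False


near = "the data warehouse team".split()
far = "data is loaded nightly into the warehouse".split()
print(exact_phrase_match(near, ["data", "warehouse"]))  # True
print(exact_phrase_match(far, ["data", "warehouse"]))   # False
```

Both documents contain "data" and "warehouse". Only the one where they are adjacent and in order matches.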

Syntax

{
  "query": {
    "match_phrase": {
      "<field>": {
        "query":    "<phrase>",
        "slop":     0,
        "analyzer": "<custom_analyzer>",
        "zero_terms_query": "none"
      }
    }
  }
}

Shorthand: { "match_phrase": { "field": "exact phrase" } }.

Parameters

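Parameter         Type          Default                   Description
query             string        (required)                Phrase to search. Analyzed before matching.
slop              integer       0                         Maximum number of positions allowed between matching tokens. Transposed tokens need slop 2.
analyzer          string        field's search analyzer   Analyzer used to tokenize the query text; overrides the mapping at query time.
zero_terms_query  none | all    none                      Behavior when analysis removes all tokens (e.g. a query made only of stopwords): none matches no documents, all matches every document.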
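The zero_terms_query row matters when analysis leaves nothing to search for. The sketch below is a toy Python model of that decision, not Elasticsearch code:

- The stop list and the analyzer are hypothetical stand-ins.
- resolve_query is an invented helper.
- The returned labels only describe what the query behaves like.

```python
# Hypothetical stop list; real analyzers ship their own, larger lists.
STOPWORDS = {"a", "an", "be", "not", "or", "the", "to"}


def analyze(text):
    """Toy analyzer: lowercase, split on whitespace, drop stopwords."""
    return [t for t in text.lower().split() if t not in STOPWORDS]


def resolve_query(text, zero_terms_query="none"):
    """Describe what a match_phrase query effectively becomes after analysis."""
    terms = analyze(text)
    if terms:
        return ("phrase", terms)      # normal case: tokens remain
    if zero_terms_query == "all":
        return ("match_all", [])      # every document matches
    if zero_terms_query == "none":
        return ("match_none", [])     # no document matches
    raise ValueError("zero_terms_query must be 'none' or 'all'")


print(resolve_query("to be or not to be"))
print(resolve_query("to be or not to be", zero_terms_query="all"))
print(resolve_query("The Quick Fox"))
```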

Examples

Strict adjacent phrase:

GET /articles/_search
{
  "query": { "match_phrase": { "content": "quick brown fox" } }
}

Allow a gap of one position (slop: 1). Reversing two adjacent tokens costs 2, so slop: 1 does not cover reordering:

GET /articles/_search
{
  "query": {
    "match_phrase": {
      "content": { "query": "quick fox", "slop": 1 }
    }
  }
}

With slop: 1, "quick fox" matches a document containing "quick brown fox" (one intervening position) but not "quick lazy brown fox" (two intervening positions, which needs slop: 2).
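The slop arithmetic can be checked with a simplified model of how Lucene measures a sloppy phrase match. This is a Python sketch under stated assumptions, not the actual Lucene code:

1. For each phrase token, pick one occurrence in the document.
2. Subtract the token's offset within the phrase from its document position.
3. The spread (max minus min) of those values is the slop that alignment needs.
4. The smallest spread over all alignments wins.

The brute-force search and the function names are hypothetical.

```python
from itertools import product


def match_length(doc_tokens, phrase_tokens):
    """Smallest slop needed for phrase_tokens to match doc_tokens, or None."""
    positions = []
    for tok in phrase_tokens:
        found = [p for p, t in enumerate(doc_tokens) if t == tok]
        if not found:
            return None  # a token is missing entirely: no slop helps
        positions.append(found)
    best = None
    for combo in product(*positions):
        if len(set(combo)) < len(combo):
            continue  # one document position cannot fill two phrase slots
        shifted = [p - i for i, p in enumerate(combo)]
        spread = max(shifted) - min(shifted)
        best = spread if best is None else min(best, spread)
    return best


def phrase_match(doc_tokens, phrase_tokens, slop=0):
    needed = match_length(doc_tokens, phrase_tokens)
    return needed is not None and needed <= slop


query = ["quick", "fox"]
print(match_length("quick brown fox".split(), query))       # 1
print(match_length("quick lazy brown fox".split(), query))  # 2
print(match_length(["fox", "quick"], query))                # 2: a swap costs two moves
```

The last case shows why transposed tokens need slop 2 rather than 1.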

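Override the analyzer at query time. This changes only how the query string is tokenized, not what was indexed. The example below finds the literal phrase only if content is itself indexed without stemming. If content uses a stemming analyzer such as english, the index holds "run" and "shoe", so a standard-analyzed "running shoes" finds nothing. For literal-phrase search on a stemmed field, add an unstemmed sub-field through multi-fields and run the phrase query against that sub-field.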

GET /articles/_search
{
  "query": {
    "match_phrase": {
      "content": { "query": "running shoes", "analyzer": "standard" }
    }
  }
}
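The mismatch between index-time and query-time analysis can be modeled in a few lines of Python. In this sketch:

- The "stemmer" is a hypothetical lookup table standing in for a real one such as the english analyzer's, and covers only the words used here.
- "Analysis" is just lowercase plus whitespace split.
- The function and variable names are invented.

```python
# Hypothetical lookup table standing in for a real stemmer; it only
# knows the handful of words used in this example.
TOY_STEMS = {"running": "run", "runs": "run", "shoes": "shoe"}


def standard(text):
    """Toy 'standard' analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def stemmed(text):
    """Toy stemming analyzer: standard analysis, then stem each token."""
    return [TOY_STEMS.get(t, t) for t in standard(text)]


def contains_phrase(indexed, query):
    """Slop-0 phrase check over already-analyzed token lists."""
    n = len(query)
    return any(indexed[i:i + n] == query for i in range(len(indexed) - n + 1))


doc = "Trail running shoes for wet weather"
stemmed_field = stemmed(doc)   # what a stemmed field holds
exact_field = standard(doc)    # what an unstemmed sub-field would hold

print(contains_phrase(stemmed_field, standard("running shoes")))  # False: query-side override only
print(contains_phrase(stemmed_field, stemmed("running shoes")))   # True: same analyzer both sides
print(contains_phrase(stemmed_field, stemmed("runs shoes")))      # True: stems conflate word forms
print(contains_phrase(exact_field, standard("running shoes")))    # True: unstemmed sub-field
```

Only the last combination matches the literal phrase and nothing else. That is why the unstemmed sub-field is the usual design for literal-phrase search.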

Combine with a filter in a bool:

GET /products/_search
{
  "query": {
    "bool": {
      "must":   [ { "match_phrase": { "description": "noise cancelling headphones" } } ],
      "filter": [ { "term": { "in_stock": true } } ]
    }
  }
}
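The split between must and filter can be illustrated with a toy Python model:

- The must clause has to match and contributes to the score.
- The filter clause only includes or excludes documents. In Elasticsearch it runs in filter context, so it is not scored and can be cached.

The occurrence-count "score" is a hypothetical stand-in for BM25. The function names and product data are invented for the example.

```python
def phrase_count(text, phrase):
    """Occurrences of phrase as adjacent tokens (toy analysis: lowercase + split)."""
    toks, p = text.lower().split(), phrase.lower().split()
    n = len(p)
    return sum(toks[i:i + n] == p for i in range(len(toks) - n + 1))


def bool_search(products, phrase):
    """Toy bool query: must = phrase match (scored), filter = in_stock (not scored)."""
    hits = []
    for doc in products:
        if doc["in_stock"] is not True:
            continue  # filter clause: yes/no only, never changes the score
        score = phrase_count(doc["description"], phrase)  # hypothetical stand-in for BM25
        if score > 0:  # must clause: the phrase has to match
            hits.append((doc["id"], score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


products = [
    {"id": 1, "description": "Wireless noise cancelling headphones", "in_stock": True},
    {"id": 2, "description": "noise cancelling headphones and noise cancelling headphones case", "in_stock": True},
    {"id": 3, "description": "Noise cancelling headphones refurbished", "in_stock": False},
    {"id": 4, "description": "Headphones with noise cancelling", "in_stock": True},
]
print(bool_search(products, "noise cancelling headphones"))  # [(2, 2), (1, 1)]
```

Product 3 is dropped by the filter despite matching the phrase, and product 4 fails the must clause.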

Performance and Use Notes

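match_phrase has to read token positions from the postings list, which makes it more expensive than a plain match. Cost grows with the number of tokens in the phrase and with how many documents contain all of them. For high-traffic exact-phrase search on long documents, consider setting index_phrases: true on the text field. Elasticsearch then indexes two-word shingles into a separate field, at the cost of a larger index. A two-word phrase becomes a single term lookup, and longer phrases match over fewer, rarer shingle terms. This only helps queries with slop 0; phrases containing removed stopwords fall back to a standard phrase query.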
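The shingle idea behind index_phrases can be sketched in Python. This is a rough model, not Elasticsearch's actual query rewrite:

- shingles builds two-token shingles like the ones stored in the extra field.
- plan_phrase_query is a hypothetical helper naming the kind of query an exact phrase could turn into.
- The has_stopword_gaps flag models the documented fallback when stopwords were removed.

```python
def shingles(tokens):
    """Two-token shingles, similar in spirit to the extra field index_phrases builds."""
    return [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]


def plan_phrase_query(phrase_tokens, slop=0, has_stopword_gaps=False):
    """Rough sketch of how an exact phrase can use a shingle field."""
    if len(phrase_tokens) < 2:
        return ("term", list(phrase_tokens))  # a one-word phrase is just a term query
    if slop != 0 or has_stopword_gaps:
        return ("positional_phrase", list(phrase_tokens))  # shingle field not used
    grams = shingles(phrase_tokens)
    if len(grams) == 1:
        return ("term", grams)  # two-word phrase = one shingle = single lookup
    return ("shingle_phrase", grams)  # still positional, but over rarer terms


print(plan_phrase_query(["quick", "brown"]))
print(plan_phrase_query(["quick", "brown", "fox"]))
print(plan_phrase_query(["quick", "brown"], slop=1))
```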

Phrase queries rely on positional data. If the field is mapped with index_options: "docs" or index_options: "freqs", positions are not indexed. Elasticsearch then rejects the phrase query with an error stating that the field was indexed without position data, rather than returning matches. The default index_options for text is positions, which is what match_phrase needs.
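A toy Python model shows why the index_options level decides whether a phrase query is even possible. It is not Elasticsearch code, and all names are hypothetical:

- build_postings keeps positions only for the "positions" and "offsets" levels. For brevity it stores no term frequencies at any level.
- phrase_query refuses to run without positions, mirroring the error Elasticsearch raises.

```python
LEVELS = ("docs", "freqs", "positions", "offsets")


def build_postings(docs, index_options="positions"):
    """Toy inverted index keeping positions only when index_options allows it."""
    has_positions = LEVELS.index(index_options) >= LEVELS.index("positions")
    terms = {}
    for doc_id, tokens in docs.items():
        for pos, tok in enumerate(tokens):
            per_doc = terms.setdefault(tok, {})
            if has_positions:
                per_doc.setdefault(doc_id, []).append(pos)
            else:
                per_doc[doc_id] = None  # doc id only; no position data kept
    return {"has_positions": has_positions, "terms": terms}


def phrase_query(index, phrase):
    """Doc ids where the phrase tokens sit at consecutive positions."""
    if not index["has_positions"]:
        # Mirrors Elasticsearch rejecting the query instead of returning no hits.
        raise ValueError("field was indexed without position data")
    if not phrase:
        return []
    lists = [index["terms"].get(tok, {}) for tok in phrase]
    candidates = set(lists[0]).intersection(*lists[1:])
    hits = []
    for doc_id in sorted(candidates):
        per_token = [lst[doc_id] for lst in lists]
        if any(all(start + i in per_token[i] for i in range(len(phrase)))
               for start in per_token[0]):
            hits.append(doc_id)
    return hits


docs = {1: "quick brown fox".split(), 2: "brown quick fox".split()}
print(phrase_query(build_postings(docs, "positions"), ["quick", "brown"]))  # [1]
```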

A common production failure mode is stemming + phrase. On an english-analyzed field, searching for "stripes" actually queries for the stem "stripe", so the phrase is matched over stems rather than the words the user typed. Pulse surfaces mismatched analyzer behavior on your Elasticsearch cluster - including phrase queries that under-match because field-level stemming differs from user intent.

Common Mistakes

  1. Expecting match_phrase to handle wildcards - it does not; use match_phrase_prefix for prefix completion on the last token.
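  2. Querying a field mapped with index_options: docs (or freqs); without positions, the phrase query fails with an error.
  3. Using match_phrase on a keyword field - the whole value is a single token, so the query degrades to an exact match on the entire value rather than a phrase search inside it.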
  4. Setting very high slop thinking it gives "fuzzy" matching; slop only relaxes position, not spelling.
  5. Forgetting that stopwords removed by the analyzer also disappear from the phrase, changing the effective query.
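The stopword effect is subtler than "the word vanishes". By default, Lucene's stop filter leaves a position gap where each stopword was. The phrase therefore keeps its spacing, but any word can fill the gap.

The Python sketch below models that behavior:

- The stop list is a hypothetical subset of an English list.
- Analysis is lowercase plus whitespace split.
- The function names are invented.

```python
# Hypothetical subset of an English stop list.
STOPWORDS = {"a", "in", "of", "the"}


def analyze_with_gaps(text):
    """Lowercase, split, drop stopwords, but keep each survivor's original position."""
    return [(pos, tok) for pos, tok in enumerate(text.lower().split())
            if tok not in STOPWORDS]


def phrase_match(doc_text, query_text):
    """Slop-0 phrase match that respects the position gaps left by stopwords."""
    doc = analyze_with_gaps(doc_text)
    query = analyze_with_gaps(query_text)
    if not query:
        return False  # nothing left to match (zero_terms_query: none)
    doc_positions = {}
    for pos, tok in doc:
        doc_positions.setdefault(tok, set()).add(pos)
    first_pos, first_tok = query[0]
    for start in doc_positions.get(first_tok, ()):
        # Every query token must keep its distance from the first one.
        if all(start + (pos - first_pos) in doc_positions.get(tok, ())
               for pos, tok in query):
            return True
    return False


query = "rise of the planet"           # analyzes to rise@0, planet@3
print(phrase_match("Rise of the Planet", query))   # True
print(phrase_match("rise in a planet", query))     # True: different stopwords fill the gap
print(phrase_match("rise over my planet", query))  # True: any words fill the gap
print(phrase_match("rise planet", query))          # False: the gap must still be there
```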

Frequently Asked Questions

Q: What is the difference between match and match_phrase in Elasticsearch?
A: With its default operator (or), match returns documents containing any of the analyzed tokens, in any order. With operator: and, all tokens must be present, still in any order. match_phrase requires all of them, in the same order and adjacent (or within slop positions of adjacent).
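A toy Python comparison of the three behaviors, assuming analysis is just lowercase plus whitespace split. The helper names and the sample documents are hypothetical.

```python
def tokens(text):
    return text.lower().split()


def match_or(doc, query):
    """match with the default operator: any query token present."""
    return any(t in tokens(doc) for t in tokens(query))


def match_and(doc, query):
    """match with operator: and: all query tokens present, any order."""
    return all(t in tokens(doc) for t in tokens(query))


def match_phrase(doc, query):
    """match_phrase with slop 0: all tokens, in order, adjacent."""
    d, q = tokens(doc), tokens(query)
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))


def hits(fn, docs, query):
    return [doc_id for doc_id, text in docs.items() if fn(text, query)]


docs = {
    1: "data warehouse migration",
    2: "warehouse of archived data",
    3: "data lake design",
}
for name, fn in [("match (or)", match_or), ("match (and)", match_and),
                 ("match_phrase", match_phrase)]:
    print(name, hits(fn, docs, "data warehouse"))
```

The output narrows from [1, 2, 3] to [1, 2] to [1] as the query gets stricter.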

Q: How does the slop parameter work in match_phrase?
A: slop is the maximum number of positions allowed between matching tokens, counted as the moves needed to line the indexed tokens up with the query phrase. With slop: 0, the tokens must be immediately adjacent and in order. slop: 1 tolerates one extra word between them. Swapping two adjacent tokens costs 2, so slop: 2 is the minimum that lets "quick fox" match "fox quick".

Q: Is the match_phrase query case-sensitive?
A: The query inherits case sensitivity from the field's analyzer. The default standard analyzer lowercases tokens at both index and query time, making match_phrase effectively case-insensitive. A keyword field with no lowercase normalizer would be case-sensitive.
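A minimal Python sketch of where case sensitivity comes from. Both "analyzers" here are hypothetical stand-ins: one lowercases and splits like the standard analyzer; the other keeps the whole value as one token, as a keyword field without a normalizer does. Only the analysis step differs; the matching logic is the same.

```python
def standard_like(text):
    """Toy stand-in for the standard analyzer: lowercase + whitespace split."""
    return text.lower().split()


def keyword_like(text):
    """Toy stand-in for a keyword field with no normalizer: one token, case kept."""
    return [text]


def phrase_hit(analyzer, indexed_value, query):
    """Slop-0 phrase check after running both sides through the same analyzer."""
    d, q = analyzer(indexed_value), analyzer(query)
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))


print(phrase_hit(standard_like, "New York City", "new york"))      # True
print(phrase_hit(keyword_like, "New York City", "new york city"))  # False
```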

Q: Can match_phrase be used with wildcards or fuzziness?
A: No. match_phrase accepts neither wildcards nor fuzziness. For prefix completion of the last word, use match_phrase_prefix; for typo tolerance, run the tokens through a separate fuzzy query or use match with fuzziness.

Q: Why does my match_phrase query return nothing even though the words are present?
A: Usually a stopword or stemming difference between what you typed and what was indexed. Run the input through _analyze against the field and inspect the resulting tokens and their positions - that is what match_phrase actually sees. A field mapped without positions produces an error rather than an empty result.
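A small Python helper can make the _analyze output easier to scan, listing tokens by position and flagging gaps where tokens were removed. It assumes the response body of a request such as GET /articles/_analyze with {"field": "content", "text": "Rise of the Planet"} has already been parsed into a dict. The sample below is hand-written in the documented response shape, not captured from a real cluster, and assumes a stopword-removing analyzer. describe_tokens is a hypothetical name.

```python
def describe_tokens(analyze_response):
    """Summarize an _analyze response: (position, token) pairs plus position gaps."""
    toks = sorted(analyze_response["tokens"], key=lambda t: t["position"])
    summary = [(t["position"], t["token"]) for t in toks]
    # A jump of more than 1 means tokens (often stopwords) were removed there.
    gaps = [
        (prev["position"], cur["position"])
        for prev, cur in zip(toks, toks[1:])
        if cur["position"] - prev["position"] > 1
    ]
    return summary, gaps


# Hand-written sample in the shape _analyze returns; not real cluster output.
sample = {"tokens": [
    {"token": "rise", "start_offset": 0, "end_offset": 4, "type": "<ALPHANUM>", "position": 0},
    {"token": "planet", "start_offset": 12, "end_offset": 18, "type": "<ALPHANUM>", "position": 3},
]}
summary, gaps = describe_tokens(sample)
print(summary)  # [(0, 'rise'), (3, 'planet')]
print(gaps)     # [(0, 3)]
```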

Q: How can I make a faster phrase query on a large corpus?
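A: Enable index_phrases: true on the text field. Elasticsearch indexes two-word shingles into a separate field, at the cost of a larger index. An exact (slop 0) two-word phrase then runs as a single term lookup, and longer phrases match over shingles instead of individual tokens. Queries with slop, and phrases containing removed stopwords, fall back to a regular phrase query.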
