The Elasticsearch match_phrase query searches for an analyzed phrase where the resulting tokens occur in the same order and adjacent to each other in the indexed document. It works only against text fields whose mapping preserves positions (the default for text). Use match_phrase when "data warehouse" must not match a document that contains only "data" and "warehouse" several paragraphs apart.
Syntax
{
  "query": {
    "match_phrase": {
      "<field>": {
        "query": "<phrase>",
        "slop": 0,
        "analyzer": "<custom_analyzer>",
        "zero_terms_query": "none"
      }
    }
  }
}
Shorthand: { "match_phrase": { "field": "exact phrase" } }.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| query | string | - | Phrase to search. Analyzed before matching. Required. |
| slop | int | 0 | Number of positional moves allowed between tokens. |
| analyzer | string | field's search_analyzer | Override analyzer at query time. |
| zero_terms_query | none or all | none | Behavior when analysis removes all tokens. |
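As a sketch of zero_terms_query, the request below runs a phrase made up entirely of English stopwords through the built-in stop analyzer, which should leave no tokens to match. The index name articles and field name content are placeholders. With the default none the query would match nothing; with all it is expected to behave like match_all.

```console
GET /articles/_search
{
  "query": {
    "match_phrase": {
      "content": {
        "query": "to be or not to be",
        "analyzer": "stop",
        "zero_terms_query": "all"
      }
    }
  }
}
```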
Examples
Strict adjacent phrase:
GET /articles/_search
{
  "query": { "match_phrase": { "content": "quick brown fox" } }
}
Allow one positional move (the tokens may have one intervening position between them; swapping two adjacent tokens takes two moves, so it needs slop: 2):
GET /articles/_search
{
  "query": {
    "match_phrase": {
      "content": { "query": "quick fox", "slop": 1 }
    }
  }
}
With slop: 1, "quick fox" would match a document containing "quick brown fox" (one intervening position) but not "quick lazy brown fox" (two).
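Reordering costs more than a gap: moving a term past its neighbour takes two positional moves. A minimal sketch, reusing the placeholder index and field from above, that should match a document containing "quick fox" even though the query terms are reversed:

```console
GET /articles/_search
{
  "query": {
    "match_phrase": {
      "content": { "query": "fox quick", "slop": 2 }
    }
  }
}
```

With slop: 1 the same request would not be expected to match, because the swap alone uses up two moves.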
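Override the analyzer at query time. The query tokens must still agree with the tokens that were indexed: if the field was indexed with a stemming analyzer, unstemmed query tokens such as "running" will not match the indexed stem "run", so this override is only useful when the indexed tokens are themselves unstemmed: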
GET /articles/_search
{
  "query": {
    "match_phrase": {
      "content": { "query": "running shoes", "analyzer": "standard" }
    }
  }
}
Combine with a filter in a bool:
GET /products/_search
{
  "query": {
    "bool": {
      "must": [ { "match_phrase": { "description": "noise cancelling headphones" } } ],
      "filter": [ { "term": { "in_stock": true } } ]
    }
  }
}
Performance and Use Notes
match_phrase reads token positions from the postings list, which is more expensive than a plain match. Cost scales with phrase length and corpus size. For very high-traffic phrase search on long documents, consider setting index_phrases: true on the text field. This indexes two-term shingles into a separate subfield that exact phrase queries (slop: 0) can use to run more efficiently, at the cost of disk space. Phrases that contain removed stopwords fall back to a standard phrase query.
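A minimal mapping sketch for the index_phrases option, assuming a new index named articles with a content field (both placeholder names). The setting is part of the field mapping, so it is shown here at index creation time:

```console
PUT /articles
{
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "index_phrases": true
      }
    }
  }
}
```

Queries do not change; a match_phrase against content with the default slop is what would benefit from the extra shingles.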
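Phrase queries rely on positional data. If the field is mapped with index_options: "docs" or index_options: "freqs", positions are absent and multi-term phrase queries cannot be evaluated; depending on version, Elasticsearch typically rejects them with an error saying the field was indexed without position data rather than returning hits. The default index_options for text is positions, which is what match_phrase needs.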
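To check what a field actually stores, the get field mapping API can be asked to include defaults. A sketch, assuming the placeholder index articles and field content; the response should list index_options for the field, though the exact layout of the output varies by version:

```console
GET /articles/_mapping/field/content?include_defaults=true
```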
A common production failure mode is stemming + phrase: searching for "stripes" on an english-analyzed field actually queries for the stem "stripe", and the slop calculus is over stems. Pulse surfaces mismatched analyzer behavior on your Elasticsearch cluster - including phrase queries that under-match because field-level stemming differs from user intent.
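One common way to sidestep the stemming mismatch is a multi-field: keep the stemmed field for recall and add an unstemmed subfield for literal phrases. The sketch below is an assumption about how such a mapping could look, not a prescribed layout; the index name articles and the subfield name exact are hypothetical.

```console
PUT /articles
{
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "analyzer": "english",
        "fields": {
          "exact": { "type": "text", "analyzer": "standard" }
        }
      }
    }
  }
}

GET /articles/_search
{
  "query": { "match_phrase": { "content.exact": "running shoes" } }
}
```

Because content.exact is indexed and searched with the standard analyzer, the phrase is compared against unstemmed tokens on both sides.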
Common Mistakes
- Expecting match_phrase to handle wildcards - it does not; use match_phrase_prefix for prefix completion on the last token.
- Querying a field mapped with index_options: docs; no positions, no phrase match.
- Using match_phrase on a keyword field - keyword fields are single tokens and phrase semantics do not apply.
- Setting very high slop thinking it gives "fuzzy" matching; slop only relaxes position, not spelling.
- Forgetting that stopwords removed by the analyzer also disappear from the phrase, changing the effective query.
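For the first mistake in the list, a sketch of the match_phrase_prefix alternative, reusing the products index and description field from the bool example. Only the last token ("head") is treated as a prefix; max_expansions caps how many terms that prefix may expand to and is shown at what is believed to be its default value.

```console
GET /products/_search
{
  "query": {
    "match_phrase_prefix": {
      "description": { "query": "noise cancelling head", "max_expansions": 50 }
    }
  }
}
```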
Frequently Asked Questions
Q: What is the difference between match and match_phrase in Elasticsearch?
A: By default the match query matches documents that contain any of the analyzed tokens (or all of them with operator: and), in any order; match_phrase requires all tokens to appear in the same order and adjacent (or within slop positions of adjacent).
Q: How does the slop parameter work in match_phrase?
A: slop is the maximum number of positional moves allowed to line the query terms up with the indexed token stream. slop: 0 means tokens must be immediately adjacent in order; swapping two adjacent terms costs two moves, so slop: 2 covers most "word-out-of-order" cases.
Q: Is the match_phrase query case-sensitive?
A: The query inherits case sensitivity from the field's analyzer. The default standard analyzer lowercases tokens at both index and query time, making match_phrase effectively case-insensitive. A keyword field with no lowercase normalizer would be case-sensitive.
Q: Can match_phrase be used with wildcards or fuzziness?
A: No. match_phrase accepts neither wildcards nor fuzziness. For prefix completion of the last word, use match_phrase_prefix; for typo tolerance, run the tokens through a separate fuzzy query or use match with fuzziness.
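A sketch of the typo-tolerant fallback mentioned in the answer, using match with fuzziness on the placeholder articles index. The misspelling "quikc" is deliberate. Note the trade-off: this requires all terms (operator: and) but no longer enforces order or adjacency.

```console
GET /articles/_search
{
  "query": {
    "match": {
      "content": { "query": "quikc brown fox", "fuzziness": "AUTO", "operator": "and" }
    }
  }
}
```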
Q: Why does my match_phrase query return nothing even though the words are present?
A: Likely a stopword or stemming issue, or the field is mapped without positions. Run the input through _analyze against the field and inspect the resulting tokens and their positions - that is what match_phrase actually sees.
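A sketch of that check, assuming the placeholder index articles and field content. Passing field makes _analyze use the analyzer configured for that field, and each token in the response carries a position value that can be compared with what the phrase query expects.

```console
GET /articles/_analyze
{
  "field": "content",
  "text": "running shoes"
}
```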
Q: How can I make a faster phrase query on a large corpus?
A: Enable index_phrases: true on the text field. Elasticsearch will index two-word shingles separately so exact phrase queries (slop: 0) can be answered largely from the shingles instead of position scans, at the cost of a larger index.
Related Reading
- Elasticsearch Match Query: the non-phrase counterpart.
- Elasticsearch Simple Query String Query: user-facing parser that supports quoted phrases.
- Elasticsearch Term Query: exact-value query on non-analyzed fields.
- Elasticsearch Fuzzy Query: typo tolerance on a single term.
- Elasticsearch Bool Query: combine phrase clauses with filters.