The Elasticsearch query_string query parses a single input string using Lucene's query syntax and runs the resulting expression against one or more fields. It supports boolean operators (AND, OR, NOT), field selectors (title:foo), grouping with parentheses, wildcards, regular expressions, fuzziness, and proximity. Use it when callers (typically power users or admin UIs) need the full Lucene mini-language. For untrusted input prefer simple_query_string, which silently ignores syntax errors instead of throwing.
Syntax
GET /_search
{
  "query": {
    "query_string": {
      "query": "title:elasticsearch AND status:published",
      "default_field": "content",
      "default_operator": "OR"
    }
  }
}
Parameters
| Parameter | Description | Required | Default |
|---|---|---|---|
| query | The query string to parse. | Yes | - |
| default_field | Field searched when the query has no explicit field selector. Supports wildcards. | No | Value of the index.query.default_field index setting, which itself defaults to * (all eligible fields) |
| fields | Array of fields to search; supports wildcards and field^boost. | No | - |
| default_operator | Operator applied between terms that have no explicit AND/OR. | No | OR |
| analyzer | Overrides the analyzer used for the query string. | No | Index search analyzer |
| analyze_wildcard | If true, analyzes wildcard terms. Only the * suffix is reliably analyzed; stemmers and similar token filters still skip wildcard tokens. | No | false |
| allow_leading_wildcard | Permits * or ? as the first character of a term. | No | true |
| fuzziness | Maximum edit distance for fuzzy terms (e.g. AUTO, 1, 2). | No | - |
| fuzzy_max_expansions | Maximum number of terms a fuzzy expansion can produce. | No | 50 |
| fuzzy_prefix_length | Number of characters at the start of the term left unchanged. | No | 0 |
| lenient | Ignores format errors (e.g. text typed against a numeric field). | No | false |
| minimum_should_match | Minimum number of optional clauses that must match. | No | - |
| phrase_slop | Number of positions allowed between tokens in a phrase. | No | 0 |
| time_zone | UTC offset or IANA zone used for date conversion. | No | - |
| boost | Score multiplier for the whole query. | No | 1.0 |
| max_determinized_states | Limit on regexp automaton states. | No | 10000 |
Examples
Basic search across the default field:
GET /articles/_search
{
  "query": {
    "query_string": {
      "query": "elasticsearch",
      "default_field": "content"
    }
  }
}
Multi-field search with per-field boosts and an explicit operator:
GET /articles/_search
{
  "query": {
    "query_string": {
      "query": "elasticsearch performance",
      "fields": ["title^3", "summary^2", "content"],
      "default_operator": "AND"
    }
  }
}
Mixed Lucene syntax with field selectors, grouping, and fuzziness (~):
GET /articles/_search
{
  "query": {
    "query_string": {
      "query": "(title:\"distributed search\" OR tags:search) AND author:martin~1",
      "lenient": true
    }
  }
}
Inline regexp using /.../ delimiters:
GET /users/_search
{
  "query": {
    "query_string": {
      "query": "username:/jdoe[0-9]+/"
    }
  }
}
Performance and Use Notes
Leading wildcards are the most common performance trap. *term forces a full terms-dictionary scan per shard; disable them in production with "allow_leading_wildcard": false unless the field is also indexed in a form that supports suffix matching, such as a reversed copy (reverse token filter) or n-grams. Regexps and unbounded fuzziness are similarly expensive; restrict them to keyword fields with bounded cardinality.
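As a minimal sketch of that advice, the helper below builds a search body with leading wildcards disabled. The function name build_guarded_query_string and the example field names are hypothetical; only the query_string parameters themselves come from the table above.

```python
import json
from typing import List


def build_guarded_query_string(query: str, fields: List[str]) -> dict:
    """Build a search body whose query_string disallows leading wildcards.

    Hypothetical helper for illustration; not part of any Elasticsearch client.
    """
    return {
        "query": {
            "query_string": {
                "query": query,
                "fields": fields,
                # Refuse terms such as *term or ?term instead of letting them
                # scan the whole terms dictionary on every shard.
                "allow_leading_wildcard": False,
            }
        }
    }


if __name__ == "__main__":
    # Trailing wildcards remain allowed; only a leading * or ? is refused.
    body = build_guarded_query_string("elastic*", ["title", "content"])
    print(json.dumps(body, indent=2))
```

The resulting dictionary can be sent as the request body of a search call. With this flag set, a query containing a leading wildcard should be rejected rather than executed, so the caller needs to handle that error.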
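query_string returns an error on malformed input (unbalanced quotes, dangling operators) and, unless lenient is true, on values that do not fit the target field's type. If end users type the strings, use simple_query_string instead, which ignores invalid parts of the query rather than returning an error. Always escape the reserved characters + - = && || ! ( ) { } [ ] ^ " ~ * ? : \ / with a backslash (a double backslash in JSON) when they appear as literals. The characters < and > are also reserved but cannot be escaped; remove them from the string if they must not be read as range operators.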
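The escaping rule can be sketched as a small helper. This is an illustration under stated assumptions, not an official client utility: the name escape_query_string and the product_id field are hypothetical, < and > are stripped because the Elasticsearch documentation describes them as impossible to escape, and each & and | is escaped individually, which also covers && and ||. Bare words such as AND, OR, and NOT are not handled here.

```python
import json
import re

# Reserved characters from the list above. "&" and "|" are escaped one at a
# time, which also neutralizes the two-character operators && and ||.
_RESERVED = re.compile(r'[+\-=&|!(){}\[\]^"~*?:\\/]')
# Assumption: "<" and ">" are removed instead of escaped.
_UNESCAPABLE = re.compile(r"[<>]")


def escape_query_string(literal: str) -> str:
    """Backslash-escape reserved query_string characters in a literal value."""
    stripped = _UNESCAPABLE.sub("", literal)
    return _RESERVED.sub(lambda match: "\\" + match.group(0), stripped)


if __name__ == "__main__":
    term = escape_query_string("SKU:12/34")
    body = {"query": {"query_string": {"query": "product_id:" + term}}}
    # json.dumps emits each backslash as a double backslash, as JSON requires.
    print(json.dumps(body))
```

Escaping only the user-supplied value, and not the whole expression, keeps the field selector and any intended operators intact.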
Complex query_string queries are a frequent source of slow searches and cluster instability. The manual triage - reading slow logs, parsing each Lucene expression to find the leading wildcard or unbounded fuzziness inside it, then chasing down the originating service - is precisely what Pulse runs continuously.
Common Mistakes
- Leaving allow_leading_wildcard at its default of true on a user-facing endpoint, then watching *foo queries melt a shard.
- Setting analyze_wildcard: true and expecting stemming to fire - it does not; tokens containing wildcards are still treated as raw text by most filters.
- Forgetting to escape : or / in product IDs and URLs embedded in the query string.
- Using query_string for end-user input and surfacing parse exceptions to the UI instead of switching to simple_query_string.
- Targeting text fields with regexp or wildcard expressions when a keyword sub-field exists.
Find Slow query_string Queries with Pulse
Pulse is an AI DBA for Elasticsearch and OpenSearch that continuously profiles production query traffic. For query_string queries specifically, Pulse:
- Parses live query_string expressions and flags those containing leading wildcards, embedded regexps with leading .*, unbounded ~ fuzziness, or field:* field-existence patterns
- Detects clusters where allow_leading_wildcard is enabled and a user-facing endpoint is forwarding unsafe Lucene syntax
- Identifies parse-error spikes that indicate the wrong endpoint is using query_string instead of simple_query_string
- Traces each slow query_string back to the calling service via slow-log and APM correlation
- Recommends concrete rewrites: disable allow_leading_wildcard, switch to simple_query_string for untrusted input, split structured predicates into a bool query with filter clauses, or scope fields to indexed text-only fields
- Tracks the latency improvement after the rewrite ships
This converts the manual Lucene-syntax debugging loop into a continuous optimization workflow.
Frequently Asked Questions
Q: What is the difference between query_string and simple_query_string?
A: Both parse a Lucene-like mini-language, but simple_query_string uses a relaxed grammar that silently drops invalid tokens, while query_string throws a parse exception. Use simple_query_string for untrusted input and query_string for trusted, admin-style search.
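A rough sketch of that routing decision, assuming the application already knows whether the caller is trusted. The function name build_text_query and the trusted flag are hypothetical; query and fields are parameters both query types accept.

```python
from typing import List


def build_text_query(text: str, fields: List[str], trusted: bool) -> dict:
    """Choose the parser by trust level (hypothetical helper)."""
    # Trusted callers get the full Lucene syntax; everyone else gets the
    # parser that ignores invalid syntax instead of returning an error.
    query_type = "query_string" if trusted else "simple_query_string"
    return {"query": {query_type: {"query": text, "fields": fields}}}


if __name__ == "__main__":
    print(build_text_query("title:(a OR b)", ["title"], trusted=True))
    print(build_text_query('"unbalanced quote', ["title"], trusted=False))
```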
Q: When should I use query_string over multi_match?
A: Use query_string when the caller needs Lucene operators (field selectors, boolean grouping, regex, fuzziness in one expression). Use multi_match when the input is plain text and you want predictable scoring across known fields.
Q: How do I make query_string case-insensitive?
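A: query_string applies the field's search analyzer, so casing is controlled by the analyzer (e.g. standard lowercases by default). Wildcard and regexp terms against keyword fields are a separate case: query_string does not document a case_insensitive parameter of its own, so either use a lowercase normalizer on the keyword field or move the pattern into a dedicated wildcard or regexp query, both of which accept case_insensitive: true.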
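One way to get case-insensitive pattern matching is a dedicated wildcard query, which accepts a case_insensitive flag. A minimal sketch, assuming a keyword field named username (borrowed from the regexp example earlier) and a hypothetical helper name:

```python
def build_case_insensitive_wildcard(field: str, pattern: str) -> dict:
    """Build a standalone wildcard query that ignores case (hypothetical helper)."""
    return {
        "query": {
            "wildcard": {
                field: {
                    "value": pattern,
                    # Match JDoe1, jdoe1, and JDOE1 alike.
                    "case_insensitive": True,
                }
            }
        }
    }


if __name__ == "__main__":
    print(build_case_insensitive_wildcard("username", "jdoe*"))
```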
Q: Are leading wildcards really that slow?
A: Yes. A leading * or ? prevents Lucene from using the term dictionary's sorted prefix lookup and forces a full scan of every term in the field, per shard. Disable allow_leading_wildcard or rewrite the data with reverse/n-gram analysis.
Q: Why does my date or number query fail with a format exception?
A: query_string validates input against each field's mapped type. Set lenient: true to skip incompatible fields, or scope fields to text-only fields. The cleaner fix is to route structured predicates through a range query or term query.
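The cleaner fix mentioned above might look like the following sketch: free text stays in query_string, scoped to text fields, while structured predicates move into filter clauses. The helper name and the published_at field are hypothetical; status is taken from the syntax example at the top of this page.

```python
from typing import List


def build_filtered_search(text: str, text_fields: List[str], published_after: str) -> dict:
    """Combine a text-only query_string with structured filters (hypothetical helper)."""
    return {
        "query": {
            "bool": {
                # Scored full-text part, limited to text fields so no date or
                # numeric field has to parse the user's input.
                "must": [
                    {"query_string": {"query": text, "fields": text_fields}}
                ],
                # Unscored structured predicates with their proper query types.
                "filter": [
                    {"term": {"status": "published"}},
                    {"range": {"published_at": {"gte": published_after}}},
                ],
            }
        }
    }


if __name__ == "__main__":
    print(build_filtered_search("distributed search", ["title", "content"], "2024-01-01"))
```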
Q: Can I use boost inside the query string itself?
A: Yes - append ^N to a term or grouped expression, e.g. title:elasticsearch^3 OR summary:elasticsearch. The top-level boost parameter multiplies the score of the entire query.
Q: How do I find which query_string patterns are causing slow searches?
A: Pulse ingests Elasticsearch and OpenSearch slow logs, parses each query_string expression to spot leading wildcards, embedded regexps, and unbounded fuzziness, correlates each slow query to the calling service, and recommends safer rewrites (disable allow_leading_wildcard, switch to simple_query_string, or move structured predicates into a bool query filter).
Related Reading
- Elasticsearch Query Language: overview of the query DSL.
- Multi-Match Query: structured multi-field search without Lucene syntax.
- Bool Query: composing multiple clauses with explicit boolean logic.
- Wildcard Query: targeted pattern matching on a single field.
- Regexp Query: regex matching against indexed terms.