The Elasticsearch knn query, introduced in 8.12, performs approximate k-nearest-neighbor (kNN) search against a dense_vector field using the HNSW graph index. Unlike the older top-level knn element in the search body (available since 8.4), the knn query is a regular Query DSL clause: it can be combined with other clauses in bool, used inside function_score, or run as part of a hybrid lexical+vector search. It works only on fields mapped as dense_vector with index: true.
Syntax
```json
{
  "query": {
    "knn": {
      "field": "vector_field",
      "query_vector": [0.3, 0.1, 0.2, /* ... */ ],
      "k": 10,
      "num_candidates": 100,
      "similarity": 0.5,
      "filter": { "term": { "tag": "shoes" } },
      "boost": 1.0
    }
  }
}
```
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `field` | string | - | Name of the dense_vector field. Required. |
| `query_vector` | array of floats | - | Query vector. Required unless query_vector_builder is used. |
| `query_vector_builder` | object | - | Inference endpoint that produces the vector from text. |
| `k` | int | size of search (≤10000) | Number of nearest neighbors to return per shard. |
| `num_candidates` | int | 1.5 * k, max 10000 | Candidate pool per shard examined by HNSW. Must be ≥ k. |
| `similarity` | float | - | Minimum similarity score for results to be returned. |
| `filter` | query | - | Filter applied during HNSW traversal (pre-filter). |
| `boost` | float | 1.0 | Multiplier on the similarity score. |
HNSW search visits up to num_candidates candidates per shard and returns the top k. Recall improves with larger num_candidates, while latency rises roughly linearly. When num_candidates is not set, it defaults to Math.min(1.5 * k, 10000).
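The defaulting and bounds rules above can be sketched in a few lines of Python. This is an illustration only, not Elasticsearch's implementation: the helper name `resolve_num_candidates` is hypothetical, and rounding 1.5 * k up to an integer is an assumption.

```python
import math

MAX_NUM_CANDIDATES = 10_000  # ceiling stated in the parameter table


def resolve_num_candidates(k, num_candidates=None):
    """Return the effective num_candidates for a knn clause (hypothetical helper)."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    if num_candidates is None:
        # Default: min(1.5 * k, 10000). Rounding up is an assumption.
        return min(math.ceil(1.5 * k), MAX_NUM_CANDIDATES)
    if num_candidates < k:
        raise ValueError("num_candidates must be >= k")
    if num_candidates > MAX_NUM_CANDIDATES:
        raise ValueError("num_candidates must be <= 10000")
    return num_candidates
```

For example, `resolve_num_candidates(10)` yields 15, while a large k such as 8000 hits the 10000 ceiling.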
Examples
Plain top-5 vector search:
```
POST /products/_search
{
  "query": {
    "knn": {
      "field": "product_features",
      "query_vector": [0.5, 0.5, 0.5, 0.5],
      "k": 5,
      "num_candidates": 100
    }
  }
}
```
Hybrid lexical + vector with bool.should:
```
POST /docs/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "title": "kubernetes" } },
        { "knn": {
            "field": "title_embedding",
            "query_vector": [/* 768 floats */],
            "k": 50,
            "num_candidates": 200
        }}
      ]
    }
  }
}
```
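In a bool.should hybrid like the one above, the scores of the matching clauses are added together. Reciprocal rank fusion (RRF) is an alternative that combines ranks instead of raw scores. The sketch below contrasts the two on toy data. The function names and document IDs are hypothetical, and the rank constant of 60 is an assumed value for illustration.

```python
def fuse_by_sum(lexical_scores, vector_scores):
    """Rank documents by the sum of their per-clause scores."""
    combined = dict(lexical_scores)
    for doc_id, score in vector_scores.items():
        combined[doc_id] = combined.get(doc_id, 0.0) + score
    return sorted(combined, key=combined.get, reverse=True)


def fuse_by_rrf(rankings, rank_constant=60):
    """Rank documents by reciprocal rank fusion over several ranked lists."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (rank_constant + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Toy data: lexical scores are on a much larger scale than vector scores.
lexical = {"a": 12.0, "b": 3.0}
vector = {"b": 0.9, "c": 0.8}

by_sum = fuse_by_sum(lexical, vector)
by_rrf = fuse_by_rrf([["a", "b"], ["b", "c"]])
```

With these inputs, summing ranks document `a` first because its lexical score dominates, whereas RRF ranks `b` first because it appears in both lists. That scale sensitivity is one reason to consider rank-based fusion for hybrid retrieval.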
Pre-filtered kNN (filter is applied during graph traversal, not after):
```
POST /products/_search
{
  "query": {
    "knn": {
      "field": "embedding",
      "query_vector": [/* ... */],
      "k": 20,
      "num_candidates": 200,
      "filter": { "term": { "in_stock": true } }
    }
  }
}
```
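To make the difference between pre-filtering and post-filtering concrete, the following sketch builds both request bodies as Python dictionaries. The helper names and the four-element placeholder vector are hypothetical; only the shape of the bodies matters.

```python
import json


def prefiltered_knn(field, query_vector, k, num_candidates, filter_query):
    """Filter lives inside the knn clause, so it is applied during traversal."""
    return {
        "query": {
            "knn": {
                "field": field,
                "query_vector": query_vector,
                "k": k,
                "num_candidates": num_candidates,
                "filter": filter_query,
            }
        }
    }


def postfiltered_knn(field, query_vector, k, num_candidates, filter_query):
    """Filter wraps the knn clause, so it is applied after retrieval."""
    return {
        "query": {
            "bool": {
                "must": [
                    {
                        "knn": {
                            "field": field,
                            "query_vector": query_vector,
                            "k": k,
                            "num_candidates": num_candidates,
                        }
                    }
                ],
                "filter": [filter_query],
            }
        }
    }


in_stock = {"term": {"in_stock": True}}
vec = [0.1, 0.2, 0.3, 0.4]  # placeholder vector for illustration

pre = prefiltered_knn("embedding", vec, 20, 200, in_stock)
post = postfiltered_knn("embedding", vec, 20, 200, in_stock)
print(json.dumps(pre, indent=2))
```

The second form can return fewer than k hits, because documents are discarded only after the nearest neighbors have already been chosen.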
Use an inference endpoint to build the vector from text at query time:
```
POST /docs/_search
{
  "query": {
    "knn": {
      "field": "embedding",
      "query_vector_builder": {
        "text_embedding": {
          "model_id": "sentence-transformers__msmarco-minilm-l-12-v3",
          "model_text": "how do graph indexes work"
        }
      },
      "k": 10,
      "num_candidates": 100
    }
  }
}
```
Performance and Use Notes
knn query vs the top-level knn search element: the older top-level knn (still supported) is optimized for pure vector search and supports per-shard k/num_candidates tuning at the request top level. The knn query is composable - it participates in scoring with other clauses inside bool and can be nested inside function_score. Use the top-level form for pure ANN, the query form for hybrid retrieval.
num_candidates is the single biggest performance knob: set it as low as your recall target tolerates. The HNSW index is held in the OS page cache, so cold-cache queries can be 10-100x slower than warm ones. Quantization (index_options.type: int8_hnsw or int4_hnsw) typically cuts vector memory by about 4x (int8_hnsw) or 8x (int4_hnsw) relative to float32 vectors, with a small recall loss. Filter inside the knn clause rather than wrapping it with bool.filter: the inner filter is pushed into HNSW traversal, which preserves recall on selective filters.
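A rough back-of-the-envelope estimate shows where the 4x and 8x figures come from. The sketch below assumes float32 source vectors (4 bytes per dimension), 1 byte per dimension for int8_hnsw, and half a byte for int4_hnsw. It deliberately ignores HNSW graph links and any per-vector quantization metadata, so real footprints will be somewhat higher. The function name is hypothetical.

```python
# Assumed bytes per dimension for each index_options.type value.
BYTES_PER_DIM = {
    "hnsw": 4.0,       # unquantized float32
    "int8_hnsw": 1.0,  # one byte per dimension
    "int4_hnsw": 0.5,  # two dimensions packed per byte
}


def raw_vector_bytes(num_vectors, dims, index_type="hnsw"):
    """Estimate the memory taken by vector data alone (graph overhead excluded)."""
    return int(num_vectors * dims * BYTES_PER_DIM[index_type])


float32_size = raw_vector_bytes(1_000_000, 768, "hnsw")
int8_size = raw_vector_bytes(1_000_000, 768, "int8_hnsw")
int4_size = raw_vector_bytes(1_000_000, 768, "int4_hnsw")
```

For one million 768-dimensional vectors this gives roughly 3.07 GB unquantized, 0.77 GB with int8_hnsw, and 0.38 GB with int4_hnsw, which is a useful first check on whether the working set fits in the page cache.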
Vector workloads change cluster memory and CPU profiles in subtle ways. Manually tracking dense_vector field sizes, HNSW graph cache hit ratios, and per-shard num_candidates behavior to find why recall is dropping or latency is rising is the loop Pulse runs continuously.
Common Mistakes
- Setting `num_candidates` equal to `k` - HNSW degenerates and recall collapses.
- Filtering with a `bool.filter` wrapper instead of the `filter` parameter inside the `knn` query; post-filtering can return fewer than `k` hits.
- Querying a `dense_vector` field that was indexed with `index: false`; the request fails or falls back to exact search.
- Mismatched similarity: indexed with `cosine`, queried with vectors that aren't L2-normalized when using `dot_product`.
- Expecting exact nearest neighbors; HNSW is approximate by design.
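The normalization pitfall in the list above is easy to check numerically. The sketch below, using hypothetical helper names and toy two-dimensional vectors, shows that the dot product of raw vectors differs from their cosine similarity, while the dot product of L2-normalized vectors matches it.

```python
import math


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v):
    """Scale a vector to unit length; a zero vector cannot be normalized."""
    norm = math.sqrt(dot(v, v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity, independent of vector magnitude."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a = [3.0, 4.0]
b = [4.0, 3.0]

raw_dot = dot(a, b)                                    # depends on magnitude
unit_dot = dot(l2_normalize(a), l2_normalize(b))       # magnitude removed
cos = cosine(a, b)
```

Here the raw dot product is 24 while the cosine similarity is 0.96. Only after normalizing both vectors does the dot product agree with the cosine value, which is why vectors should be normalized before indexing and querying a `dot_product` field.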
Find Slow kNN Queries with Pulse
Pulse is an AI DBA for Elasticsearch and OpenSearch that continuously profiles production query traffic. For knn queries specifically, Pulse:
- Identifies kNN queries with `num_candidates` set too close to `k`, where HNSW degenerates and recall collapses, plus settings silently capped at the 10000 ceiling
- Flags post-filtered kNN (a `bool.filter` wrapped around a `knn` clause) returning fewer than `k` hits because the filter is applied after retrieval instead of pushed into HNSW traversal
- Spots `dense_vector` fields indexed with `index: false`, mismatched similarity (cosine field queried with unnormalized `dot_product` vectors), and graph cache thrashing from cold OS page cache after merges
- Traces each slow kNN back to the calling service via slow-log and APM correlation
- Recommends concrete fixes: raise `num_candidates` to the recall plateau, move the filter inside the `knn` clause for pre-filtered traversal, switch to `int8_hnsw` or `int4_hnsw` quantization to cut memory 4x or 8x, raise `ef_construction` and `m` for higher recall, or hand vector retrieval to `knn` while keeping script_score for rescoring only
- Tracks latency and recall after the change ships
This converts the manual HNSW-tuning loop into a continuous optimization workflow.
Frequently Asked Questions
Q: What is the difference between the knn query and the top-level knn search element?
A: The top-level knn element (available since 8.4 and still supported) is the original pure-ANN search form. The knn query (8.12+) is a composable DSL clause that participates in bool, function_score, and hybrid scoring. Both use the same HNSW index.
Q: What are the default values of k and num_candidates in an Elasticsearch knn query?
A: If k is omitted, it defaults to the search size (capped at 10000). If num_candidates is omitted, it defaults to Math.min(1.5 * k, 10000). num_candidates must be ≥ k and ≤ 10000.
Q: How does the filter parameter in a knn query work?
A: The filter is applied during HNSW graph traversal, so the engine only considers candidates that satisfy the filter while walking the graph. This preserves the k returned hits even on selective filters; wrapping a knn query in bool.filter instead applies the filter after retrieval and can yield fewer than k results.
Q: Which similarity metrics does Elasticsearch knn query support?
A: The metric is set on the dense_vector field mapping: l2_norm, dot_product, cosine, and max_inner_product (8.11+). The knn query uses whatever is configured on the field.
Q: How do I improve recall in an Elasticsearch knn query?
A: Raise num_candidates, raise ef_construction and m in the index mapping, avoid aggressive quantization, and avoid post-filtering. Recall is roughly monotonic in num_candidates until it plateaus near exact search.
Q: Can knn query be combined with full-text scoring?
A: Yes - place a match/multi_match and a knn inside bool.should. Each contributes its score and Elasticsearch sums them (or use RRF via the retriever API in 8.14+ for reciprocal rank fusion).
Q: How do I find slow kNN queries and low-recall vector retrieval in production?
A: Pulse profiles Elasticsearch and OpenSearch slow logs and HNSW graph cache metrics, isolates kNN queries with degenerate num_candidates, post-filter recall loss, or unquantized vectors causing memory pressure, attributes each to the calling service, and recommends quantization, num_candidates raises, or pushing filters into the knn.filter parameter.
Related Reading
- Elasticsearch Bool Query: wrapper for hybrid lexical+vector search.
- Elasticsearch Function Score Query: rescoring a kNN result set.
- Elasticsearch Match Query: the lexical half of a hybrid retrieval.
- Elasticsearch Script Score Query: custom scoring over vectors.
- Elasticsearch Term Query: selective pre-filter to feed inside `knn.filter`.