Elasticsearch kNN Query: Vector Similarity Search in DSL - Syntax, Example, and Tips

The Elasticsearch knn query, introduced in 8.12, performs approximate k-nearest-neighbor search against a dense_vector field using its HNSW graph index. Unlike the older top-level knn option of the search API, the knn query is a regular Query DSL clause. It can be combined with other clauses in a bool query, used inside function_score, or run as part of a hybrid lexical + vector search. The target field must be mapped as dense_vector with index: true.

Syntax

{
  "query": {
    "knn": {
      "field":          "vector_field",
      "query_vector":   [0.3, 0.1, 0.2, /* ... */ ],
      "k":              10,
      "num_candidates": 100,
      "similarity":     0.5,
      "filter":         { "term": { "tag": "shoes" } },
      "boost":          1.0
    }
  }
}

Parameters

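| Parameter            | Type            | Default                   | Description |
|----------------------|-----------------|---------------------------|-------------|
| field                | string          | none (required)           | Name of the dense_vector field to search. |
| query_vector         | array of floats | none                      | Query vector. Required unless query_vector_builder is used. |
| query_vector_builder | object          | none                      | Builds the query vector from text at query time using a deployed embedding model. |
| k                    | int             | size of the search request | Number of nearest neighbors to return from each shard. Must be ≤ num_candidates. |
| num_candidates       | int             | min(1.5 * k, 10000)       | Candidates considered per shard during HNSW search. Must be ≥ k and ≤ 10000. |
| similarity           | float           | none                      | Minimum vector similarity a document needs to match. Compared against the raw similarity of the field's metric, not the final _score. |
| filter               | query           | none                      | Filter applied during HNSW traversal (pre-filter). |
| boost                | float           | 1.0                       | Multiplier applied to the kNN score. |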

On each shard, the HNSW search keeps a list of num_candidates nearest candidates while walking the graph and returns the best k of them. A larger num_candidates improves recall, but latency also rises because more of the graph is explored. When num_candidates is not set, it defaults to min(1.5 * k, 10000).
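The default and bounds rules above are easy to get wrong when queries are assembled in application code. The Python sketch below builds a knn clause as a plain dict and checks those rules client-side before the request is sent. The helper names build_knn_query and effective_num_candidates are hypothetical and not part of any Elasticsearch client. Rounding 1.5 * k up for odd k is an assumption of this sketch. Elasticsearch still performs its own validation server-side.

```python
import json
import math
from typing import Any, Optional, Sequence

NUM_CANDIDATES_LIMIT = 10_000  # per-shard ceiling for num_candidates


def effective_num_candidates(k: int, num_candidates: Optional[int] = None) -> int:
    """Return the num_candidates a knn clause would use (hypothetical helper)."""
    if not 1 <= k <= NUM_CANDIDATES_LIMIT:
        raise ValueError("k must be between 1 and 10000")
    if num_candidates is None:
        # Documented default: min(1.5 * k, 10000); rounding up is an assumption.
        return min(math.ceil(1.5 * k), NUM_CANDIDATES_LIMIT)
    if num_candidates < k:
        raise ValueError("num_candidates must be >= k")
    if num_candidates > NUM_CANDIDATES_LIMIT:
        raise ValueError("num_candidates must be <= 10000")
    return num_candidates


def build_knn_query(
    field: str,
    query_vector: Sequence[float],
    k: int,
    num_candidates: Optional[int] = None,
    similarity: Optional[float] = None,
    filter_query: Optional[dict[str, Any]] = None,
    boost: float = 1.0,
) -> dict[str, Any]:
    """Build a search body containing a single knn query clause."""
    knn: dict[str, Any] = {
        "field": field,
        "query_vector": list(query_vector),
        "k": k,
        # Sent explicitly so the effective value is visible in slow logs.
        "num_candidates": effective_num_candidates(k, num_candidates),
    }
    if similarity is not None:
        knn["similarity"] = similarity
    if filter_query is not None:
        knn["filter"] = filter_query  # pre-filter, applied during HNSW traversal
    if boost != 1.0:
        knn["boost"] = boost
    return {"query": {"knn": knn}}


if __name__ == "__main__":
    body = build_knn_query(
        "embedding", [0.3, 0.1, 0.2], k=10,
        filter_query={"term": {"tag": "shoes"}},
    )
    print(json.dumps(body, indent=2))
```

Sending num_candidates explicitly, rather than relying on the server default, keeps the value you are tuning visible in logs.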

Examples

Plain top-5 vector search:

POST /products/_search
{
  "query": {
    "knn": {
      "field": "product_features",
      "query_vector": [0.5, 0.5, 0.5, 0.5],
      "k": 5,
      "num_candidates": 100
    }
  }
}

Hybrid lexical + vector with bool.should:

POST /docs/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "title": "kubernetes" } },
        { "knn": {
            "field": "title_embedding",
            "query_vector": [/* 768 floats */],
            "k": 50,
            "num_candidates": 200
        }}
      ]
    }
  }
}

Pre-filtered kNN (filter is applied during graph traversal, not after):

POST /products/_search
{
  "query": {
    "knn": {
      "field": "embedding",
      "query_vector": [/* ... */],
      "k": 20,
      "num_candidates": 200,
      "filter": { "term": { "in_stock": true } }
    }
  }
}
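You can reproduce the difference between pre-filtering and post-filtering without a cluster. In the Python sketch below, exact brute-force dot-product ranking stands in for HNSW. This is a simplification for clarity, since the real engine walks a graph. The toy documents and their in_stock flag are made up for illustration. If you rank first and filter afterwards, as a bool.filter clause next to the knn query effectively does, you can get fewer than k hits. If you filter first, you get k hits whenever at least k documents match.

```python
from typing import Callable

# Toy documents: (id, vector, in_stock). Illustrative data only.
Doc = tuple[str, list[float], bool]

DOCS: list[Doc] = [
    ("a", [1.0, 0.0], False),
    ("b", [0.9, 0.1], False),
    ("c", [0.8, 0.2], False),
    ("d", [0.1, 0.9], True),
    ("e", [0.2, 0.8], True),
    ("f", [0.0, 1.0], True),
]


def dot(u: list[float], v: list[float]) -> float:
    return sum(x * y for x, y in zip(u, v))


def top_k(docs: list[Doc], query: list[float], k: int) -> list[Doc]:
    # Exact nearest neighbors by dot product (stand-in for HNSW).
    return sorted(docs, key=lambda d: dot(d[1], query), reverse=True)[:k]


def pre_filtered(docs: list[Doc], query: list[float], k: int,
                 keep: Callable[[Doc], bool]) -> list[Doc]:
    # Filter first, then rank: like the filter parameter inside knn.
    return top_k([d for d in docs if keep(d)], query, k)


def post_filtered(docs: list[Doc], query: list[float], k: int,
                  keep: Callable[[Doc], bool]) -> list[Doc]:
    # Rank first, then filter: like a bool.filter clause next to knn.
    return [d for d in top_k(docs, query, k) if keep(d)]


if __name__ == "__main__":
    query, k = [1.0, 0.0], 3
    in_stock = lambda d: d[2]
    print([d[0] for d in pre_filtered(DOCS, query, k, in_stock)])   # ['e', 'd', 'f']
    print([d[0] for d in post_filtered(DOCS, query, k, in_stock)])  # []
```

Here the three closest documents are all out of stock, so post-filtering returns nothing even though three in-stock documents exist.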

Use an inference endpoint to build the vector from text at query time:

POST /docs/_search
{
  "query": {
    "knn": {
      "field": "embedding",
      "query_vector_builder": {
        "text_embedding": {
          "model_id": "sentence-transformers__msmarco-minilm-l-12-v3",
          "model_text": "how do graph indexes work"
        }
      },
      "k": 10,
      "num_candidates": 100
    }
  }
}

Performance and Use Notes

knn query vs the top-level knn search option: the older top-level knn option (still supported) is designed for pure vector search. It gathers the global top k nearest neighbors before the main query runs, so its k applies to the whole request while num_candidates applies per shard. The knn query instead runs on each shard as part of the regular query phase and returns k neighbors per shard. It is also composable: it participates in scoring with other clauses inside bool and can be nested inside function_score. Use the top-level form for pure ANN and the query form for hybrid retrieval.
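With the knn query, each shard contributes its own k nearest neighbors, and the per-shard results are then merged into the global top hits. The Python sketch below models only that merge step. It assumes plain (doc_id, score) pairs per shard and ignores other query clauses, tie-breaking, and pagination. The function name merge_shard_hits is hypothetical.

```python
import heapq
from typing import Iterable

Hit = tuple[str, float]  # (doc_id, score)


def merge_shard_hits(shard_hits: Iterable[list[Hit]], size: int) -> list[Hit]:
    """Merge per-shard kNN hits into the global top `size` by score."""
    all_hits = (hit for hits in shard_hits for hit in hits)
    return heapq.nlargest(size, all_hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    # Two shards, each returning its own k=2 nearest neighbors.
    shard_0 = [("a", 0.90), ("b", 0.70)]
    shard_1 = [("c", 0.95), ("d", 0.60)]
    print(merge_shard_hits([shard_0, shard_1], size=3))
    # [('c', 0.95), ('a', 0.9), ('b', 0.7)]
```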

num_candidates is the single biggest performance knob: set it as low as your recall target tolerates. HNSW search is fast only when the vector data and graph sit in the OS page cache. Queries against a cold cache, for example right after a restart or a large merge, can be much slower than warm ones. Quantization (index_options.type: int8_hnsw or int4_hnsw) cuts the memory needed for vector values by 4x or 8x, typically with a small recall loss. Filter inside the knn clause rather than adding a bool.filter around it. The inner filter is pushed into HNSW traversal, which preserves recall on selective filters.
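The 4x and 8x figures follow from the bytes needed per dimension for the vector values alone. Each dimension takes 4 bytes as float32, 1 byte as int8, and half a byte as int4. The arithmetic below is a rough sketch that deliberately ignores the HNSW graph, per-vector overhead, and any copies kept on disk. Treat its outputs as lower bounds rather than sizing guidance. The 1M x 768 corpus is hypothetical.

```python
# Bytes per dimension for the vector values only (graph and overhead excluded).
BYTES_PER_DIM = {
    "float": 4.0,      # unquantized float32
    "int8_hnsw": 1.0,  # 8-bit scalar quantization
    "int4_hnsw": 0.5,  # 4-bit scalar quantization
}


def vector_bytes(num_vectors: int, dims: int, index_type: str) -> float:
    """Rough lower bound on memory needed to keep the vectors hot."""
    return num_vectors * dims * BYTES_PER_DIM[index_type]


if __name__ == "__main__":
    n, dims = 1_000_000, 768  # hypothetical corpus
    for index_type in BYTES_PER_DIM:
        gib = vector_bytes(n, dims, index_type) / 2**30
        print(f"{index_type:>10}: {gib:.2f} GiB")
```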

Vector workloads change cluster memory and CPU profiles in subtle ways. To find out why recall is dropping or latency is rising, you have to track dense_vector field sizes, HNSW graph cache hit ratios, and per-shard num_candidates behavior over time. Doing that by hand is slow, and it is the loop Pulse runs continuously.

Common Mistakes

  1. Setting num_candidates equal to k: HNSW explores no candidates beyond the k it returns, so recall drops noticeably.
  2. Filtering with a bool.filter wrapper instead of the filter parameter inside the knn query; post-filtering can return fewer than k hits.
  3. Querying a dense_vector field that was indexed with index: false; the request fails or falls back to exact search.
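  4. Sending unnormalized vectors to a dot_product field: dot_product similarity requires unit-length (L2-normalized) vectors at both index and query time. Normalize embeddings first, or map the field with cosine, which does not require normalization.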
  5. Expecting exact nearest neighbors; HNSW is approximate by design.
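For mistake 4, the usual fix is to L2-normalize embeddings before indexing and before querying a dot_product field. For unit-length vectors, the dot product equals cosine similarity. The pure-Python sketch below shows this. The function names are illustrative; in practice the embedding library or model may already offer a normalization option.

```python
import math


def l2_normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit length, as dot_product similarity expects."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def dot(u: list[float], v: list[float]) -> float:
    return sum(x * y for x, y in zip(u, v))


def cosine(u: list[float], v: list[float]) -> float:
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


if __name__ == "__main__":
    q, d = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]
    # After normalization, dot product and cosine agree (up to rounding).
    print(dot(l2_normalize(q), l2_normalize(d)), cosine(q, d))
```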

Find Slow kNN Queries with Pulse

Pulse is an AI DBA for Elasticsearch and OpenSearch that continuously profiles production query traffic. For knn queries specifically, Pulse:

  • Identifies kNN queries with num_candidates set too close to k, where HNSW explores too few candidates and recall drops, plus requests that run into the 10000 ceiling
  • Flags post-filtered kNN (a bool.filter wrapped around a knn clause) returning fewer than k hits because the filter is applied after retrieval instead of pushed into HNSW traversal
  • Spots dense_vector fields indexed with index: false, dot_product fields receiving unnormalized vectors, and graph cache thrashing from a cold OS page cache after merges
  • Traces each slow kNN back to the calling service via slow-log and APM correlation
  • Recommends concrete fixes: raise num_candidates to the recall plateau, move the filter inside the knn clause for pre-filtered traversal, switch to int8_hnsw or int4_hnsw quantization to cut memory 4x or 8x, raise ef_construction and m for higher recall, or hand vector retrieval to knn while keeping script_score for rescoring only
  • Tracks latency and recall after the change ships

This converts the manual HNSW-tuning loop into a continuous optimization workflow.

Try Pulse on your cluster.

Frequently Asked Questions

Q: What is the difference between the knn query and the top-level knn search element?
A: The top-level knn option of the search API (8.4+, preceded by the _knn_search endpoint in 8.0) is the original pure-ANN search form and is still supported. The knn query (8.12+) is a composable DSL clause that participates in bool, function_score, and hybrid scoring. Both use the same HNSW index.

Q: What are the default values of k and num_candidates in an Elasticsearch knn query?
A: If k is omitted, it defaults to the search size (capped at 10000). If num_candidates is omitted, it defaults to Math.min(1.5 * k, 10000). num_candidates must be ≥ k and ≤ 10000.

Q: How does the filter parameter in a knn query work?
A: The filter is applied during HNSW graph traversal, so the engine only considers candidates that satisfy the filter while walking the graph. As long as at least k documents match the filter, k hits are returned, even for very selective filters. Adding the filter as a bool.filter clause next to the knn query instead applies it after retrieval, which can yield fewer than k results.

Q: Which similarity metrics does the Elasticsearch knn query support?
A: The metric is set on the dense_vector field mapping: l2_norm, dot_product, cosine, and max_inner_product (8.11+). The knn query uses whatever is configured on the field.
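Elasticsearch converts the raw similarity into a non-negative _score with a per-metric formula. This matters when kNN scores are summed with BM25 scores. The similarity threshold parameter is compared with the raw value, not with the resulting _score. The sketch below follows the formulas documented for float vectors; byte vectors use a different dot_product normalization. It is an illustrative re-implementation, not Elasticsearch code.

```python
import math


def raw_similarity(metric: str, q: list[float], v: list[float]) -> float:
    """Raw value for the metric (for l2_norm this is a distance)."""
    dot = sum(a * b for a, b in zip(q, v))
    if metric == "l2_norm":
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(q, v)))
    if metric in ("dot_product", "max_inner_product"):
        return dot
    if metric == "cosine":
        return dot / (math.sqrt(sum(a * a for a in q)) * math.sqrt(sum(b * b for b in v)))
    raise ValueError(f"unknown metric: {metric}")


def knn_score(metric: str, q: list[float], v: list[float]) -> float:
    """_score derived from the raw similarity (float vectors)."""
    s = raw_similarity(metric, q, v)
    if metric == "l2_norm":
        return 1.0 / (1.0 + s * s)  # smaller distance -> higher score
    if metric in ("dot_product", "cosine"):
        return (1.0 + s) / 2.0  # maps [-1, 1] onto [0, 1]
    # max_inner_product is unbounded; shift it so the score stays positive
    return 1.0 / (1.0 - s) if s < 0 else s + 1.0


if __name__ == "__main__":
    q, v = [0.6, 0.8], [0.8, 0.6]
    for metric in ("l2_norm", "dot_product", "cosine", "max_inner_product"):
        print(metric, round(knn_score(metric, q, v), 4))
```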

Q: How do I improve recall in an Elasticsearch knn query?
A: Raise num_candidates, raise ef_construction and m in the index mapping, avoid aggressive quantization, and avoid post-filtering. Recall is roughly monotonic in num_candidates until it plateaus near exact search.
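To measure recall, compare the kNN results against exact results for the same query vectors. Compute the true top k by brute force, for example offline or in Elasticsearch with a script_score query, and count how many of those the approximate search returned. The sketch below assumes you already have the approximate ranked IDs. The helper names and toy vectors are hypothetical.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


def exact_top_k(vectors: dict[str, list[float]], query: list[float], k: int) -> list[str]:
    """Ground-truth neighbors by exhaustive (brute-force) cosine similarity."""
    return sorted(vectors, key=lambda doc_id: cosine(vectors[doc_id], query), reverse=True)[:k]


def recall_at_k(approx_ids: list[str], exact_ids: list[str], k: int) -> float:
    """Fraction of the true top-k neighbors that the approximate search returned."""
    if k <= 0:
        raise ValueError("k must be positive")
    return len(set(approx_ids[:k]) & set(exact_ids[:k])) / k


if __name__ == "__main__":
    vectors = {"a": [1.0, 0.0], "b": [0.8, 0.6], "c": [0.6, 0.8], "d": [0.0, 1.0]}
    truth = exact_top_k(vectors, [1.0, 0.0], k=2)  # ['a', 'b']
    approx = ["a", "c"]  # hypothetical IDs returned by a knn query
    print(recall_at_k(approx, truth, k=2))  # 0.5
```

Averaging recall_at_k over a sample of real query vectors while stepping num_candidates up shows where recall plateaus.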

Q: Can knn query be combined with full-text scoring?
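A: Yes. Place a match/multi_match and a knn inside bool.should; each contributes its score and Elasticsearch sums them. BM25 scores are unbounded, while kNN scores usually fall in a narrow range, so use boost to balance the two. Alternatively, use reciprocal rank fusion (RRF) via the retriever API in 8.14+, which combines ranks instead of scores.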
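Reciprocal rank fusion ignores raw scores and combines ranks. Each document earns 1 / (rank_constant + rank) from every result list it appears in. The Python sketch below reproduces that formula with rank_constant = 60, the Elasticsearch default. It is a client-side illustration under those assumptions, not the server implementation, and the document IDs are made up.

```python
from collections import defaultdict


def rrf(result_lists: list[list[str]], rank_constant: int = 60,
        top_n: int = 10) -> list[tuple[str, float]]:
    """Fuse ranked ID lists with reciprocal rank fusion."""
    scores: dict[str, float] = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):  # ranks are 1-based
            scores[doc_id] += 1.0 / (rank_constant + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top_n]


if __name__ == "__main__":
    lexical = ["a", "b", "c"]  # e.g. match query results
    vector = ["b", "c", "d"]   # e.g. knn query results
    for doc_id, score in rrf([lexical, vector]):
        print(doc_id, round(score, 5))
```

Documents that rank well in both lists ("b" and "c" here) rise to the top without any score rescaling.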

Q: How do I find slow kNN queries and low-recall vector retrieval in production?
A: Pulse profiles Elasticsearch and OpenSearch slow logs and HNSW graph cache metrics, isolates kNN queries with degenerate num_candidates, post-filter recall loss, or unquantized vectors causing memory pressure, attributes each to the calling service, and recommends quantization, num_candidates raises, or pushing filters into the knn.filter parameter.
