Semantic Search vs Vector Search: Understanding the Difference

The terms "semantic search" and "vector search" are often used interchangeably, but they refer to different layers of the same problem. Understanding the distinction helps you make better architectural decisions when building search systems.

Definitions

Vector Search

Vector search is a retrieval mechanism. It finds the nearest neighbors to a query vector in a high-dimensional vector space using distance metrics (cosine similarity, dot product, L2/Euclidean distance).

At its core, vector search is a mathematical operation: given a query vector and a set of stored vectors, return the top-K most similar vectors.

Vector search doesn't know or care what the vectors represent. They could be text embeddings, image features, audio fingerprints, or user behavior profiles.
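At small scale, this core operation can be sketched in a few lines of plain Python — a brute-force illustration with toy 3-dimensional vectors, not how production systems do it (they use the ANN indexes covered later):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query, vectors, k):
    """Exact (brute-force) top-K search: score every stored vector."""
    scored = [(cosine_similarity(query, v), doc_id) for doc_id, v in vectors.items()]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]

vectors = {
    "doc1": [0.9, 0.1, 0.0],
    "doc2": [0.1, 0.9, 0.0],
    "doc3": [0.8, 0.2, 0.1],
}
print(top_k([1.0, 0.0, 0.0], vectors, k=2))  # ['doc1', 'doc3']
```

Nothing here depends on what the vectors encode — swap the toy lists for text embeddings, image features, or anything else, and the operation is unchanged.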

Semantic Search

Semantic search is an application-level concept. It means retrieving results based on the meaning of a query rather than exact keyword matching. A semantic search for "how to fix a leaky faucet" should return results about "plumbing repair" and "dripping tap" even if those exact words don't appear in the query.

Semantic search is typically implemented using vector search (with text embeddings), but it's not the only way. It can also involve:

  • Query expansion (adding synonyms and related terms)
  • Knowledge graphs (understanding entity relationships)
  • Learned sparse representations (models like SPLADE that produce weighted term vectors)
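As a toy illustration of the first approach, query expansion can be as simple as OR-ing in synonyms before the query reaches a lexical index. The synonym table below is a hypothetical hand-built one; production systems use curated thesauri or mined synonym sets:

```python
# Hypothetical hand-built synonym table (illustrative only).
SYNONYMS = {
    "fix": ["repair"],
    "leaky": ["dripping"],
    "faucet": ["tap"],
}

def expand_query(query):
    """Add known synonyms alongside each original query term."""
    terms = []
    for term in query.lower().split():
        terms.append(term)
        terms.extend(SYNONYMS.get(term, []))
    return terms

print(expand_query("fix a leaky faucet"))
# ['fix', 'repair', 'a', 'leaky', 'dripping', 'faucet', 'tap']
```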

The Relationship

Semantic Search  →  uses  →  Vector Search  →  operates on  →  Embeddings
(application goal)           (retrieval mechanism)               (data representation)

Vector search is a tool. Semantic search is what you build with it (for text). You can use vector search for non-semantic purposes (image similarity, recommendation systems, anomaly detection), and you can achieve some degree of semantic search without vectors (synonym expansion, stemming).

How Vector Search Works

Embedding Generation

Text is converted to dense vectors using embedding models:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')

# Each text becomes a 384-dimensional vector
query_vector = model.encode("how to fix a leaky faucet")
doc_vector = model.encode("plumbing repair guide for dripping taps")

The embedding model positions semantically similar texts close together in vector space. "Fix a leaky faucet" and "repair a dripping tap" produce vectors that are near each other, even though they share almost no words.

Approximate Nearest Neighbor (ANN) Algorithms

Exact nearest-neighbor search over millions of vectors is too slow. Production systems use ANN algorithms that trade small accuracy loss for dramatic speed improvements:

  • HNSW (Hierarchical Navigable Small World): Graph-based. The most popular choice for accuracy and speed balance. Used by OpenSearch, Elasticsearch, pgvector.
  • IVF (Inverted File Index): Partition-based. Clusters vectors, then searches only relevant clusters. Lower memory than HNSW.
  • Product Quantization (PQ): Compresses vectors for lower memory footprint at the cost of accuracy.
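To make the partition idea concrete, here is a minimal IVF-style sketch in plain Python: vectors are assigned to the nearest of a few fixed centroids at index time, and a query scores only the vectors in its closest partition. Real implementations learn centroids with k-means and probe several partitions to recover accuracy; this toy version probes just one:

```python
import math

def l2(a, b):
    """Euclidean (L2) distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class IVFIndex:
    """Toy inverted-file index: one bucket of vectors per centroid."""

    def __init__(self, centroids):
        self.centroids = centroids
        self.partitions = {i: [] for i in range(len(centroids))}

    def _nearest_centroid(self, vector):
        return min(range(len(self.centroids)),
                   key=lambda i: l2(vector, self.centroids[i]))

    def add(self, doc_id, vector):
        # Index time: assign the vector to its nearest centroid's bucket.
        self.partitions[self._nearest_centroid(vector)].append((doc_id, vector))

    def search(self, query, k):
        # Query time: score only the vectors in the closest partition.
        candidates = self.partitions[self._nearest_centroid(query)]
        scored = sorted(candidates, key=lambda item: l2(query, item[1]))
        return [doc_id for doc_id, _ in scored[:k]]

index = IVFIndex(centroids=[[0.0, 0.0], [10.0, 10.0]])
index.add("near_origin", [0.5, 0.2])
index.add("also_near", [0.1, 0.9])
index.add("far_away", [9.8, 10.1])
print(index.search([0.3, 0.3], k=2))  # only the first partition is scanned
```

The speed/accuracy trade-off is visible directly: vectors in unvisited partitions can never be returned, which is the "small accuracy loss" ANN methods accept.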

Vector Search in OpenSearch

# Create index with knn_vector field
PUT /documents
{
  "settings": { "index.knn": true },
  "mappings": {
    "properties": {
      "title": { "type": "text" },
      "embedding": {
        "type": "knn_vector",
        "dimension": 384,
        "method": {
          "name": "hnsw",
          "space_type": "cosinesimil",
          "engine": "nmslib"
        }
      }
    }
  }
}

# Query
POST /documents/_search
{
  "query": {
    "knn": {
      "embedding": {
        "vector": [0.1, 0.2, ...],
        "k": 10
      }
    }
  }
}

Vector Search in Elasticsearch

# Create index with dense_vector field
PUT /documents
{
  "mappings": {
    "properties": {
      "title": { "type": "text" },
      "embedding": {
        "type": "dense_vector",
        "dims": 384,
        "index": true,
        "similarity": "cosine"
      }
    }
  }
}

# Query
POST /documents/_search
{
  "knn": {
    "field": "embedding",
    "query_vector": [0.1, 0.2, ...],
    "k": 10,
    "num_candidates": 100
  }
}

Comparing Lexical and Semantic Search

Traditional keyword (lexical) search and semantic search have complementary strengths:

| Aspect | Lexical Search (BM25) | Semantic Search (Vectors) |
|---|---|---|
| How it works | Term frequency and inverse document frequency | Vector similarity in embedding space |
| Handles synonyms | Only with explicit synonym configuration | Naturally; embeddings capture meaning |
| Exact match | Excellent; finds precise terms | Poor; may miss exact identifiers, SKUs, codes |
| Typo tolerance | Requires fuzzy matching configuration | Moderate; small typos often produce similar embeddings |
| Out-of-vocabulary terms | Fails on terms not in the index | Handles novel phrasings if meaning is similar |
| Domain-specific jargon | Works well if jargon appears in documents | May fail if embedding model wasn't trained on domain |
| Scoring transparency | BM25 scores are interpretable | Embedding similarity scores are opaque |
| Performance | Very fast (inverted index) | Slower (ANN search), more memory |
| Index size | Compact inverted index | Large vector storage (dims × 4 bytes × docs) |

Neither approach is universally better. Most production search systems benefit from combining both — this is called hybrid search.

Hybrid Search: Combining Both

Hybrid search runs lexical and vector search in parallel, then fuses the results:

# OpenSearch hybrid search with neural search plugin
POST /documents/_search
{
  "query": {
    "hybrid": {
      "queries": [
        {
          "match": {
            "title": "leaky faucet repair"
          }
        },
        {
          "knn": {
            "embedding": {
              "vector": [0.1, 0.2, ...],
              "k": 10
            }
          }
        }
      ]
    }
  }
}

In Elasticsearch, use Reciprocal Rank Fusion (RRF) to combine results:

POST /documents/_search
{
  "retriever": {
    "rrf": {
      "retrievers": [
        {
          "standard": {
            "query": { "match": { "title": "leaky faucet repair" } }
          }
        },
        {
          "knn": {
            "field": "embedding",
            "query_vector": [0.1, 0.2, ...],
            "k": 10,
            "num_candidates": 100
          }
        }
      ]
    }
  }
}

Score Fusion Strategies

  • Reciprocal Rank Fusion (RRF): Combines result rankings rather than scores. Simple, robust, and doesn't require score normalization. Good default.
  • Linear combination: Weighted sum of normalized scores (alpha * bm25_score + (1-alpha) * vector_score). Allows tuning the balance.
  • Re-ranking: Use vector search or a cross-encoder model to re-rank BM25 results. Keeps the speed of lexical search with semantic re-ordering.
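RRF is simple enough to sketch in a few lines: each document contributes 1 / (k + rank) for every result list it appears in, with k conventionally set to 60:

```python
def reciprocal_rank_fusion(result_lists, k=60):
    """Fuse ranked lists: each hit scores 1 / (k + rank), summed across lists."""
    scores = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

bm25_results = ["doc_a", "doc_b", "doc_c"]
vector_results = ["doc_b", "doc_d", "doc_a"]
print(reciprocal_rank_fusion([bm25_results, vector_results]))
# ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Because only ranks are used, a BM25 score of 14.2 and a cosine similarity of 0.87 fuse cleanly without any normalization step — the property that makes RRF such a robust default.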

Choosing an Embedding Model

The embedding model is the most critical component of a semantic search system:

| Model | Dimensions | Speed | Quality | Notes |
|---|---|---|---|---|
| all-MiniLM-L6-v2 | 384 | Fast | Good | Great starting point, low resource requirements |
| all-mpnet-base-v2 | 768 | Medium | Very good | Better quality, higher resource cost |
| e5-large-v2 | 1024 | Slow | Excellent | State-of-the-art for general retrieval |
| Domain fine-tuned | Varies | Varies | Best for domain | Fine-tune on your data for best results |

Key considerations:

  • Dimension count affects storage: 768-dim vectors at 4 bytes/dim = ~3 KB per document. At 10 million documents, that's 30 GB of vector data alone.
  • Model language support: Multilingual models exist but perform worse than language-specific models on each individual language.
  • Inference latency: Every query requires an embedding model inference call. Use GPU inference or batch processing for high-throughput systems.
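The storage arithmetic is worth scripting when comparing models. A back-of-the-envelope helper, assuming float32 vectors and a rough 1.7x HNSW overhead multiplier (an assumption; actual overhead depends on graph parameters such as M and ef_construction):

```python
def vector_storage_gb(num_docs, dims, bytes_per_dim=4, hnsw_overhead=1.7):
    """Estimate vector storage: raw float32 vectors plus HNSW graph overhead.

    hnsw_overhead is a rough multiplier (assumed ~1.5-2x in practice).
    """
    raw_bytes = num_docs * dims * bytes_per_dim
    return {
        "raw_gb": raw_bytes / 1e9,
        "with_hnsw_gb": raw_bytes * hnsw_overhead / 1e9,
    }

# 10 million documents with 768-dim embeddings:
print(vector_storage_gb(10_000_000, 768))
# {'raw_gb': 30.72, 'with_hnsw_gb': 52.224}
```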

When to Use What

| Use Case | Recommended Approach |
|---|---|
| E-commerce product search | Hybrid (keyword for SKUs/brands + semantic for descriptions) |
| Document search / knowledge base | Semantic or hybrid |
| Log search | Lexical (structured data, exact matching) |
| Code search | Hybrid (identifiers + semantic understanding) |
| FAQ / support ticket matching | Semantic |
| Autocomplete / typeahead | Lexical (prefix matching) |
| Image similarity | Vector search |
| Recommendation systems | Vector search |

Frequently Asked Questions

Q: Do I always need vector search for semantic capabilities?

No. Synonym expansion, query-time stemming, and phrase matching can add semantic-like capabilities to lexical search without vectors. But for true meaning-based retrieval across paraphrases and languages, vector search with embeddings is the most effective approach.

Q: How much does vector search add to storage and memory requirements?

Vector storage = num_documents × dimensions × 4 bytes. HNSW graph overhead adds roughly 1.5–2x on top. For 10M documents with 384-dim vectors: ~15 GB for vectors, ~25–30 GB total with HNSW graph. This is in addition to your existing inverted index storage.

Q: Can I add vector search to an existing Elasticsearch/OpenSearch cluster?

Yes. Add a dense_vector (Elasticsearch) or knn_vector (OpenSearch) field to your mapping, generate embeddings for existing documents via re-indexing or update-by-query, and start querying. You don't need to rebuild from scratch.

Q: What about learned sparse retrieval (SPLADE, etc.)?

Learned sparse models like SPLADE produce weighted sparse vectors that work with inverted indexes. They capture some semantic meaning while remaining compatible with traditional retrieval infrastructure. Elasticsearch supports this through its ELSER model. It's a middle ground between pure lexical and dense vector approaches.
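Mechanically, learned sparse scoring reduces to a weighted dot product over terms, which is exactly what an inverted index computes. The term weights below are made-up stand-ins for model outputs; note the (hypothetical) model has expanded the query beyond its literal words:

```python
def sparse_dot(query_terms, doc_terms):
    """Score = sum of query-weight x doc-weight over shared terms."""
    return sum(w * doc_terms[t] for t, w in query_terms.items() if t in doc_terms)

# Made-up weights standing in for SPLADE-style model outputs: the model
# has expanded "faucet" to also weight the related term "tap".
query = {"faucet": 1.8, "tap": 1.1, "leak": 1.5}
doc = {"tap": 2.0, "drip": 0.9, "repair": 1.2}
print(sparse_dot(query, doc))  # only "tap" overlaps: 1.1 * 2.0
```

Because the representation is sparse, the document's weights slot straight into ordinary postings lists — no ANN index, no dense vector storage.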

Q: How do I evaluate search quality when adding semantic search?

Use standard information retrieval metrics: NDCG@10, MRR, recall@K. Build a test set of queries with known relevant documents. Compare lexical-only, vector-only, and hybrid results. Hybrid almost always wins, but the optimal weighting depends on your data.
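Recall@K and MRR are straightforward to compute once you have a labeled test set; the query and relevance data below is purely illustrative:

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant documents that appear in the top-K results."""
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant)

def mean_reciprocal_rank(results_per_query):
    """Average of 1/rank of the first relevant result per query (0 if none)."""
    total = 0.0
    for retrieved, relevant in results_per_query:
        for rank, doc_id in enumerate(retrieved, start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break
    return total / len(results_per_query)

# Illustrative labeled test set: (retrieved ranking, set of relevant doc ids)
test_set = [
    (["d1", "d2", "d3"], {"d1"}),  # first result relevant -> RR = 1.0
    (["d4", "d5", "d6"], {"d6"}),  # third result relevant -> RR = 1/3
]
print(recall_at_k(["d1", "d2", "d3"], {"d1", "d3"}, k=2))  # 1 of 2 found -> 0.5
print(mean_reciprocal_rank(test_set))  # (1.0 + 1/3) / 2
```

Run the same test set against lexical-only, vector-only, and hybrid configurations to see where each mode wins before committing to a fusion weighting.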
