The terms "semantic search" and "vector search" are often used interchangeably, but they refer to different layers of the same problem. Understanding the distinction helps you make better architectural decisions when building search systems.
Definitions
Vector Search
Vector search is a retrieval mechanism. It finds the nearest neighbors to a query vector in a high-dimensional vector space using distance metrics (cosine similarity, dot product, L2/Euclidean distance).
At its core, vector search is a mathematical operation: given a query vector and a set of stored vectors, return the top-K most similar vectors.
Vector search doesn't know or care what the vectors represent. They could be text embeddings, image features, audio fingerprints, or user behavior profiles.
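Stripped to its essentials, that operation can be sketched in a few lines of Python. This is exact (brute-force) search with NumPy, fine for small collections; production systems use the approximate algorithms covered below:

import numpy as np

def top_k_cosine(query: np.ndarray, vectors: np.ndarray, k: int = 10):
    """Return indices and scores of the k stored vectors most similar to the query."""
    # Normalize so the dot product equals cosine similarity
    q = query / np.linalg.norm(query)
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    scores = v @ q                 # one similarity score per stored vector
    top = np.argsort(-scores)[:k]  # indices of the k highest scores
    return top, scores[top]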
Semantic Search
Semantic search is an application-level concept. It means retrieving results based on the meaning of a query rather than exact keyword matching. A semantic search for "how to fix a leaky faucet" should return results about "plumbing repair" and "dripping tap" even if those exact words don't appear in the query.
Semantic search is typically implemented using vector search (with text embeddings), but it's not the only way. It can also involve:
- Query expansion (adding synonyms and related terms)
- Knowledge graphs (understanding entity relationships)
- Learned sparse representations (models like SPLADE that produce weighted term vectors)
The Relationship
Semantic Search → uses → Vector Search → operates on → Embeddings
(application goal)       (retrieval mechanism)         (data representation)
Vector search is a tool. Semantic search is what you build with it (for text). You can use vector search for non-semantic purposes (image similarity, recommendation systems, anomaly detection), and you can achieve some degree of semantic search without vectors (synonym expansion, stemming).
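As a tiny illustration of the vector-free route, here is query expansion with a hand-built synonym map. The map is hypothetical; a real system would use a curated thesaurus or a resource like WordNet:

# Hypothetical hand-built synonym map; real systems use curated thesauri
SYNONYMS = {
    "fix": ["repair", "mend"],
    "leaky": ["dripping", "leaking"],
    "faucet": ["tap", "spigot"],
}

def expand_query(query: str) -> list[str]:
    """Expand each query term with its synonyms before a lexical (BM25) search."""
    terms = []
    for word in query.lower().split():
        terms.append(word)
        terms.extend(SYNONYMS.get(word, []))
    return terms

print(expand_query("fix leaky faucet"))
# ['fix', 'repair', 'mend', 'leaky', 'dripping', 'leaking', 'faucet', 'tap', 'spigot']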
How Vector Search Works
Embedding Generation
Text is converted to dense vectors using embedding models:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('all-MiniLM-L6-v2')
# Each text becomes a 384-dimensional vector
query_vector = model.encode("how to fix a leaky faucet")
doc_vector = model.encode("plumbing repair guide for dripping taps")
The embedding model positions semantically similar texts close together in vector space. "Fix a leaky faucet" and "repair a dripping tap" produce vectors that are near each other, even though they share almost no words.
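You can check this directly with the library's cosine similarity helper, continuing the snippet above:

from sentence_transformers import util

# High cosine similarity despite almost no word overlap
similarity = util.cos_sim(query_vector, doc_vector)
print(similarity)  # a 1x1 tensor; values near 1.0 mean "close in meaning"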
Approximate Nearest Neighbor (ANN) Algorithms
Exact nearest-neighbor search over millions of vectors is too slow for interactive queries. Production systems use ANN algorithms that trade a small loss in accuracy for dramatic speed improvements:
- HNSW (Hierarchical Navigable Small World): Graph-based. The most popular choice for accuracy and speed balance. Used by OpenSearch, Elasticsearch, pgvector.
- IVF (Inverted File Index): Partition-based. Clusters vectors, then searches only relevant clusters. Lower memory than HNSW.
- Product Quantization (PQ): Compresses vectors for lower memory footprint at the cost of accuracy.
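For a feel of the API surface, here is a minimal HNSW index built with the faiss library. The vectors are random placeholders; in practice you would index real embeddings:

import numpy as np
import faiss  # pip install faiss-cpu

dim = 384
index = faiss.IndexHNSWFlat(dim, 32)  # 32 = max graph neighbors per node (M)

vectors = np.random.random((100_000, dim)).astype("float32")
index.add(vectors)  # builds the HNSW graph incrementally

query = np.random.random((1, dim)).astype("float32")
distances, ids = index.search(query, 10)  # approximate top-10 neighbors (L2 by default)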
Vector Search in OpenSearch
# Create index with knn_vector field
PUT /documents
{
  "settings": { "index.knn": true },
  "mappings": {
    "properties": {
      "title": { "type": "text" },
      "embedding": {
        "type": "knn_vector",
        "dimension": 384,
        "method": {
          "name": "hnsw",
          "space_type": "cosinesimil",
          "engine": "nmslib"
        }
      }
    }
  }
}

# Query
POST /documents/_search
{
  "query": {
    "knn": {
      "embedding": {
        "vector": [0.1, 0.2, ...],
        "k": 10
      }
    }
  }
}
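Tying the two steps together from Python with the opensearch-py client; a sketch that assumes a local cluster on port 9200 and the index defined above:

from opensearchpy import OpenSearch
from sentence_transformers import SentenceTransformer

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])
model = SentenceTransformer("all-MiniLM-L6-v2")

# Index a document together with its embedding
doc = {"title": "plumbing repair guide for dripping taps"}
doc["embedding"] = model.encode(doc["title"]).tolist()
client.index(index="documents", id="1", body=doc, refresh=True)

# Encode the query text, then run the k-NN search from above
query_vector = model.encode("how to fix a leaky faucet").tolist()
response = client.search(
    index="documents",
    body={"query": {"knn": {"embedding": {"vector": query_vector, "k": 10}}}},
)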
Vector Search in Elasticsearch
# Create index with dense_vector field
PUT /documents
{
  "mappings": {
    "properties": {
      "title": { "type": "text" },
      "embedding": {
        "type": "dense_vector",
        "dims": 384,
        "index": true,
        "similarity": "cosine"
      }
    }
  }
}

# Query
POST /documents/_search
{
  "knn": {
    "field": "embedding",
    "query_vector": [0.1, 0.2, ...],
    "k": 10,
    "num_candidates": 100
  }
}
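The same query issued from Python with the elasticsearch client (8.x), replacing the [0.1, 0.2, ...] placeholder with a real query embedding; assumes a local cluster:

from elasticsearch import Elasticsearch
from sentence_transformers import SentenceTransformer

es = Elasticsearch("http://localhost:9200")
model = SentenceTransformer("all-MiniLM-L6-v2")

query_vector = model.encode("how to fix a leaky faucet").tolist()
response = es.search(
    index="documents",
    knn={
        "field": "embedding",
        "query_vector": query_vector,
        "k": 10,
        "num_candidates": 100,  # candidates examined per shard before the top-k cut
    },
)
for hit in response["hits"]["hits"]:
    print(hit["_score"], hit["_source"]["title"])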
Lexical Search vs. Semantic Search
Traditional keyword (lexical) search and semantic search have complementary strengths:
| Aspect | Lexical Search (BM25) | Semantic Search (Vectors) |
|---|---|---|
| How it works | Term frequency and inverse document frequency | Vector similarity in embedding space |
| Handles synonyms | Only with explicit synonym configuration | Naturally — embeddings capture meaning |
| Exact match | Excellent — finds precise terms | Poor — may miss exact identifiers, SKUs, codes |
| Typo tolerance | Requires fuzzy matching configuration | Moderate — small typos often produce similar embeddings |
| Out-of-vocabulary terms | Fails on terms not in the index | Handles novel phrasings if meaning is similar |
| Domain-specific jargon | Works well if jargon appears in documents | May fail if embedding model wasn't trained on domain |
| Scoring transparency | BM25 scores are interpretable | Embedding similarity scores are opaque |
| Performance | Very fast (inverted index) | Slower (ANN search), more memory |
| Index size | Compact inverted index | Large vector storage (dims × 4 bytes × docs) |
Neither approach is universally better. Most production search systems benefit from combining both — this is called hybrid search.
Hybrid Search: Combining Both
Hybrid search runs lexical and vector search in parallel, then fuses the results. In OpenSearch, this is the hybrid query type; score normalization and combination are handled by a search pipeline, sketched after the query below:
# OpenSearch hybrid search with neural search plugin
POST /documents/_search
{
  "query": {
    "hybrid": {
      "queries": [
        {
          "match": {
            "title": "leaky faucet repair"
          }
        },
        {
          "knn": {
            "embedding": {
              "vector": [0.1, 0.2, ...],
              "k": 10
            }
          }
        }
      ]
    }
  }
}
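The search pipeline that normalizes and combines the two score sets can be created once up front. A sketch using Python's requests library (the pipeline name is hypothetical; reference it at query time via ?search_pipeline=hybrid-pipeline):

import requests

# min_max normalization + arithmetic mean combination; both are configurable
pipeline = {
    "phase_results_processors": [
        {
            "normalization-processor": {
                "normalization": {"technique": "min_max"},
                "combination": {"technique": "arithmetic_mean"},
            }
        }
    ]
}
requests.put("http://localhost:9200/_search_pipeline/hybrid-pipeline", json=pipeline)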
In Elasticsearch, use Reciprocal Rank Fusion (RRF) to combine results:
POST /documents/_search
{
  "retriever": {
    "rrf": {
      "retrievers": [
        {
          "standard": {
            "query": { "match": { "title": "leaky faucet repair" } }
          }
        },
        {
          "knn": {
            "field": "embedding",
            "query_vector": [0.1, 0.2, ...],
            "k": 10,
            "num_candidates": 100
          }
        }
      ]
    }
  }
}
Score Fusion Strategies
- Reciprocal Rank Fusion (RRF): Combines result rankings rather than scores. Simple, robust, and doesn't require score normalization. Good default.
- Linear combination: Weighted sum of normalized scores (alpha * bm25_score + (1 - alpha) * vector_score). Allows tuning the balance.
- Re-ranking: Use vector search or a cross-encoder model to re-rank BM25 results. Keeps the speed of lexical search with semantic re-ordering.
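RRF itself is simple enough to sketch in a few lines of Python; k = 60 is the constant conventionally used in the literature:

def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of doc IDs; each appearance scores 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Example: fuse a BM25 ranking with a vector-search ranking
bm25_hits = ["doc3", "doc1", "doc7"]
vector_hits = ["doc1", "doc9", "doc3"]
print(rrf_fuse([bm25_hits, vector_hits]))  # doc1 and doc3 rise to the top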
Choosing an Embedding Model
The embedding model is the most critical component of a semantic search system:
| Model | Dimensions | Speed | Quality | Notes |
|---|---|---|---|---|
| all-MiniLM-L6-v2 | 384 | Fast | Good | Great starting point, low resource requirements |
| all-mpnet-base-v2 | 768 | Medium | Very good | Better quality, higher resource cost |
| e5-large-v2 | 1024 | Slow | Excellent | State-of-the-art for general retrieval |
| Domain fine-tuned | Varies | Varies | Best for domain | Fine-tune on your data for best results |
Key considerations:
- Dimension count affects storage: 768-dim vectors at 4 bytes/dim = ~3 KB per document. At 10 million documents, that's 30 GB of vector data alone.
- Model language support: Multilingual models exist, but they typically trail language-specific models on any single language.
- Inference latency: Every query requires an embedding model inference call. Use GPU inference or batch processing for high-throughput systems.
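For throughput, sentence-transformers can batch-encode documents and run on GPU. A sketch; batch size and device depend on your hardware:

from sentence_transformers import SentenceTransformer

# device="cuda" assumes a GPU is available; drop it to run on CPU
model = SentenceTransformer("all-MiniLM-L6-v2", device="cuda")

documents = ["plumbing repair guide", "dripping tap fixes"]  # your corpus here
embeddings = model.encode(documents, batch_size=64, show_progress_bar=True)
print(embeddings.shape)  # (num_documents, 384)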
When to Use What
| Use Case | Recommended Approach |
|---|---|
| E-commerce product search | Hybrid (keyword for SKUs/brands + semantic for descriptions) |
| Document search / knowledge base | Semantic or hybrid |
| Log search | Lexical (structured data, exact matching) |
| Code search | Hybrid (identifiers + semantic understanding) |
| FAQ / support ticket matching | Semantic |
| Autocomplete / typeahead | Lexical (prefix matching) |
| Image similarity | Vector search |
| Recommendation systems | Vector search |
Frequently Asked Questions
Q: Do I always need vector search for semantic capabilities?
No. Synonym expansion, query-time stemming, and phrase matching can add semantic-like capabilities to lexical search without vectors. But for true meaning-based retrieval across paraphrases and languages, vector search with embeddings is the most effective approach.
Q: How much does vector search add to storage and memory requirements?
Vector storage = num_documents × dimensions × 4 bytes. HNSW graph overhead adds roughly 1.5–2x on top. For 10M documents with 384-dim vectors: ~15 GB for vectors, ~25–30 GB total with HNSW graph. This is in addition to your existing inverted index storage.
Q: Can I add vector search to an existing Elasticsearch/OpenSearch cluster?
Yes. Add a dense_vector (Elasticsearch) or knn_vector (OpenSearch) field to your mapping, generate embeddings for existing documents via re-indexing or update-by-query, and start querying. You don't need to rebuild from scratch.
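A backfill sketch using the elasticsearch Python helpers; it assumes the documents index and title field from the mapping above (adapt analogously for opensearch-py):

from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk, scan
from sentence_transformers import SentenceTransformer

es = Elasticsearch("http://localhost:9200")
model = SentenceTransformer("all-MiniLM-L6-v2")

def backfill_embeddings(index: str = "documents") -> None:
    """Add an embedding to every existing document via partial bulk updates."""
    actions = (
        {
            "_op_type": "update",
            "_index": index,
            "_id": hit["_id"],
            "doc": {"embedding": model.encode(hit["_source"]["title"]).tolist()},
        }
        for hit in scan(es, index=index, query={"query": {"match_all": {}}})
    )
    bulk(es, actions)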
Q: What about learned sparse retrieval (SPLADE, etc.)?
Learned sparse models like SPLADE produce weighted sparse vectors that work with inverted indexes. They capture some semantic meaning while remaining compatible with traditional retrieval infrastructure. Elasticsearch supports this through its ELSER model. It's a middle ground between pure lexical and dense vector approaches.
Q: How do I evaluate search quality when adding semantic search?
Use standard information retrieval metrics: NDCG@10, MRR, recall@K. Build a test set of queries with known relevant documents. Compare lexical-only, vector-only, and hybrid results. Hybrid almost always wins, but the optimal weighting depends on your data.
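Recall@K and MRR are easy to compute yourself once you have that labeled test set; a minimal sketch (NDCG adds graded relevance on top):

def recall_at_k(relevant: set[str], retrieved: list[str], k: int = 10) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    return len(relevant & set(retrieved[:k])) / len(relevant)

def mrr(relevant: set[str], retrieved: list[str]) -> float:
    """Reciprocal rank of the first relevant result (0 if none is found)."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0

# Average both over the full query set, once per retrieval strategy
print(recall_at_k({"doc1", "doc4"}, ["doc3", "doc1", "doc7"], k=3))  # 0.5
print(mrr({"doc1", "doc4"}, ["doc3", "doc1", "doc7"]))               # 0.5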