The terms "semantic search" and "vector search" are often used interchangeably, but they refer to different layers of the same problem. Understanding the distinction helps you make better architectural decisions when building search systems.
Definitions
Vector Search
Vector search is a retrieval mechanism. It finds the nearest neighbors to a query vector in a high-dimensional vector space using distance metrics (cosine similarity, dot product, L2/Euclidean distance).
At its core, vector search is a mathematical operation: given a query vector and a set of stored vectors, return the top-K most similar vectors.
Vector search doesn't know or care what the vectors represent. They could be text embeddings, image features, audio fingerprints, or user behavior profiles.
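Stripped to its essentials, that operation can be sketched in a few lines of Python. This is exact (brute-force) search with NumPy, fine for small collections; production systems use the approximate algorithms covered below:

import numpy as np

def top_k_cosine(query: np.ndarray, vectors: np.ndarray, k: int = 10):
    """Return indices and scores of the k stored vectors most similar to the query."""
    # Normalize so the dot product equals cosine similarity
    q = query / np.linalg.norm(query)
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    scores = v @ q                 # one similarity score per stored vector
    top = np.argsort(-scores)[:k]  # indices of the k highest scores
    return top, scores[top]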
Semantic Search
Semantic search is an application-level concept. It means retrieving results based on the meaning of a query rather than exact keyword matching. A semantic search for "how to fix a leaky faucet" should return results about "plumbing repair" and "dripping tap" even if those exact words don't appear in the query.
Semantic search is typically implemented using vector search (with text embeddings), but it's not the only way. It can also involve:
- Query expansion (adding synonyms and related terms)
- Knowledge graphs (understanding entity relationships)
- Learned sparse representations (models like SPLADE that produce weighted term vectors)
The Relationship
Semantic Search → uses → Vector Search → operates on → Embeddings
(application goal)       (retrieval mechanism)         (data representation)
Vector search is a tool. Semantic search is what you build with it (for text). You can use vector search for non-semantic purposes (image similarity, recommendation systems, anomaly detection), and you can achieve some degree of semantic search without vectors (synonym expansion, stemming).
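As a tiny illustration of the vector-free route, here is query expansion with a hand-built synonym map. The map is hypothetical; a real system would use a curated thesaurus or a resource like WordNet:

# Hypothetical hand-built synonym map; real systems use curated thesauri
SYNONYMS = {
    "fix": ["repair", "mend"],
    "leaky": ["dripping", "leaking"],
    "faucet": ["tap", "spigot"],
}

def expand_query(query: str) -> list[str]:
    """Expand each query term with its synonyms before a lexical (BM25) search."""
    terms = []
    for word in query.lower().split():
        terms.append(word)
        terms.extend(SYNONYMS.get(word, []))
    return terms

print(expand_query("fix leaky faucet"))
# ['fix', 'repair', 'mend', 'leaky', 'dripping', 'leaking', 'faucet', 'tap', 'spigot']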
How Vector Search Works
Embedding Generation
Text is converted to dense vectors using embedding models:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('all-MiniLM-L6-v2')
# Each text becomes a 384-dimensional vector
query_vector = model.encode("how to fix a leaky faucet")
doc_vector = model.encode("plumbing repair guide for dripping taps")
The embedding model positions semantically similar texts close together in vector space. "Fix a leaky faucet" and "repair a dripping tap" produce vectors that are near each other, even though they share almost no words.
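You can check this directly with the library's cosine similarity helper, continuing the snippet above:

from sentence_transformers import util

# High cosine similarity despite almost no word overlap
similarity = util.cos_sim(query_vector, doc_vector)
print(similarity)  # a 1x1 tensor; values near 1.0 mean "close in meaning"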
Approximate Nearest Neighbor (ANN) Algorithms
Exact nearest-neighbor search over millions of vectors is too slow for interactive queries. Production systems use ANN algorithms that trade a small loss in accuracy for dramatic speed improvements:
- HNSW (Hierarchical Navigable Small World): Graph-based. The most popular choice for accuracy and speed balance. Used by OpenSearch, Elasticsearch, pgvector.
- IVF (Inverted File Index): Partition-based. Clusters vectors, then searches only relevant clusters. Lower memory than HNSW.
- Product Quantization (PQ): Compresses vectors for lower memory footprint at the cost of accuracy.
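For a feel of the API surface, here is a minimal HNSW index built with the faiss library. The vectors are random placeholders; in practice you would index real embeddings:

import numpy as np
import faiss  # pip install faiss-cpu

dim = 384
index = faiss.IndexHNSWFlat(dim, 32)  # 32 = max graph neighbors per node (M)

vectors = np.random.random((100_000, dim)).astype("float32")
index.add(vectors)  # builds the HNSW graph incrementally

query = np.random.random((1, dim)).astype("float32")
distances, ids = index.search(query, 10)  # approximate top-10 neighbors (L2 by default)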
Vector Search in OpenSearch
# Create index with knn_vector field
PUT /documents
{
  "settings": { "index.knn": true },
  "mappings": {
    "properties": {
      "title": { "type": "text" },
      "embedding": {
        "type": "knn_vector",
        "dimension": 384,
        "method": {
          "name": "hnsw",
          "space_type": "cosinesimil",
          "engine": "nmslib"
        }
      }
    }
  }
}

# Query
POST /documents/_search
{
  "query": {
    "knn": {
      "embedding": {
        "vector": [0.1, 0.2, ...],
        "k": 10
      }
    }
  }
}
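Tying the two steps together from Python with the opensearch-py client; a sketch that assumes a local cluster on port 9200 and the index defined above:

from opensearchpy import OpenSearch
from sentence_transformers import SentenceTransformer

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])
model = SentenceTransformer("all-MiniLM-L6-v2")

# Index a document together with its embedding
doc = {"title": "plumbing repair guide for dripping taps"}
doc["embedding"] = model.encode(doc["title"]).tolist()
client.index(index="documents", id="1", body=doc, refresh=True)

# Encode the query text, then run the k-NN search from above
query_vector = model.encode("how to fix a leaky faucet").tolist()
response = client.search(
    index="documents",
    body={"query": {"knn": {"embedding": {"vector": query_vector, "k": 10}}}},
)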
Vector Search in Elasticsearch
# Create index with dense_vector field
PUT /documents
{
  "mappings": {
    "properties": {
      "title": { "type": "text" },
      "embedding": {
        "type": "dense_vector",
        "dims": 384,
        "index": true,
        "similarity": "cosine"
      }
    }
  }
}

# Query
POST /documents/_search
{
  "knn": {
    "field": "embedding",
    "query_vector": [0.1, 0.2, ...],
    "k": 10,
    "num_candidates": 100
  }
}
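The same query issued from Python with the elasticsearch client (8.x), replacing the [0.1, 0.2, ...] placeholder with a real query embedding; assumes a local cluster:

from elasticsearch import Elasticsearch
from sentence_transformers import SentenceTransformer

es = Elasticsearch("http://localhost:9200")
model = SentenceTransformer("all-MiniLM-L6-v2")

query_vector = model.encode("how to fix a leaky faucet").tolist()
response = es.search(
    index="documents",
    knn={
        "field": "embedding",
        "query_vector": query_vector,
        "k": 10,
        "num_candidates": 100,  # candidates examined per shard before the top-k cut
    },
)
for hit in response["hits"]["hits"]:
    print(hit["_score"], hit["_source"]["title"])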
Lexical Search vs. Semantic Search
Traditional keyword (lexical) search and semantic search have complementary strengths:
| Aspect | Lexical Search (BM25) | Semantic Search (Vectors) |
|---|---|---|
| How it works | Term frequency and inverse document frequency | Vector similarity in embedding space |
| Handles synonyms | Only with explicit synonym configuration | Naturally — embeddings capture meaning |
| Exact match | Excellent — finds precise terms | Poor — may miss exact identifiers, SKUs, codes |
| Typo tolerance | Requires fuzzy matching configuration | Moderate — small typos often produce similar embeddings |
| Out-of-vocabulary terms | Fails on terms not in the index | Handles novel phrasings if meaning is similar |
| Domain-specific jargon | Works well if jargon appears in documents | May fail if embedding model wasn't trained on domain |
| Scoring transparency | BM25 scores are interpretable | Embedding similarity scores are opaque |
| Performance | Very fast (inverted index) | Slower (ANN search), more memory |
| Index size | Compact inverted index | Large vector storage (dims × 4 bytes × docs) |
Neither approach is universally better. Most production search systems benefit from combining both — this is called hybrid search.
Hybrid Search: Combining Both
Hybrid search runs lexical and vector search in parallel, then fuses the results. In OpenSearch, this is the hybrid query type; score normalization and combination are handled by a search pipeline, sketched after the query below:
# OpenSearch hybrid search with neural search plugin
POST /documents/_search
{
  "query": {
    "hybrid": {
      "queries": [
        {
          "match": {
            "title": "leaky faucet repair"
          }
        },
        {
          "knn": {
            "embedding": {
              "vector": [0.1, 0.2, ...],
              "k": 10
            }
          }
        }
      ]
    }
  }
}
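The search pipeline that normalizes and combines the two score sets can be created once up front. A sketch using Python's requests library (the pipeline name is hypothetical; reference it at query time via ?search_pipeline=hybrid-pipeline):

import requests

# min_max normalization + arithmetic mean combination; both are configurable
pipeline = {
    "phase_results_processors": [
        {
            "normalization-processor": {
                "normalization": {"technique": "min_max"},
                "combination": {"technique": "arithmetic_mean"},
            }
        }
    ]
}
requests.put("http://localhost:9200/_search_pipeline/hybrid-pipeline", json=pipeline)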
In Elasticsearch, use Reciprocal Rank Fusion (RRF) to combine results:
POST /documents/_search
{
  "retriever": {
    "rrf": {
      "retrievers": [
        {
          "standard": {
            "query": { "match": { "title": "leaky faucet repair" } }
          }
        },
        {
          "knn": {
            "field": "embedding",
            "query_vector": [0.1, 0.2, ...],
            "k": 10,
            "num_candidates": 100
          }
        }
      ]
    }
  }
}
Score Fusion Strategies
- Reciprocal Rank Fusion (RRF): Combines result rankings rather than scores. Simple, robust, and doesn't require score normalization. Good default.
- Linear combination: Weighted sum of normalized scores (alpha * bm25_score + (1 - alpha) * vector_score). Allows tuning the balance.
- Re-ranking: Use vector search or a cross-encoder model to re-rank BM25 results. Keeps the speed of lexical search with semantic re-ordering.
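RRF itself is simple enough to sketch in a few lines of Python; k = 60 is the constant conventionally used in the literature:

def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of doc IDs; each appearance scores 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Example: fuse a BM25 ranking with a vector-search ranking
bm25_hits = ["doc3", "doc1", "doc7"]
vector_hits = ["doc1", "doc9", "doc3"]
print(rrf_fuse([bm25_hits, vector_hits]))  # doc1 and doc3 rise to the top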
Choosing an Embedding Model
The embedding model is the most critical component of a semantic search system:
| Model | Dimensions | Speed | Quality | Notes |
|---|---|---|---|---|
| all-MiniLM-L6-v2 | 384 | Fast | Good | Great starting point, low resource requirements |
| all-mpnet-base-v2 | 768 | Medium | Very good | Better quality, higher resource cost |
| e5-large-v2 | 1024 | Slow | Excellent | State-of-the-art for general retrieval |
| Domain fine-tuned | Varies | Varies | Best for domain | Fine-tune on your data for best results |
Key considerations:
- Dimension count affects storage: 768-dim vectors at 4 bytes/dim = ~3 KB per document. At 10 million documents, that's 30 GB of vector data alone.
- Model language support: Multilingual models exist, but they typically trail language-specific models on any single language.
- Inference latency: Every query requires an embedding model inference call. Use GPU inference or batch processing for high-throughput systems.
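For throughput, sentence-transformers can batch-encode documents and run on GPU. A sketch; batch size and device depend on your hardware:

from sentence_transformers import SentenceTransformer

# device="cuda" assumes a GPU is available; drop it to run on CPU
model = SentenceTransformer("all-MiniLM-L6-v2", device="cuda")

documents = ["plumbing repair guide", "dripping tap fixes"]  # your corpus here
embeddings = model.encode(documents, batch_size=64, show_progress_bar=True)
print(embeddings.shape)  # (num_documents, 384)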
When to Use What
| Use Case | Recommended Approach |
|---|---|
| E-commerce product search | Hybrid (keyword for SKUs/brands + semantic for descriptions) |
| Document search / knowledge base | Semantic or hybrid |
| Log search | Lexical (structured data, exact matching) |
| Code search | Hybrid (identifiers + semantic understanding) |
| FAQ / support ticket matching | Semantic |
| Autocomplete / typeahead | Lexical (prefix matching) |
| Image similarity | Vector search |
| Recommendation systems | Vector search |
Frequently Asked Questions
Q: Do I always need vector search for semantic capabilities?
No. Synonym expansion, query-time stemming, and phrase matching can add semantic-like capabilities to lexical search without vectors. But for true meaning-based retrieval across paraphrases and languages, vector search with embeddings is the most effective approach.
Q: How much does vector search add to storage and memory requirements?
Vector storage = num_documents × dimensions × 4 bytes. HNSW graph overhead adds roughly 1.5–2x on top. For 10M documents with 384-dim vectors: ~15 GB for vectors, ~25–30 GB total with HNSW graph. This is in addition to your existing inverted index storage.
Q: Can I add vector search to an existing Elasticsearch/OpenSearch cluster?
Yes. Add a dense_vector (Elasticsearch) or knn_vector (OpenSearch) field to your mapping, generate embeddings for existing documents via re-indexing or update-by-query, and start querying. You don't need to rebuild from scratch.
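A backfill sketch using the elasticsearch Python helpers; it assumes the documents index and title field from the mapping above (adapt analogously for opensearch-py):

from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk, scan
from sentence_transformers import SentenceTransformer

es = Elasticsearch("http://localhost:9200")
model = SentenceTransformer("all-MiniLM-L6-v2")

def backfill_embeddings(index: str = "documents") -> None:
    """Add an embedding to every existing document via partial bulk updates."""
    actions = (
        {
            "_op_type": "update",
            "_index": index,
            "_id": hit["_id"],
            "doc": {"embedding": model.encode(hit["_source"]["title"]).tolist()},
        }
        for hit in scan(es, index=index, query={"query": {"match_all": {}}})
    )
    bulk(es, actions)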
Q: What about learned sparse retrieval (SPLADE, etc.)?
Learned sparse models like SPLADE produce weighted sparse vectors that work with inverted indexes. They capture some semantic meaning while remaining compatible with traditional retrieval infrastructure. Elasticsearch supports this through its ELSER model. It's a middle ground between pure lexical and dense vector approaches.
Q: How do I evaluate search quality when adding semantic search?
Use standard information retrieval metrics: NDCG@10, MRR, recall@K. Build a test set of queries with known relevant documents. Compare lexical-only, vector-only, and hybrid results. Hybrid almost always wins, but the optimal weighting depends on your data.
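Recall@K and MRR are easy to compute yourself once you have that labeled test set; a minimal sketch (NDCG adds graded relevance on top):

def recall_at_k(relevant: set[str], retrieved: list[str], k: int = 10) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    return len(relevant & set(retrieved[:k])) / len(relevant)

def mrr(relevant: set[str], retrieved: list[str]) -> float:
    """Reciprocal rank of the first relevant result (0 if none is found)."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0

# Average both over the full query set, once per retrieval strategy
print(recall_at_k({"doc1", "doc4"}, ["doc3", "doc1", "doc7"], k=3))  # 0.5
print(mrr({"doc1", "doc4"}, ["doc3", "doc1", "doc7"]))               # 0.5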