Elasticsearch does not support SQL-style joins between two indices. The shard-local execution model that makes search fast also makes arbitrary joins between independently-sharded datasets impractical. The four approaches that work in practice are denormalization at ingest time, the join field for parent-child relationships within a single index, ES|QL LOOKUP JOIN (8.13+ enrich-style joins against a small lookup index), and application-side joins (two queries, joined in client code). Each has different trade-offs in storage, query latency, and update cost.
When to Use Each Join Approach
| Approach | Best for | Trade-off |
|---|---|---|
| Denormalization | Read-heavy, infrequent updates to "parent" side | Storage cost, cascade updates when parent changes |
join field (parent-child) |
Many small children per large parent, separate update cadences | Same-index only, expensive at query time, no cross-shard joins |
nested fields |
Small bounded arrays of related objects | Up to ~50 nested fields per index; reindex on update |
| ES | QL LOOKUP JOIN |
Enrich main results with a small reference index |
| Enrich processor | Static lookup data injected at ingest | Lookup data must be small and updated via enrich policy |
| Application-side join | Ad-hoc joins, low-volume queries | Two round trips per query; client code complexity |
| Terms lookup query | Filter by IDs stored in another doc | Single-document lookup only |
Denormalization at Ingest Time
The default answer for "I need to query data that lives in two indices" is to make it live in one. At ingest, embed the relevant fields of the smaller side into documents of the larger side. Read queries become single-index queries; the cost is storage and cascade updates when the embedded data changes.
PUT orders/_doc/1
{
"order_id": "ORD-100",
"customer": {
"id": "C-7",
"name": "Acme Co",
"tier": "gold"
},
"items": [
{ "sku": "ABC-123", "qty": 2 }
]
}
When the customer's tier changes, update every order. Use update_by_query with a script for bounded fan-out, or maintain a "needs reindex" queue if updates are frequent.
The join Field: Parent-Child in One Index
The join field type expresses parent-child relationships within a single index. Child documents are routed to the same shard as their parent, which is what makes has_child and has_parent queries possible. The trade-off is that joins only work within one index, all joined documents share routing, and query-time joins are 5-10x slower than equivalent denormalized queries.
PUT relationships
{
"mappings": {
"properties": {
"doc_type": {
"type": "join",
"relations": { "customer": "order" }
},
"name": { "type": "text" }
}
}
}
PUT relationships/_doc/c1?refresh
{ "name": "Acme Co", "doc_type": "customer" }
PUT relationships/_doc/o1?routing=c1&refresh
{ "order_id": "ORD-100", "doc_type": { "name": "order", "parent": "c1" } }
GET relationships/_search
{
"query": {
"has_child": {
"type": "order",
"query": { "match": { "order_id": "ORD-100" } }
}
}
}
Use join when children update independently of parents and reindexing the parent on every child change would be expensive. Otherwise prefer nested or denormalization.
ES|QL LOOKUP JOIN and Enrich Processor
ES|QL added LOOKUP JOIN in 8.13, supporting left joins between query results and a small lookup index. The lookup index must fit on every node and the join key must be a single field. Use it when you need to enrich main-index results with reference data without pre-materializing the join at ingest.
The enrich processor pre-computes the join at ingest: an enrich policy reads a source index, builds an enrich index, and the processor injects matching fields into incoming documents. This is the right pattern for static reference data (country lookups, SKU catalogs) that doesn't change in real time.
Application-Side Joins
Run two queries, join the results in client code. Acceptable for low-QPS workflows, ad-hoc analysis, and ad-hoc reporting. The cost is two network round-trips per logical query and join logic in the application.
customers = es.search(index="customers", query={"term": {"tier": "gold"}})
customer_ids = [hit["_id"] for hit in customers["hits"]["hits"]]
orders = es.search(index="orders", query={"terms": {"customer_id": customer_ids}})
Common Pitfalls With Elasticsearch Joins
- Reaching for
joinwhen denormalization would be cheaper.joinis the slowest option at query time and rarely the right first choice. - Cross-index joins. Elasticsearch has no cross-index join. ES|QL
LOOKUP JOIN, application-side joins, and the enrich processor are the substitutes; pick the one whose update model matches your data. - Using nested when you actually need parent-child. Nested re-indexes the whole parent on any child change. If children update independently, parent-child is the better fit despite higher query cost.
- Forgetting the routing requirement on
join. Children must be indexed with?routing=<parent_id>or they end up on the wrong shard andhas_child/has_parentreturns wrong results silently. - Over-using ES|QL
LOOKUP JOINagainst large lookup indices. The lookup side must stay small; otherwise per-node memory blows up.
Operating Join-Heavy Workloads
Joins of any kind in Elasticsearch are an operational risk: they shift cost from ingest to query, often raise tail latency, and tie performance to data distribution. Pulse profiles parent-child and ES|QL LOOKUP JOIN queries, surfaces queries with disproportionate cost relative to result size, and recommends the denormalization or alias change that converts them into single-index searches.
Frequently Asked Questions
Q: Can I do a SQL-style JOIN between two Elasticsearch indices?
A: No. Elasticsearch has no general cross-index join. The substitutes are denormalization (do the join at ingest), the join field (parent-child within one index), ES|QL LOOKUP JOIN (left joins against a small lookup index), the enrich processor (pre-materialize at ingest), and application-side joins (two queries, joined in client code).
Q: When should I use the join field type versus nested fields?
A: Use nested for arrays of related objects that update together with their parent (order with line items, post with comments-fetched-at-the-same-time). Use join (parent-child) when children update independently and reindexing the parent on every child change would be expensive. join is much slower at query time, so default to nested.
Q: How does ES|QL LOOKUP JOIN work in Elasticsearch?
A: LOOKUP JOIN (8.13+) performs a left join between the main ES|QL query and a small lookup index keyed by a single field. The lookup index must be small enough to fit on every node. Use it to enrich main results with reference data at query time, when pre-materializing via the enrich processor isn't feasible.
Q: What is denormalization in Elasticsearch and when should I use it?
A: Denormalization is the practice of embedding fields from related entities into the same document at ingest time, so reads are single-index queries with no join. Use it when reads are frequent and updates to the embedded entity are rare. The cost is storage and cascade-update logic when embedded data changes.
Q: Can I use the terms query as a join workaround?
A: The Terms Lookup feature of the terms query fetches the term values from another document and uses them as filter values. It is a single-document lookup, not a join, but it covers the case "filter index A by IDs stored in document X of index B" without an application round trip.
Q: How can I prevent slow joins from impacting overall cluster performance?
A: Use search slow logs to find joins exceeding latency budgets, set search.max_buckets and per-shard timeouts to bound worst-case cost, and consider isolating join-heavy queries to dedicated coordinating nodes. The structural fix is usually to denormalize the data and remove the join from the hot path.
Related Reading
- Elasticsearch cross-index query: the simpler multi-index case that doesn't require joining.
- Elasticsearch nested field data type: the right fit for tightly-coupled inner arrays.
- Elasticsearch IllegalArgumentException: mapper conflicts: the typing error that often appears when joining mismatched indices.
- Elasticsearch index.mapping.total_fields.limit: denormalization can push you past this cap.
- Elasticsearch flattened field data type: cheap way to embed dynamic sub-trees on denormalized documents.
- Elasticsearch create index with mapping: set up
joinand parent-child mappings up front.