An Elasticsearch upsert updates a document if it exists or inserts it if it does not, in a single API call. The operation is exposed by the _update API with an upsert body, or as doc_as_upsert: true for simple document merging. Upsert is the right primitive for keeping a denormalized view in sync with an external source of truth where you do not know in advance whether a document is new or existing.
When to Use Upsert (vs Alternatives)
| Goal | Better choice |
|---|---|
| Insert a brand-new document with a known ID | `PUT /index/_doc/<id>` or `POST /index/_create/<id>` |
| Replace an entire document, always | `PUT /index/_doc/<id>` (overwrites if present, creates if not) |
| Update fields only if the document exists, fail otherwise | `POST /index/_update/<id>` without `upsert` |
| Update fields if exists, create if not | `_update` with `upsert` or `doc_as_upsert: true` (this guide) |
| Update many documents matching a query | `_update_by_query` |
| High-volume insert-or-update from a pipeline | `_bulk` with `update` + `doc_as_upsert` |
Prerequisites
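- Elasticsearch 6.x or later. The typeless `/<index>/_update/<id>` paths used in this guide are the 7.0+ form; on 6.x the equivalent path is `/<index>/<type>/<id>/_update`.
- The user or API key needs the `index` privilege, which covers both indexing and updating documents (or the broader `write` privilege), on the target index.
- Painless scripting enabled if you use scripted upserts (default on most distributions).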
Step-by-Step: Upsert a Document
1. Upsert with `doc` and `upsert` (the most common form). The `doc` is applied as a partial update if the document exists; the `upsert` body is used if it does not.

```
POST /products/_update/sku-1234
{
  "doc": { "price": 19.99, "in_stock": true },
  "upsert": { "sku": "sku-1234", "name": "Widget", "price": 19.99, "in_stock": true }
}
```

2. Use `doc_as_upsert: true` when the insert and update bodies are identical. This avoids duplicating fields.

```
POST /products/_update/sku-1234
{
  "doc": { "sku": "sku-1234", "price": 19.99, "in_stock": true },
  "doc_as_upsert": true
}
```

3. Use a script for conditional logic. The script runs only when the document exists; the `upsert` body is used otherwise.

```
POST /counters/_update/page-views
{
  "script": {
    "source": "ctx._source.count += params.delta",
    "lang": "painless",
    "params": { "delta": 1 }
  },
  "upsert": { "count": 1 }
}
```

4. Use `scripted_upsert: true` to run the script on both insert and update. Useful when the initial creation logic is non-trivial.

```
POST /counters/_update/page-views
{
  "scripted_upsert": true,
  "script": {
    "source": "if (ctx.op == 'create') { ctx._source.count = 1 } else { ctx._source.count += 1 }",
    "lang": "painless"
  },
  "upsert": {}
}
```

5. Add `retry_on_conflict` for concurrent updates. When two clients upsert the same ID simultaneously, version conflicts are normal. Retries are automatic and usually cheap, although they add up on hot keys.

```
POST /counters/_update/page-views?retry_on_conflict=5
{ "script": { ... }, "upsert": { ... } }
```

6. Use the Bulk API for high-throughput upserts. One round trip carries many upserts.

```
POST /_bulk
{ "update": { "_index": "products", "_id": "sku-1234", "retry_on_conflict": 3 } }
{ "doc": { "price": 19.99 }, "doc_as_upsert": true }
{ "update": { "_index": "products", "_id": "sku-5678", "retry_on_conflict": 3 } }
{ "doc": { "price": 24.99 }, "doc_as_upsert": true }
```

7. Optionally control concurrency with `if_seq_no` and `if_primary_term`. Use these when you have a read-modify-write workflow and need fail-on-conflict semantics instead of last-writer-wins. They cannot be combined with `retry_on_conflict`, and Elasticsearch rejects them on requests that carry an `upsert` body or `doc_as_upsert`, so they apply to plain updates of documents you have already read.
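The bulk upsert body from the steps above is newline-delimited JSON: one action line followed by one payload line per document. The following is a minimal sketch of assembling that body in Python using only the standard library. The helper name `build_bulk_upserts` and the sample data are illustrative assumptions, and sending the body to `/_bulk` is left to whichever HTTP client you use.

```python
import json
from typing import Mapping


def build_bulk_upserts(index: str, docs: Mapping[str, dict], retry_on_conflict: int = 3) -> str:
    """Return an NDJSON body of `update` actions with doc_as_upsert for the _bulk endpoint.

    Hypothetical helper for illustration; it only builds the request body.
    """
    lines = []
    for doc_id, fields in docs.items():
        # Action line: which document to update and how many times to retry on conflict.
        action = {"update": {"_index": index, "_id": doc_id, "retry_on_conflict": retry_on_conflict}}
        # Payload line: the partial document, reused as the insert body.
        payload = {"doc": fields, "doc_as_upsert": True}
        lines.append(json.dumps(action))
        lines.append(json.dumps(payload))
    # The bulk body must be terminated by a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_upserts(
    "products",
    {"sku-1234": {"price": 19.99}, "sku-5678": {"price": 24.99}},
)
print(body, end="")
```

Serializing each line with `json.dumps` keeps every action and payload on a single line, which the NDJSON format requires.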
Upsert in Production: What to Watch For
The hidden cost of upsert workloads is version conflicts. Every _update is a get-then-write under the hood; concurrent writers to the same ID will collide. The retry_on_conflict parameter handles the common case, but on hot keys (counters, per-user aggregates) you can burn significant CPU and IO on retries. Watch the version_conflicts counter in _nodes/stats/indices/indexing and consider sharding the hot key or batching upstream.
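To make the get-then-write collision concrete, here is a small in-memory simulation of what `retry_on_conflict` does conceptually. It is a sketch under simplifying assumptions, not Elasticsearch's implementation: `TinyIndex`, `increment_with_retry`, and the `interfere` hook are hypothetical names, and the per-document sequence counter is a simplification of how Elasticsearch tracks versions.

```python
class VersionConflict(Exception):
    """Stands in for an HTTP 409 version conflict."""


class TinyIndex:
    """In-memory stand-in for one index: id -> (seq_no, source)."""

    def __init__(self):
        self.docs = {}
        self.conflicts = 0

    def get(self, doc_id):
        return self.docs.get(doc_id, (None, None))

    def write_if_unchanged(self, doc_id, expected_seq_no, source):
        current_seq_no, _ = self.get(doc_id)
        if current_seq_no != expected_seq_no:
            # Someone else wrote between our read and our write.
            self.conflicts += 1
            raise VersionConflict(doc_id)
        next_seq_no = 0 if current_seq_no is None else current_seq_no + 1
        self.docs[doc_id] = (next_seq_no, source)


def increment_with_retry(index, doc_id, delta, retry_on_conflict=0, interfere=None):
    """Counter upsert that repeats the whole read-modify-write cycle on conflict."""
    attempts = retry_on_conflict + 1
    for attempt in range(attempts):
        seq_no, source = index.get(doc_id)
        new_source = {"count": delta} if source is None else {"count": source["count"] + delta}
        if interfere is not None:
            interfere(attempt)  # hypothetical hook: lets a "concurrent writer" sneak in
        try:
            index.write_if_unchanged(doc_id, seq_no, new_source)
            return new_source
        except VersionConflict:
            if attempt == attempts - 1:
                raise  # retries exhausted: the caller sees the conflict


index = TinyIndex()
index.write_if_unchanged("page-views", None, {"count": 1})


def rival_writer(attempt):
    # Collide with the first two attempts only.
    if attempt < 2:
        seq_no, source = index.get("page-views")
        index.write_if_unchanged("page-views", seq_no, {"count": source["count"] + 1})


result = increment_with_retry(index, "page-views", 1, retry_on_conflict=5, interfere=rival_writer)
print(result, index.conflicts)  # {'count': 4} 2
```

Every conflict repeats the full read and write, which is why a hot key with many writers multiplies the work even when all requests eventually succeed.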
Bulk upserts also generate a lot of segment churn because each update produces a new document version. On indices with a high upsert rate, refresh and merge pressure go up and search latency tracks with merge IO.
Run High-Volume Upserts Safely with Pulse
Pulse is an AI DBA for Elasticsearch and OpenSearch. Before and during bulk upsert workloads, Pulse:
- Verifies cluster capacity for the operation: write thread pool budget, merge throughput headroom, heap for concurrent get-then-write cycles
- Surfaces hot-key patterns where one document ID receives most upserts and `retry_on_conflict` is burning CPU
- Tracks the operation's impact on production traffic in real time: `version_conflicts` rate, indexing latency p95/p99, segment merge backlog, search latency
- Recommends sharding the hot key, batching upstream, raising `refresh_interval` during the load, or lowering bulk concurrency when latency starts climbing
Common Mistakes
- Confusing `doc` with `upsert`. `doc` is the partial update applied when the document exists. `upsert` is the full document inserted when it does not. Both are usually needed.
- Forgetting `retry_on_conflict` under concurrency. Without it, the first concurrent writer wins and the rest get 409 errors.
- Using upsert for whole-document replacement. A `PUT /index/_doc/<id>` is simpler and cheaper.
- Heavy scripts in hot upsert paths. Painless is fast but not free. Move logic upstream when you can.
- Indexing into the same ID at very high rates. No amount of `retry_on_conflict` saves you from a hot shard. Reshape the data model.
- Disabling `_source` on indices that get upserted. The update API needs `_source` to read the existing document. Without it, updates to existing documents fail, scripted or not.
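The 409 errors mentioned above show up per item in a bulk response, so a request can succeed as a whole while individual upserts fail. The sketch below tallies item outcomes so that conflicts stand out from other failures. The helper name `summarize_bulk_items` is an assumption, and the sample response is hand-written and trimmed to the fields the helper reads, not captured from a real cluster.

```python
from collections import Counter


def summarize_bulk_items(response: dict) -> Counter:
    """Count bulk item outcomes by result (success) or error type (failure)."""
    outcomes = Counter()
    for item in response.get("items", []):
        # Each item is a single-key dict such as {"update": {...}}.
        detail = next(iter(item.values()))
        if "error" in detail:
            outcomes[detail["error"].get("type", "unknown_error")] += 1
        else:
            outcomes[detail.get("result", "unknown")] += 1
    return outcomes


# Hand-written sample in the shape of a _bulk response (illustrative only).
sample = {
    "errors": True,
    "items": [
        {"update": {"_id": "sku-1234", "status": 200, "result": "updated"}},
        {"update": {"_id": "sku-5678", "status": 201, "result": "created"}},
        {"update": {"_id": "sku-9999", "status": 409,
                    "error": {"type": "version_conflict_engine_exception"}}},
    ],
}

summary = summarize_bulk_items(sample)
print(summary["version_conflict_engine_exception"])  # 1
```

A rising share of conflict items for a small set of IDs is a hint that the problem is a hot key rather than cluster capacity.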
Frequently Asked Questions
Q: What is the difference between update and upsert in Elasticsearch?
A: An _update request without upsert modifies an existing document and fails with a 404 if the document is not found. An upsert (_update with an upsert body, or doc_as_upsert: true) modifies the document if it exists or creates it if it does not. Upsert is the right tool when you do not know in advance whether the document is new.
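The difference can be modeled with a plain dictionary standing in for the index. This is a simplified sketch, not the Elasticsearch implementation: the function name `update`, the `DocumentMissing` exception, and the shallow merge are assumptions made for illustration, and it ignores versioning, scripts, and no-op detection.

```python
import copy


class DocumentMissing(Exception):
    """Stands in for the 404 returned when the document does not exist."""


def update(store, doc_id, doc=None, upsert=None, doc_as_upsert=False):
    """Apply a simplified _update request to an in-memory dict of documents."""
    if doc_id in store:
        # Document exists: apply the partial document (top-level merge only).
        store[doc_id].update(doc or {})
        return "updated"
    if upsert is not None:
        # Document missing and an upsert body was supplied: insert it as-is.
        store[doc_id] = copy.deepcopy(upsert)
        return "created"
    if doc_as_upsert and doc is not None:
        # Document missing and doc_as_upsert is set: insert the partial doc.
        store[doc_id] = copy.deepcopy(doc)
        return "created"
    raise DocumentMissing(doc_id)


store = {}
try:
    update(store, "sku-1234", doc={"price": 19.99})
    outcome = "unexpected"
except DocumentMissing:
    outcome = "missing"  # plain update on an absent document fails

first = update(store, "sku-1234", doc={"price": 19.99},
               upsert={"sku": "sku-1234", "name": "Widget", "price": 19.99})
second = update(store, "sku-1234", doc={"price": 17.49})
print(outcome, first, second, store["sku-1234"])
```

Note that on the first successful call only the `upsert` body is stored; the `doc` is not applied on top of it.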
Q: When should I use doc_as_upsert vs an explicit upsert block?
A: Use doc_as_upsert: true when the partial update body and the insert body are the same (typical for simple "sync from source" pipelines). Use the explicit upsert block when the insert needs defaults or computed fields the update does not touch.
Q: How does scripted_upsert differ from a normal scripted update with an upsert?
A: With a normal scripted update, the script runs only when the document already exists; the plain upsert body is used on insert. With scripted_upsert: true, the script runs in both cases, with ctx.op set to create on insert and index on update. Use it when the initial-creation logic is non-trivial.
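A rough Python model of the two behaviors may help. It is a sketch under assumptions: a Python callable stands in for the Painless script, `ctx` is a plain dict, and `scripted_update` and `count_views` are hypothetical names. Script-driven operations such as deleting or skipping the write are left out.

```python
def scripted_update(store, doc_id, script, upsert, scripted_upsert=False):
    """Simplified model of a scripted _update against an in-memory dict."""
    if doc_id in store:
        # Existing document: the script always runs, with ctx.op == "index".
        ctx = {"op": "index", "_source": store[doc_id]}
        script(ctx)
        return "updated"
    source = dict(upsert)
    if scripted_upsert:
        # Missing document: the script runs against the upsert body, with ctx.op == "create".
        ctx = {"op": "create", "_source": source}
        script(ctx)
    store[doc_id] = source
    return "created"


def count_views(ctx):
    # Python rendering of the counter logic: initialize on create, increment otherwise.
    if ctx["op"] == "create":
        ctx["_source"]["count"] = 1
    else:
        ctx["_source"]["count"] += 1


with_script = {}
scripted_update(with_script, "page-views", count_views, {}, scripted_upsert=True)
scripted_update(with_script, "page-views", count_views, {}, scripted_upsert=True)

without_script = {}
scripted_update(without_script, "page-views", count_views, {"count": 1})

print(with_script["page-views"], without_script["page-views"])  # {'count': 2} {'count': 1}
```

With `scripted_upsert` the empty `upsert` body is only a starting point that the script fills in; without it, the `upsert` body has to carry the complete initial document.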
Q: How do I do high-throughput upserts efficiently?
A: Use the Bulk API with update actions and doc_as_upsert: true. Batch a few hundred to a few thousand operations per bulk request, set retry_on_conflict to 3-5, and tune refresh_interval higher during the load to reduce segment churn.
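Batching can be as simple as slicing the stream of operations into fixed-size chunks before building each bulk request. The generator below is a minimal sketch; the name `batches` and the default size of 1000 are illustrative choices, not recommendations derived from a benchmark.

```python
from itertools import islice


def batches(operations, size=1000):
    """Yield lists of at most `size` operations from any iterable."""
    iterator = iter(operations)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


# 2,500 hypothetical (id, partial document) pairs split into bulk-sized chunks.
pending = ((f"sku-{n}", {"price": 9.99}) for n in range(2500))
sizes = [len(chunk) for chunk in batches(pending, size=1000)]
print(sizes)  # [1000, 1000, 500]
```

Because it consumes an iterator lazily, the same helper works for a generator that streams rows from the source system without loading them all into memory.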
Q: Does upsert work with nested or object fields?
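A: Yes. Nested objects work as long as the mapping supports them. A partial `doc` is merged recursively into the existing document: keys inside object fields are merged, while scalar values and arrays (including arrays of nested objects) are replaced as a whole. To change a single element inside a nested array, either send the full array in `doc` or use a Painless script for surgical changes inside the nested structure.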
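The merge behavior for a partial `doc` can be pictured with a short recursive function. This is an illustrative model based on the documented recursive-merge semantics of the update API (object keys merge, arrays and scalars are replaced), not Elasticsearch's own code; `merge_partial` and the sample document are assumptions.

```python
def merge_partial(existing: dict, partial: dict) -> dict:
    """Return a new dict with `partial` merged into `existing`."""
    merged = dict(existing)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Object fields are merged key by key.
            merged[key] = merge_partial(merged[key], value)
        else:
            # Scalars and arrays are replaced as a whole.
            merged[key] = value
    return merged


existing = {
    "name": "Widget",
    "dimensions": {"width": 10, "height": 4},
    "variants": [{"color": "red"}, {"color": "blue"}],
}
partial = {"dimensions": {"height": 5}, "variants": [{"color": "green"}]}

merged = merge_partial(existing, partial)
print(merged)
# {'name': 'Widget', 'dimensions': {'width': 10, 'height': 5}, 'variants': [{'color': 'green'}]}
```

In this model `dimensions.width` survives because objects merge, while the `variants` array loses its original entries because arrays are swapped out wholesale.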
Q: What does retry_on_conflict do for upsert operations?
A: When two clients upsert the same document concurrently, the second one sees a version conflict because the first has changed the document since the second one read it. retry_on_conflict=N tells Elasticsearch to retry the read-modify-write cycle up to N times before returning an error. The default is 0 (no retries); five is a reasonable setting for typical workloads.
Q: How do I prevent the upsert path from creating unexpected new documents?
A: If you want fail-if-missing semantics, omit the upsert body and do not set doc_as_upsert. The request then returns a 404 when the document does not exist, which is the safer behavior when you are updating IDs that are expected to exist already.
Q: What's the best tool to monitor and safely run Elasticsearch upsert workloads?
A: Pulse is purpose-built for this. It is an AI DBA for Elasticsearch and OpenSearch that tracks version_conflicts, indexing latency percentiles, merge backlog, and hot-key concentration during bulk upsert loads, and recommends specific fixes - shard the hot key, raise refresh_interval, lower bulk concurrency - before search latency degrades.