Elasticsearch Upsert: How to Update or Insert a Document

An Elasticsearch upsert updates a document if it exists or inserts it if it does not, in a single API call. The operation is exposed by the _update API with an upsert body, or as doc_as_upsert: true for simple document merging. Upsert is the right primitive for keeping a denormalized view in sync with an external source of truth where you do not know in advance whether a document is new or existing.

When to Use Upsert (vs Alternatives)

Goal | Better choice
--- | ---
Insert a brand-new document with a known ID | PUT /index/_doc/<id> or POST /index/_create/<id>
Replace an entire document, always | PUT /index/_doc/<id> (overwrites if present, creates if not)
Update fields only if the document exists, fail otherwise | POST /index/_update/<id> without upsert
Update fields if exists, create if not | _update with upsert or doc_as_upsert: true (this guide)
Update many documents matching a query | _update_by_query
High-volume insert-or-update from a pipeline | _bulk with update + doc_as_upsert

Prerequisites

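  • Elasticsearch 6.x or later. The request paths in this guide use the typeless /<index>/_update/<id> form available from 7.0 onward; on 6.x the equivalent path is /<index>/<type>/<id>/_update.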
  • The user or API key needs the index privilege (or write, which includes it) on the target index. The index privilege covers both indexing and updating documents.
  • Painless scripting enabled if you use scripted upserts (default on most distributions).

Step-by-Step: Upsert a Document

  1. Upsert with doc and upsert (the most common form). The doc is applied as a partial update if the document exists; the upsert body is used if it does not.

    POST /products/_update/sku-1234
    {
      "doc":    { "price": 19.99, "in_stock": true },
      "upsert": { "sku": "sku-1234", "name": "Widget", "price": 19.99, "in_stock": true }
    }
    
  2. Use doc_as_upsert: true when the insert and update bodies are identical. This avoids duplicating fields, because doc itself is indexed as the new document when the ID does not exist.

    POST /products/_update/sku-1234
    {
      "doc": { "sku": "sku-1234", "price": 19.99, "in_stock": true },
      "doc_as_upsert": true
    }
    
  3. Use a script for conditional logic. The script runs only when the document exists; the upsert body is used otherwise.

    POST /counters/_update/page-views
    {
      "script": {
        "source": "ctx._source.count += params.delta",
        "lang": "painless",
        "params": { "delta": 1 }
      },
      "upsert": { "count": 1 }
    }
    
  4. Use scripted_upsert: true to run the script on both insert and update. Useful when the initial creation logic is non-trivial.

    POST /counters/_update/page-views
    {
      "scripted_upsert": true,
      "script": {
        "source": "if (ctx.op == 'create') { ctx._source.count = 1 } else { ctx._source.count += 1 }",
        "lang": "painless"
      },
      "upsert": {}
    }
    
  5. Add retry_on_conflict for concurrent updates. When two clients upsert the same ID simultaneously, version conflicts are normal. Retries happen server-side, so the client needs no retry loop of its own, but each retry repeats the full get-then-write cycle.

    POST /counters/_update/page-views?retry_on_conflict=5
    { "script": { ... }, "upsert": { ... } }
    
  6. Use the Bulk API for high-throughput upserts. One round trip carries many upserts.

    POST /_bulk
    { "update": { "_index": "products", "_id": "sku-1234", "retry_on_conflict": 3 } }
    { "doc": { "price": 19.99 }, "doc_as_upsert": true }
    { "update": { "_index": "products", "_id": "sku-5678", "retry_on_conflict": 3 } }
    { "doc": { "price": 24.99 }, "doc_as_upsert": true }
    
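  7. Optionally control concurrency with if_seq_no and if_primary_term. Use these when you have a read-modify-write workflow and need a fail-on-conflict semantic instead of last-writer-wins. They apply to plain updates of a document you have already read: Elasticsearch rejects the request if they are combined with an upsert body, doc_as_upsert, or retry_on_conflict.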
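
The decision flow in steps 1 to 4 can be sketched as a small in-memory model. This is an illustration under stated assumptions, not the server implementation: simulate_update and bump are hypothetical names, a Python callable stands in for the Painless script, a dict stands in for the index, and the merge is shallow (enough for flat documents).

```python
import copy


def simulate_update(store, doc_id, *, doc=None, upsert=None,
                    doc_as_upsert=False, script=None, scripted_upsert=False):
    """Toy model of the _update decision flow (not the server implementation).

    store  -- dict of doc_id -> source, standing in for an index
    script -- Python callable(source, op), standing in for a Painless script
    """
    if doc_id not in store:
        if doc_as_upsert and doc is not None:
            store[doc_id] = copy.deepcopy(doc)   # step 2: doc doubles as the insert body
            return "created"
        if upsert is None:
            return "not_found"                   # plain update of a missing ID (404)
        source = copy.deepcopy(upsert)
        if scripted_upsert and script is not None:
            script(source, "create")             # step 4: script also runs on insert
        store[doc_id] = source                   # store the (possibly script-modified) upsert body
        return "created"

    source = store[doc_id]
    if script is not None:
        script(source, "index")                  # steps 3 and 4: script runs on update
    elif doc is not None:
        source.update(doc)                       # step 1: shallow merge, fine for flat docs
    return "updated"


def bump(source, op):
    # Mirrors the scripted_upsert example: initialise on create, increment otherwise.
    if op == "create":
        source["count"] = 1
    else:
        source["count"] += 1


store = {}
print(simulate_update(store, "page-views", script=bump, upsert={}, scripted_upsert=True), store)
print(simulate_update(store, "page-views", script=bump, upsert={}, scripted_upsert=True), store)
```

In this model the first call creates the counter with count 1 and the second increments it to 2. The model leaves out details such as noop detection and versioning.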

Upsert in Production: What to Watch For

The hidden cost of upsert workloads is version conflicts. Every _update is a get-then-write under the hood; concurrent writers to the same ID will collide. The retry_on_conflict parameter handles the common case, but on hot keys (counters, per-user aggregates) you can burn significant CPU and IO on retries. Track the rate of 409 version-conflict responses that survive the retries, check the indexing stats under _nodes/stats/indices/indexing (which conflict counters are exposed depends on your version), and consider sharding the hot key or batching upstream.
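
To make the get-then-write collision concrete, here is a minimal simulation, assuming a toy in-memory shard. ToyShard, VersionConflict, increment, and the interfere hook are all hypothetical names for illustration; real Elasticsearch uses sequence numbers and primary terms rather than this simple counter.

```python
class VersionConflict(Exception):
    """Raised when a write is based on a stale read."""


class ToyShard:
    """In-memory stand-in for a shard: each ID maps to (version, source)."""

    def __init__(self):
        self.docs = {}

    def get(self, doc_id):
        return self.docs.get(doc_id, (0, None))

    def write(self, doc_id, source, expected_version):
        current_version, _ = self.get(doc_id)
        if current_version != expected_version:
            raise VersionConflict(doc_id)
        self.docs[doc_id] = (current_version + 1, source)


def increment(shard, doc_id, retry_on_conflict=0, interfere=None):
    """Counter upsert as get-then-write; returns how many retries were needed."""
    attempts = 0
    while True:
        version, source = shard.get(doc_id)                      # get
        new_source = {"count": 1} if source is None else {"count": source["count"] + 1}
        if interfere is not None:
            interfere(attempts)                                  # hypothetical hook: a concurrent writer
        try:
            shard.write(doc_id, new_source, version)             # write, conditional on the version read
            return attempts
        except VersionConflict:
            if attempts >= retry_on_conflict:
                raise                                            # surfaces to the client as a conflict
            attempts += 1                                        # repeat the whole get-then-write cycle


shard = ToyShard()


def rival(attempt):
    # A second writer sneaks in between our read and our write, once.
    if attempt == 0:
        version, source = shard.get("page-views")
        count = 0 if source is None else source["count"]
        shard.write("page-views", {"count": count + 1}, version)


print(increment(shard, "page-views", retry_on_conflict=3, interfere=rival))
print(shard.get("page-views"))
```

Each retry repeats the read and the write, which is why a hot key with many concurrent writers multiplies the work per successful update.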

Bulk upserts also generate a lot of segment churn because each update marks the old copy of the document as deleted and indexes a new one. On indices with a high upsert rate, refresh and merge pressure go up, and search latency tends to rise along with merge IO.

Run High-Volume Upserts Safely with Pulse

Pulse is an AI DBA for Elasticsearch and OpenSearch. Before and during bulk upsert workloads, Pulse:

  • Verifies cluster capacity for the operation: write thread pool budget, merge throughput headroom, heap for concurrent get-then-write cycles
  • Surfaces hot-key patterns where one document ID receives most upserts and retry_on_conflict is burning CPU
  • Tracks the operation's impact on production traffic in real time: version_conflicts rate, indexing latency p95/p99, segment merge backlog, search latency
  • Recommends sharding the hot key, batching upstream, raising refresh_interval during the load, or lowering bulk concurrency when latency starts climbing

Common Mistakes

  1. Confusing doc with upsert. doc is the partial update applied when the document exists. upsert is the full document inserted when it does not. Both are usually needed.
  2. Forgetting retry_on_conflict under concurrency. Without it, the first concurrent writer wins and the rest get 409 errors.
  3. Using upsert for whole-document replacement. A PUT /index/_doc/<id> is simpler and cheaper.
  4. Heavy scripts in hot upsert paths. Painless is fast but not free. Move logic upstream when you can.
  5. Indexing into the same ID at very high rates. No amount of retry_on_conflict saves you from a hot shard. Reshape the data model.
  6. Disabling _source on indices that get upserted. The _update API needs _source to read the existing document. Without it, every update fails, scripted or not.

Frequently Asked Questions

Q: What is the difference between update and upsert in Elasticsearch?
A: An _update request without upsert modifies an existing document and fails with a 404 if the document is not found. An upsert (_update with an upsert body, or doc_as_upsert: true) modifies the document if it exists or creates it if it does not. Upsert is the right tool when you do not know in advance whether the document is new.

Q: When should I use doc_as_upsert vs an explicit upsert block?
A: Use doc_as_upsert: true when the partial update body and the insert body are the same (typical for simple "sync from source" pipelines). Use the explicit upsert block when the insert needs defaults or computed fields the update does not touch.

Q: How does scripted_upsert differ from a normal scripted update with an upsert?
A: With a normal scripted update, the script runs only when the document already exists; the plain upsert body is used on insert. With scripted_upsert: true, the script runs in both cases, with ctx.op set to create on insert and index on update. Use it when the initial-creation logic is non-trivial.

Q: How do I do high-throughput upserts efficiently?
A: Use the Bulk API with update actions and doc_as_upsert: true. Batch a few hundred to a few thousand operations per bulk request, set retry_on_conflict to 3-5, and tune refresh_interval higher during the load to reduce segment churn.
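
As a rough sketch of the batching described in this answer, the helpers below build newline-delimited bulk bodies using only the standard library. bulk_upsert_body and batched are hypothetical helper names, and sending the body (to /_bulk with Content-Type application/x-ndjson) is left to whatever HTTP client you already use.

```python
import json


def bulk_upsert_body(index, docs, retry_on_conflict=3):
    """Build an NDJSON _bulk body of update actions with doc_as_upsert.

    docs is an iterable of (doc_id, partial_document) pairs.
    """
    lines = []
    for doc_id, doc in docs:
        action = {"update": {"_index": index, "_id": doc_id,
                             "retry_on_conflict": retry_on_conflict}}
        lines.append(json.dumps(action))
        lines.append(json.dumps({"doc": doc, "doc_as_upsert": True}))
    return "\n".join(lines) + "\n"   # a bulk body must end with a newline


def batched(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


updates = [("sku-1234", {"price": 19.99}), ("sku-5678", {"price": 24.99})]
for batch in batched(updates, 1000):
    print(bulk_upsert_body("products", batch), end="")
```

The batch size of 1000 is only a starting point within the range mentioned above; tune it against indexing latency on your own cluster.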

Q: Does upsert work with nested or object fields?
A: Yes. Nested objects work as long as the mapping supports them. A partial doc is merged recursively into the existing document: object fields are merged key by key, while arrays (including arrays of nested objects) are replaced wholesale. For surgical changes inside an array or nested structure, use a Painless script.
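
The merge behavior for doc can be illustrated with a short model. This is a sketch of the documented partial-update semantics (objects merged recursively, scalars and arrays replaced), not Elasticsearch's own code; merge_doc is a hypothetical name.

```python
def merge_doc(existing, partial):
    """Recursively merge `partial` into a copy of `existing`.

    Dicts are merged key by key; scalars and lists are replaced outright.
    """
    merged = dict(existing)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_doc(merged[key], value)
        else:
            merged[key] = value
    return merged


existing = {
    "name": "Widget",
    "dims": {"w": 1, "h": 2},
    "variants": [{"color": "red"}, {"color": "blue"}],
}
partial = {"dims": {"h": 3}, "variants": [{"color": "green"}]}
print(merge_doc(existing, partial))
```

Here dims keeps its w key and only h changes, while the variants array is replaced by the single new element.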

Q: What does retry_on_conflict do for upsert operations?
A: When two clients upsert the same document concurrently, the second one sees a version conflict because the first has incremented _version since its read. retry_on_conflict=N tells Elasticsearch to retry the read-modify-write cycle up to N times before returning an error. Five is a reasonable default for typical workloads.

Q: How do I prevent the upsert path from creating unexpected new documents?
A: Omit the upsert body (and doc_as_upsert) so the request has a fail-if-missing semantic. It will then return a 404 when the document does not exist, which is the safer behavior for updates to IDs you expect to be present.

Q: What's the best tool to monitor and safely run Elasticsearch upsert workloads?
A: Pulse is purpose-built for this. It is an AI DBA for Elasticsearch and OpenSearch that tracks version_conflicts, indexing latency percentiles, merge backlog, and hot-key concentration during bulk upsert loads, and recommends specific fixes - shard the hot key, raise refresh_interval, lower bulk concurrency - before search latency degrades.
