NEW

Pulse 2025 Product Roundup: From Monitoring to AI-Native Control Plane

What Does Upsert Mean? Definition, SQL Syntax, and Examples

Upsert is a portmanteau of "update" and "insert." It's a single operation that inserts a new row if no matching row exists, or updates the existing row if one does. The match is determined by a primary key, unique constraint, or document ID, depending on the database.

Without upsert you'd have to write the defensive version: try INSERT, catch the duplicate-key error, then issue an UPDATE. That's fine in single-user scripts but breaks under concurrency. Two processes both run the INSERT, both see no conflict, both try to write, and one fails with a race condition. Upsert makes the whole thing atomic, which is why every modern database supports some form of it.

Upsert in PostgreSQL: INSERT ... ON CONFLICT

PostgreSQL's preferred upsert syntax has been INSERT ... ON CONFLICT since version 9.5. It's the syntax most experienced PostgreSQL developers reach for first, because it's atomic and concurrency-safe.

INSERT INTO users (id, email, last_seen)
VALUES (42, 'alice@example.com', now())
ON CONFLICT (id) DO UPDATE
  SET email     = EXCLUDED.email,
      last_seen = EXCLUDED.last_seen;

EXCLUDED is a magic alias for the row that would have been inserted. It lets you reference the new values inside the UPDATE clause. Without EXCLUDED, you'd have no way to bring the proposed insert values forward into the update.

You can also DO NOTHING to silently skip duplicates:

INSERT INTO audit_log (event_id, payload)
VALUES ('evt-7e3', '{...}')
ON CONFLICT (event_id) DO NOTHING;

This pattern is common for idempotent writes: replaying a Kafka topic, retrying after a network blip, or de-duplicating a batch import.

MERGE (PostgreSQL 15+)

PostgreSQL 15 added MERGE, the SQL standard's answer to upsert. MERGE is more flexible than ON CONFLICT because a single statement can express delete-on-match and conditional update logic:

MERGE INTO users AS u
USING incoming_users AS i ON u.id = i.id
WHEN MATCHED THEN UPDATE SET email = i.email
WHEN NOT MATCHED THEN INSERT (id, email) VALUES (i.id, i.email);

The catch: MERGE is not concurrency-safe in PostgreSQL the same way ON CONFLICT is. Under concurrent writes, two MERGE statements can both decide a row doesn't match and both try to insert, hitting a unique-constraint violation. For high-concurrency idempotent writes, prefer ON CONFLICT. Reach for MERGE when you have batch operations on stable data, or when you need its delete-on-match semantics.

Upsert in MySQL: INSERT ... ON DUPLICATE KEY UPDATE

MySQL has supported upsert since version 4.1 via ON DUPLICATE KEY UPDATE:

INSERT INTO users (id, email, last_seen)
VALUES (42, 'alice@example.com', NOW())
ON DUPLICATE KEY UPDATE
  email     = VALUES(email),
  last_seen = VALUES(last_seen);

Note on syntax: VALUES(col) was the original way to reference the proposed insert values inside the UPDATE clause. Starting in MySQL 8.0.19, you can use a row alias instead, and from 8.0.20 onward VALUES() is officially deprecated. The modern form looks like:

INSERT INTO users (id, email, last_seen)
VALUES (42, 'alice@example.com', NOW()) AS new
ON DUPLICATE KEY UPDATE
  email     = new.email,
  last_seen = new.last_seen;

A few gotchas with ON DUPLICATE KEY UPDATE:

  • It triggers on any unique-key conflict, not just the primary key. Tables with multiple unique constraints can update unexpectedly.
  • It locks the conflicting row even when the update doesn't change any values, which adds latency under heavy write contention.
  • The auto-increment counter advances on every attempted insert, even when the upsert ends up doing an update. Heavy upsert workloads leave gaps in your IDs.

MySQL also has REPLACE INTO, which has been around since the early 4.x days. Despite the name, REPLACE does a delete-then-insert rather than an in-place update, so it loses any data in columns the new row doesn't specify. Avoid it unless that's exactly what you want.

Upsert in SQL Server: MERGE

SQL Server has had MERGE since 2008:

MERGE INTO users AS target
USING (VALUES (42, 'alice@example.com')) AS source(id, email)
ON target.id = source.id
WHEN MATCHED THEN UPDATE SET email = source.email
WHEN NOT MATCHED THEN INSERT (id, email) VALUES (source.id, source.email);

MERGE in SQL Server has a long list of known concurrency hazards, well-documented by Aaron Bertrand and others. For high-concurrency upserts, the community-preferred approach is still the explicit transactional version with the right hint flags:

BEGIN TRANSACTION;
  IF EXISTS (SELECT 1 FROM users WITH (UPDLOCK, HOLDLOCK) WHERE id = 42)
    UPDATE users SET email = 'alice@example.com' WHERE id = 42;
  ELSE
    INSERT INTO users (id, email) VALUES (42, 'alice@example.com');
COMMIT;

Verbose, but predictable.

Upsert in Elasticsearch and OpenSearch

Elasticsearch and OpenSearch support upsert through the Update API. Send a partial doc plus an upsert field. The engine updates if the document exists, or indexes the upsert content if it doesn't:

POST /users/_update/42
{
  "doc": {
    "last_seen": "2026-05-08T10:30:00Z"
  },
  "upsert": {
    "id": 42,
    "email": "alice@example.com",
    "created_at": "2026-05-08T10:30:00Z",
    "last_seen": "2026-05-08T10:30:00Z"
  }
}

Two convenience options:

  • "doc_as_upsert": true reuses the doc payload as the upsert content, so you don't have to specify it twice.
  • A scripted upsert lets you compute the new state from the existing document, which is useful for counters and sets:
POST /users/_update/42
{
  "scripted_upsert": true,
  "upsert": { "login_count": 0 },
  "script": {
    "source": "ctx._source.login_count += 1",
    "lang": "painless"
  }
}

These upserts are atomic at the document level, but use optimistic concurrency control via _seq_no and _primary_term. Two concurrent updates to the same document will retry with exponential backoff (the retry_on_conflict parameter). Use bulk upserts for throughput, and tune retry_on_conflict for hot keys.

Upsert in ClickHouse: A Different Model

ClickHouse is a column store optimized for batch analytics rather than transactional writes. There is no single UPSERT statement, and traditional row-level updates aren't free: they trigger background mutations that rewrite affected parts.

The idiomatic approach is the ReplacingMergeTree engine, which keeps multiple versions of a row keyed by the ORDER BY columns and merges them in the background, keeping only the latest version:

CREATE TABLE users (
    id UInt64,
    email String,
    last_seen DateTime,
    version UInt64
)
ENGINE = ReplacingMergeTree(version)
ORDER BY id;

Inserts stay cheap. Behind the scenes, ClickHouse merges parts and discards superseded rows. Until merge happens, both versions exist, so reads need FINAL or an argMax/groupArray workaround for strictly correct results:

SELECT id, argMax(email, version) AS email
FROM users
GROUP BY id;

For more on this pattern, see What is ClickHouse ReplacingMergeTree.

For genuine row-level updates, ClickHouse has supported ALTER TABLE ... UPDATE (an asynchronous mutation) for years. More recently the team has been working on Lightweight UPDATE, which doesn't rewrite entire data parts. Lightweight UPDATE is experimental in ClickHouse 25.7 (gated behind allow_experimental_lightweight_update = 1) and is expected to enter beta in 25.8. Neither replaces the analytical-write pattern that ReplacingMergeTree is built for, but they're useful when occasional row-level mutation is unavoidable.

Common Upsert Use Cases

  • Idempotent event processing. Replaying a Kafka topic without creating duplicates. ON CONFLICT DO NOTHING is the workhorse here.
  • Slowly changing dimensions. Keeping a customer profile up to date as new info arrives.
  • CDC sinks. Change-data-capture pipelines that materialize source rows into a destination, where each source row may insert, update, or delete.
  • Counters and aggregates. Incrementing a tally without a separate read-modify-write.
  • Caching with Elasticsearch. Partial updates to enriched documents as new data flows in.

Best Practices for Upserts

  1. Pick the concurrency-safe variant. In Postgres, ON CONFLICT over MERGE for hot writes. In SQL Server, transactional IF EXISTS over naive MERGE. In MySQL, watch out for unintended updates from secondary unique keys.
  2. Batch where you can. Upserts are slower per-row than pure inserts because of the conflict check. Bulk APIs (COPY ... ON CONFLICT, MySQL multi-row INSERT, Elasticsearch _bulk) close most of the gap.
  3. Index the conflict columns. The engine has to look up by the conflict key. A missing index turns every upsert into a sequential scan.
  4. Plan for retries. Document-store upserts (Elasticsearch, MongoDB) use optimistic concurrency. Set retry counts that match your write contention.
  5. Monitor write latency and conflict rates. A spike in conflicts often signals a hot key (a single ID being upserted by many writers). The fix is usually sharding or queueing those writes upstream.

Common Upsert Pitfalls

  • Race conditions in homemade upserts. "Check-then-insert" code without a transaction always has a race window. Use the database's native upsert.
  • Unintended updates from secondary unique keys in MySQL. A row that conflicts on email when you meant to conflict on id will update the wrong row.
  • Auto-increment gap waste in MySQL. Heavy upsert workloads exhaust the ID space faster than expected.
  • Counter increments via SELECT-then-UPDATE. Always wrong under concurrency. Use scripted upserts or atomic UPDATE counter = counter + 1.
  • Treating ClickHouse like an OLTP store. ReplacingMergeTree is the right answer, but only after you internalize that "fresh state" requires FINAL or a GROUP BY query.

Frequently Asked Questions

Q: What does upsert mean in plain English?
A: Upsert means "insert if new, update if it already exists." It's a single command that combines INSERT and UPDATE so you don't have to check whether a row exists before writing.

Q: What's the difference between upsert and merge?
A: They overlap. MERGE is the SQL standard's general-purpose statement that includes upsert as one of its capabilities (it can also delete on match). Database-specific syntax like PostgreSQL's ON CONFLICT is narrower (it only does upsert) but is often safer under concurrency.

Q: What's the difference between upsert and replace?
A: Upsert preserves columns you don't mention in the update, blending old and new values. REPLACE (in MySQL) deletes the existing row and inserts a fresh one, losing any data you didn't include. Use upsert; avoid REPLACE unless you genuinely want a wipe-and-replace.

Q: Is upsert atomic?
A: PostgreSQL's ON CONFLICT, MySQL's ON DUPLICATE KEY UPDATE, and Elasticsearch's _update are all atomic at the row or document level. SQL Server MERGE is not always atomic in practice; it has documented race conditions you must defend against with HOLDLOCK.

Q: Does upsert work with multiple unique keys?
A: PostgreSQL's ON CONFLICT lets you specify which constraint to target: ON CONFLICT (email) or ON CONFLICT ON CONSTRAINT users_email_key. MySQL's ON DUPLICATE KEY UPDATE triggers on any unique-key conflict, which is sometimes what you want and sometimes a footgun.

Q: How do I do a bulk upsert?
A: PostgreSQL: INSERT INTO ... VALUES (...), (...), ... ON CONFLICT .... MySQL: multi-row INSERT ... ON DUPLICATE KEY UPDATE. Elasticsearch: _bulk API with one update action per document. All of these scale better than single-row upserts in a loop.

Q: Does upsert work in Elasticsearch and OpenSearch?
A: Yes. Both use the _update API with an upsert field, plus optional scripted_upsert and doc_as_upsert shortcuts. Behavior is identical between Elasticsearch and OpenSearch for this API.

Q: Why is my PostgreSQL upsert hitting deadlocks?
A: Usually because two transactions are upserting overlapping sets of keys in different orders. Sort the keys before upserting in batch operations, or use smaller batches. Long-running transactions that hold locks while doing other work are the other common cause.

Q: What about NoSQL databases like MongoDB and DynamoDB?
A: MongoDB: update({...}, {$set: {...}}, {upsert: true}). DynamoDB: UpdateItem without a ConditionExpression is effectively an upsert. Both are atomic per-document or per-item.

Subscribe to the Pulse Newsletter

Get early access to new Pulse features, insightful blogs & exclusive events , webinars, and workshops.

We use cookies to provide an optimized user experience and understand our traffic. To learn more, read our use of cookies; otherwise, please choose 'Accept Cookies' to continue using our website.