The UPDATE statement in PostgreSQL is deceptively simple on the surface. The basic form - UPDATE table SET col = val WHERE condition - behaves as expected. But the full feature set diverges significantly from standard SQL, and several of those extensions are the ones you reach for most often in production: joining other tables in the update, getting back the modified rows without a follow-up query, and safely handling insert-or-update patterns. Understanding the underlying mechanics also prevents a category of performance surprises that are easy to create and hard to diagnose.
Basic Syntax
The standard form updates rows matching a WHERE clause:
UPDATE orders
SET status = 'shipped', updated_at = NOW()
WHERE id = 42;
Multiple columns are set in a single SET clause, separated by commas. You can reference the current value of a column on the right-hand side, which makes increment operations straightforward:
UPDATE accounts
SET balance = balance - 100
WHERE user_id = 7;
One point to know: PostgreSQL evaluates all right-hand side expressions using the pre-update values of the row. So SET a = a + 1, b = a sets b to the old value of a, not the incremented one. The behavior is consistent, but it bites people who assume left-to-right evaluation, as in procedural code.
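A quick demonstration, using a throwaway table (the table and values here are purely illustrative):
CREATE TEMP TABLE demo (a int, b int);
INSERT INTO demo VALUES (10, 0);

UPDATE demo SET a = a + 1, b = a;

SELECT a, b FROM demo;
-- a = 11, b = 10: b received the pre-update value of a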
Omitting the WHERE clause updates every row in the table. PostgreSQL does not warn you. In a transaction, you can roll back; outside one, you cannot. Always verify filters before running updates against production tables.
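One defensive habit worth sketching - wrap risky updates in an explicit transaction so a missing filter can be caught and undone (the statement below is deliberately missing its WHERE clause):
BEGIN;
UPDATE orders SET status = 'shipped';  -- oops: no WHERE clause, every row updated
SELECT count(*) FROM orders WHERE status = 'shipped';  -- sanity check reveals the mistake
ROLLBACK;  -- inside the transaction, the damage can still be undone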
UPDATE with FROM
PostgreSQL extends standard UPDATE with a FROM clause that lets you join other tables to drive the update logic:
UPDATE orders o
SET status = c.preferred_status
FROM campaigns c
WHERE o.campaign_id = c.id
AND c.active = true;
The join condition goes in WHERE, not ON. The syntax differs across databases — SQL Server has a similar FROM clause; MySQL uses UPDATE ... JOIN ... SET syntax.
The critical pitfall here: if a target row joins to more than one row in the FROM table, only one of the matching join rows is used to update the target, and which one is not readily predictable. The update happens exactly once per target row, but the source row chosen is effectively arbitrary. If your join is one-to-many and you expect a specific value from the source, the result is unpredictable. The fix is to collapse the source to one row per join key before referencing it:
UPDATE orders o
SET total = s.total_sum
FROM (
SELECT order_id, SUM(price) AS total_sum
FROM order_items
GROUP BY order_id
) s
WHERE o.id = s.order_id;
This pattern - subquery or CTE in the FROM clause - is also useful when the source data needs filtering or aggregation before the update runs.
RETURNING: Avoiding the Follow-up Query
Most databases require a separate SELECT after an UPDATE to retrieve affected rows. PostgreSQL does not. The RETURNING clause returns the modified rows as a result set, using post-update values by default:
UPDATE orders
SET status = 'cancelled'
WHERE user_id = 99
RETURNING id, status, updated_at;
This is genuinely useful in application code - a single round-trip both performs the mutation and retrieves the rows, which matters when latency is constrained or when you need to pass the affected IDs downstream. RETURNING * returns all columns; you can also return expressions and computed values.
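Because RETURNING sees the post-update values, an expression can even reconstruct the prior value when the delta is known. A small sketch reusing the accounts table from the earlier example:
UPDATE accounts
SET balance = balance - 100
WHERE user_id = 7
RETURNING user_id, balance AS new_balance, balance + 100 AS old_balance;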
Combined with WITH (CTEs), RETURNING enables chained operations:
WITH updated AS (
UPDATE accounts
SET balance = balance - 100
WHERE id = 7
RETURNING id, balance
)
INSERT INTO audit_log (account_id, new_balance, changed_at)
SELECT id, balance, NOW() FROM updated;
The entire statement is atomic. Either both the update and the insert succeed, or neither does.
Upserts with INSERT ... ON CONFLICT DO UPDATE
For insert-or-update patterns, PostgreSQL provides ON CONFLICT DO UPDATE (available since PostgreSQL 9.5). It avoids the race condition inherent in a manual check-then-insert approach:
INSERT INTO product_inventory (sku, quantity, last_updated)
VALUES ('ABC-123', 50, NOW())
ON CONFLICT (sku)
DO UPDATE SET
quantity = EXCLUDED.quantity,
last_updated = EXCLUDED.last_updated;
EXCLUDED is a virtual table containing the values that would have been inserted. You can mix EXCLUDED values with current table values for conditional merges:
INSERT INTO product_inventory (sku, quantity, last_updated)
VALUES ('ABC-123', 50, NOW())
ON CONFLICT (sku)
DO UPDATE SET
    quantity = product_inventory.quantity + EXCLUDED.quantity,
    last_updated = NOW();
The conflict target - ON CONFLICT (sku) - must reference a unique index or unique constraint. If you want to silently skip conflicting rows instead of updating, use DO NOTHING. PostgreSQL 15 also added MERGE as a standards-compliant alternative, though ON CONFLICT remains more widely used and is simpler for most upsert cases.
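For comparison, here is roughly how the same upsert looks with MERGE; treat this as a sketch, since the exact WHEN clauses depend on the table's constraints:
MERGE INTO product_inventory p
USING (VALUES ('ABC-123', 50)) AS src(sku, quantity)
ON p.sku = src.sku
WHEN MATCHED THEN
    UPDATE SET quantity = src.quantity, last_updated = NOW()
WHEN NOT MATCHED THEN
    INSERT (sku, quantity, last_updated)
    VALUES (src.sku, src.quantity, NOW());
One caveat: MERGE does not use the speculative-insertion mechanism that makes ON CONFLICT safe against concurrent inserts, so a racing INSERT can still produce a unique violation.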
Bulk Updates and Storage-Level Behavior
PostgreSQL's MVCC model means every UPDATE writes a new version of the row. The old version is kept until VACUUM reclaims it. This has storage implications: a high-velocity update workload on a small table can generate significant dead tuple bloat if autovacuum cannot keep up.
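The pg_stat_user_tables view exposes enough to watch for this; a quick check of which tables are accumulating the most dead tuples:
SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;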
PostgreSQL has an optimization called HOT (Heap-Only Tuple) updates. A HOT update occurs when: (1) the updated row version fits on the same heap page as the original, and (2) none of the updated columns are referenced by any index. When both conditions hold, no new index entries are written - the update is cheaper in both I/O and CPU terms. You can create conditions for HOT updates by setting a lower fillfactor on the table, which reserves free space on each page so new row versions can land on the same page:
ALTER TABLE orders SET (fillfactor = 70);
The trade-off: a lower fillfactor increases the table's physical footprint, slowing sequential scans. This makes it most appropriate for tables with frequent updates on non-indexed columns where random access dominates.
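Whether the optimization is actually kicking in is visible in the same statistics view - n_tup_hot_upd counts HOT updates against the total:
SELECT relname,
       n_tup_upd,
       n_tup_hot_upd,
       round(100.0 * n_tup_hot_upd / NULLIF(n_tup_upd, 0), 1) AS hot_pct
FROM pg_stat_user_tables
WHERE n_tup_upd > 0
ORDER BY n_tup_upd DESC;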
For updating millions of rows - schema migrations, backfills, data corrections - running a single UPDATE on the entire table holds row-level locks for the duration and generates a massive volume of dead tuples at once. The standard production approach is batching:
DO $$
DECLARE
    batch_size INT := 10000;
    rows_updated INT;
BEGIN
    LOOP
        UPDATE orders
        SET status = 'legacy'
        WHERE id IN (
            SELECT id FROM orders
            WHERE status = 'old'
            LIMIT batch_size
        );
        GET DIAGNOSTICS rows_updated = ROW_COUNT;
        EXIT WHEN rows_updated = 0;
        -- Transaction control in DO blocks requires PostgreSQL 11+
        -- and a top-level call (not inside an explicit transaction)
        COMMIT;
        PERFORM pg_sleep(0.1);  -- brief pause to ease pressure on concurrent workloads
    END LOOP;
END $$;
Each iteration commits independently, keeping lock duration short and giving autovacuum time to reclaim dead tuples between batches. Avoid OFFSET-based pagination for batching - once you are millions of rows in, the offset scan itself becomes the bottleneck. Filtering by a range on an indexed column (like id) or using a cursor is more efficient.
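A sketch of the keyset approach, assuming an integer primary key; the driving script records the largest id returned by each batch and feeds it back in as the starting point for the next one (the literal 1000000 below stands in for that value):
UPDATE orders o
SET status = 'legacy'
FROM (
    SELECT id FROM orders
    WHERE status = 'old' AND id > 1000000  -- last id from the previous batch
    ORDER BY id
    LIMIT 10000
) batch
WHERE o.id = batch.id
RETURNING o.id;  -- max(id) of this result becomes the next starting point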
One last point: avoid using UPDATE to "touch" rows for replication or event-sourcing purposes without a real change. Each update writes a new tuple, generates WAL, and creates dead tuples regardless of whether any column value actually changed. If the application needs to detect that a row was processed, a separate status column or an outbox table is a better model than spurious updates.