Pulse 2025 Product Roundup: From Monitoring to AI-Native Control Plane

Read more

PostgreSQL CTEs: Common Table Expressions Explained

A Common Table Expression (CTE) in PostgreSQL is a named subquery defined with a WITH clause that lives for the duration of a single query. At its most basic, a CTE lets you name an intermediate result and reference it by that name in the query body. This gives you a way to structure complex SQL without resorting to deeply nested subqueries or temporary tables.

WITH active_users AS (
  SELECT id, email, created_at
  FROM users
  WHERE last_login > NOW() - INTERVAL '30 days'
)
SELECT u.email, COUNT(o.id) AS order_count
FROM active_users u
JOIN orders o ON o.user_id = u.id
GROUP BY u.email;

You can define multiple CTEs in a single WITH clause, separating them with commas, and later CTEs can reference earlier ones. This makes it possible to build up a query in logical stages rather than writing one monolithic expression.

The PostgreSQL 12 Optimization Fence Change

This is the behavioral shift most likely to bite you if you're running mixed-version environments or working with a codebase that predates 2019.

Before PostgreSQL 12, CTEs were always materialized - meaning PostgreSQL would execute the CTE, store the full result in memory (or on disk), and then let the outer query read from that stored result. This made the CTE an "optimization fence": the planner couldn't push predicates from the outer query into the CTE, and couldn't merge the CTE's execution plan with the surrounding query. The upside was predictability. The downside was that filters applied outside the CTE couldn't reduce the rows fetched inside it.

PostgreSQL 12 changed the default. Non-recursive CTEs that are referenced only once and have no side effects are now inlined into the parent query by default. The planner treats them as if they were subqueries, which means it can push down filters, choose better join orders, and avoid materializing intermediate results. This is generally a win, but it means query plans can change between PostgreSQL 11 and 12 for the same SQL.

You can control this explicitly with the MATERIALIZED and NOT MATERIALIZED keywords:

-- Force materialization (old behavior)
WITH expensive_calc AS MATERIALIZED (
  SELECT region, SUM(revenue) AS total
  FROM sales
  GROUP BY region
)
SELECT * FROM expensive_calc WHERE region = 'US';

-- Explicitly allow inlining (new default for eligible CTEs)
WITH filtered AS NOT MATERIALIZED (
  SELECT * FROM events WHERE type = 'purchase'
)
SELECT COUNT(*) FROM filtered;

CTEs referenced more than once are materialized by default in PostgreSQL 12+ — no explicit hint is needed. Use MATERIALIZED to force materialization of a singly-referenced CTE that would otherwise be inlined, for example when the CTE is expensive and you want to guarantee it runs exactly once.

Recursive CTEs for Hierarchical Data

WITH RECURSIVE is PostgreSQL's mechanism for iterative queries that reference their own output. The canonical use cases are org charts, bill-of-materials trees, and category hierarchies - anywhere you have parent-child relationships of arbitrary depth.

A recursive CTE has two parts joined by UNION ALL: a non-recursive base case that seeds the result, and a recursive term that joins back to the CTE name itself. PostgreSQL evaluates the base case once, then keeps evaluating the recursive term using the rows produced in the previous iteration until no new rows are produced.

WITH RECURSIVE org_chart AS (
  -- Base case: start from the CEO (no manager)
  SELECT id, name, manager_id, 1 AS depth
  FROM employees
  WHERE manager_id IS NULL

  UNION ALL

  -- Recursive term: find direct reports of each row
  SELECT e.id, e.name, e.manager_id, oc.depth + 1
  FROM employees e
  JOIN org_chart oc ON e.manager_id = oc.id
)
SELECT id, name, depth
FROM org_chart
ORDER BY depth, name;

A few production concerns worth knowing: always index the join column (manager_id in this example) or you'll do a sequential scan on every iteration. Prefer UNION ALL over UNION — the deduplication step UNION performs on each iteration is rarely needed and adds overhead. That said, UNION (with deduplication) is valid syntax and can help prevent infinite loops in graph data where a row may appear more than once across iterations.

If your data can contain cycles (a graph rather than a strict tree), PostgreSQL 14+ supports the CYCLE clause for automatic cycle detection:

WITH RECURSIVE search_graph(id, link, data, depth) AS (
  SELECT g.id, g.link, g.data, 1
  FROM graph g
  WHERE g.id = 1

  UNION ALL

  SELECT g.id, g.link, g.data, sg.depth + 1
  FROM graph g
  JOIN search_graph sg ON g.id = sg.link
) CYCLE id SET is_cycle USING path
SELECT * FROM search_graph WHERE NOT is_cycle;

The CYCLE id SET is_cycle USING path clause tells PostgreSQL to track the id column across iterations, set the is_cycle flag when a repeated value is detected, and store the traversal path in path. For pre-PostgreSQL 14 compatibility, use a visited-array guard or a depth limit (WHERE depth < 50) instead.

For data that doesn't change frequently, consider materializing the recursive result into a regular table or materialized view and refreshing it on a schedule. Running a full recursive traversal per user request on a table with hundreds of thousands of nodes gets expensive quickly.

Writable CTEs

PostgreSQL lets you use INSERT, UPDATE, DELETE, and MERGE inside a WITH clause, not just SELECT. The modified rows become available to subsequent parts of the query via a RETURNING clause. This makes it possible to chain multiple write operations in a single atomic statement. Note that MERGE in a WITH clause requires PostgreSQL 15+, and using RETURNING with MERGE requires PostgreSQL 17+.

A practical pattern is "move to archive" - delete rows from the primary table and insert them into an archive table in one shot:

WITH deleted_orders AS (
  DELETE FROM orders
  WHERE status = 'cancelled' AND created_at < NOW() - INTERVAL '1 year'
  RETURNING *
)
INSERT INTO orders_archive
SELECT * FROM deleted_orders;

Both operations execute as a single transaction. You don't need explicit BEGIN/COMMIT or intermediate state management. All data-modifying CTEs in a statement execute using the same snapshot of the database - they see the same pre-statement state, and their effects don't become visible to each other mid-execution. This means two CTEs in the same WITH clause cannot see each other's writes.

Another use case is conditional upsert with feedback:

WITH updated AS (
  UPDATE inventory
  SET quantity = quantity - 1
  WHERE product_id = 42 AND quantity > 0
  RETURNING product_id, quantity
)
SELECT
  CASE WHEN COUNT(*) > 0 THEN 'reserved' ELSE 'out_of_stock' END AS result
FROM updated;

This pattern avoids a separate read-then-write round trip and is safe under concurrent load because the UPDATE takes a row-level lock.

When CTEs Hurt Performance vs Subqueries

CTEs are not always the right tool. The readability gain can mask a performance problem, and the right choice often only becomes clear after looking at EXPLAIN ANALYZE output.

One risk in modern PostgreSQL (12+) is the single-reference inlining case. A CTE that is referenced exactly once and has no side effects is inlined by default — the planner treats it as a subquery and may re-plan it in context. Before PostgreSQL 12, materialization prevented this: the CTE always ran once and the result was cached. If you want that guaranteed-once behavior for an inlined CTE, add MATERIALIZED explicitly. CTEs referenced more than once are materialized automatically, so double execution is only a concern if you have explicitly opted in to NOT MATERIALIZED on such a CTE.

The other case where CTEs can hurt is filter pushdown. A subquery lets the planner push outer predicates inside, so an index on the filtered column can be used. A MATERIALIZED CTE blocks this. If you write:

WITH all_orders AS MATERIALIZED (
  SELECT * FROM orders
)
SELECT * FROM all_orders WHERE customer_id = 99;

PostgreSQL scans the entire orders table to build the CTE, then filters. A plain subquery - or an inlined CTE - would let the planner use an index on customer_id. For large tables, the difference between a sequential scan and an index seek is substantial.

CTEs do have a legitimate performance advantage when the named subquery is complex, expensive, and referenced multiple times. In that case, MATERIALIZED gives you computed-once semantics. A subquery used twice re-executes twice. Window functions and aggregate subqueries used in multiple places are common candidates for this pattern.

When in doubt, write both forms and compare EXPLAIN (ANALYZE, BUFFERS) output. The planner's choice - index usage, join strategy, estimated vs actual rows - tells you more than any general rule.

Pulse - Elasticsearch Operations Done Right

Pulse can solve your Elasticsearch issues

Subscribe to the Pulse Newsletter

Get early access to new Pulse features, insightful blogs & exclusive events , webinars, and workshops.

We use cookies to provide an optimized user experience and understand our traffic. To learn more, read our use of cookies; otherwise, please choose 'Accept Cookies' to continue using our website.