Intermediate

Subqueries vs CTEs: When to Use Which

Both let you build queries in steps — but they read very differently. Learn correlated subqueries, CTEs, and the cases where each one wins.

A subquery is a query nested inside another. A CTE (Common Table Expression) is a named result set defined with WITH at the top of your statement. Both let you compute intermediate results and build a query in steps — the difference is mostly readability, reuse, and a few performance details that matter on large tables.

The short version: reach for a CTE when a query has multiple logical steps or you reuse the same intermediate result, and reach for a subquery for a quick one-off filter or a single computed value. The rest of this guide shows exactly when each one shines.

subquery vs CTE structure — Same result, different shape: subqueries nest inward and you read them inside-out, while CTEs lay the steps out in order so the query reads top to bottom.

Types of subqueries

Subqueries come in three flavors based on where they sit and what they return.

Scalar: returns a single value, usable in SELECT or WHERE
Row/column: returns a list, used with IN / ANY / ALL
Derived table: a subquery in the FROM clause, treated like a temporary table

-- Scalar subquery: compare each order to the global average
SELECT id, total
FROM orders
WHERE total > (SELECT AVG(total) FROM orders);

-- Derived table: aggregate, then filter
SELECT *
FROM (
  SELECT user_id, SUM(total) AS revenue
  FROM orders
  GROUP BY user_id
) t
WHERE t.revenue > 1000;

Correlated subqueries

A correlated subquery references the outer query, so it cannot be evaluated once up front — it logically re-runs for every outer row. That makes it powerful for per-row comparisons but a common performance trap on large tables.

-- Each user's orders above THEIR OWN average
SELECT o.id, o.user_id, o.total
FROM orders o
WHERE o.total > (
  SELECT AVG(o2.total)
  FROM orders o2
  WHERE o2.user_id = o.user_id   -- references outer row
);

The same query with a window function computes each user's average once and keeps every row visible, usually with a single pass over the data instead of one pass per row.

CTEs (WITH clause)

A CTE names a result set up front so the main query reads top-to-bottom, like a short paragraph instead of a nested maze. You can reference the same CTE multiple times and chain several together, each building on the last.

WITH revenue AS (
  SELECT user_id, SUM(total) AS amount
  FROM orders
  GROUP BY user_id
),
top_spenders AS (
  SELECT * FROM revenue WHERE amount > 1000
)
SELECT u.name, t.amount
FROM top_spenders t
JOIN users u ON u.id = t.user_id
ORDER BY t.amount DESC;

Recursive CTEs

Recursive CTEs walk hierarchies — org charts, category trees, threaded comments, bill-of-materials. They have two parts joined by UNION ALL: an anchor query that produces the starting rows, and a recursive query that references the CTE itself until no new rows are produced.

WITH RECURSIVE chain AS (
  -- anchor: top-level employees
  SELECT id, name, manager_id, 1 AS depth
  FROM employees WHERE manager_id IS NULL
  UNION ALL
  -- recursive: everyone reporting to the previous level
  SELECT e.id, e.name, e.manager_id, c.depth + 1
  FROM employees e
  JOIN chain c ON e.manager_id = c.id
)
SELECT * FROM chain ORDER BY depth;

Materialized vs inlined CTEs

Older PostgreSQL always "materialized" CTEs — it computed each one into a temporary result before the main query ran. That was predictable but sometimes slower, because the planner could not push filters into the CTE. Since PostgreSQL 12, simple non-recursive CTEs are inlined by default, letting the planner optimize across the boundary.

-- Force the old behavior (compute once, reuse)
WITH recent AS MATERIALIZED (
  SELECT * FROM orders WHERE order_date > '2026-01-01'
)
SELECT * FROM recent WHERE total > 500;

-- Force inlining so filters push down into the CTE
WITH recent AS NOT MATERIALIZED (
  SELECT * FROM orders WHERE order_date > '2026-01-01'
)
SELECT * FROM recent WHERE total > 500;

Performance: which is faster?

On modern PostgreSQL, a CTE and the equivalent subquery usually produce the same plan, so neither is inherently faster. The real performance differences come from what you do inside them, not the syntax you chose.

A reused CTE marked MATERIALIZED avoids recomputing an expensive step
A correlated subquery is the pattern most likely to be slow — rewrite it
EXISTS usually beats IN when the subquery returns many rows
Always confirm with EXPLAIN ANALYZE rather than assuming

Which should you use?

Reused result or multi-step logic → CTE (clearer, named)
One-off filter or scalar comparison → subquery (inline, concise)
Hierarchical / tree data → recursive CTE (only option)
Tight inner loop where speed matters → benchmark both; rewrite correlated subqueries as JOINs or window functions

Common mistakes

Using a scalar subquery that can return more than one row
Leaving a correlated subquery in place when a JOIN would be far faster
Forgetting that a CTE is only visible to the statement that defines it
Writing a recursive CTE with no termination condition on cyclic data
Reaching for IN with a huge subquery result instead of EXISTS

Correlated subqueries are a common source of slow queries — see query optimization for faster rewrites, and the JOINs guide for turning a per-row lookup into a single set-based join.