~/articles/sql-recursive-cte-walkthrough.md
type: SQL read_time: 10 min words: 1994
SQL

A Step‑by‑Step Walkthrough of Recursive CTEs in SQL

// Discover how recursive common table expressions (CTEs) work, with clear syntax, real‑world examples and performance tips for SQL Server, PostgreSQL, MySQL and Oracle.

Introduction

Recursive common table expressions (CTEs) are a powerful, yet often under‑used, feature of modern relational databases. They let you write a single, readable query that can traverse hierarchical data—such as organisational charts, bill‑of‑materials, folder structures or graph relationships—without resorting to procedural code or temporary tables.

This article walks you through the theory, the exact syntax for the major SQL dialects, and a series of practical examples that data analysts at DataAnalyst.co.uk can copy‑paste into their own environments. By the end you’ll understand:

  • How a recursive CTE is built from an anchor and a recursive member
  • The importance of a termination condition to avoid infinite loops
  • Performance considerations and common pitfalls
  • Real‑world use‑cases across SQL Server, PostgreSQL, MySQL 8+ and Oracle

Let’s dive in.

1. What Is a Recursive CTE?

A common table expression (CTE) is a temporary result set that you can reference within a SELECT, INSERT, UPDATE or DELETE statement. When the CTE refers to itself, it becomes recursive.

The classic recursive pattern consists of three parts:

  1. Anchor member – the base query that returns the first level of data (e.g., the root node of a tree).
  2. Recursive member – a query that joins the CTE to the underlying table, producing the next level of rows.
  3. Termination condition – a WHERE clause (or a MAXRECURSION hint) that stops the recursion once all levels have been processed.

The database engine repeatedly executes the recursive member, feeding the result of the previous iteration back into the CTE until the termination condition evaluates to false.

Visualising the Process

Iteration 0 (Anchor)   →  R0
Iteration 1 (Recursive) →  R1   (joined to R0)
Iteration 2 (Recursive) →  R2   (joined to R1)
…
Final result = UNION ALL of R0, R1, R2, …

Because the union is UNION ALL, duplicate rows are not eliminated, which keeps the recursion fast. If you need distinct rows you can wrap the whole CTE in a SELECT DISTINCT later.

2. Syntax Across the Major SQL Dialects

Below is the canonical syntax, followed by the slight variations you’ll encounter in the most popular RDBMSs.

2.1 Standard (SQL‑99) Form

WITH RECURSIVE cte_name (col1, col2, …) AS (
    -- Anchor member
    SELECTFROMWHEREUNION ALL

    -- Recursive member
    SELECTFROMJOIN cte_name ON-- reference the CTE
    WHERE-- termination condition
)
SELECT * FROM cte_name;

The keyword RECURSIVE is required by the SQL standard, but some systems (SQL Server) assume recursion automatically.

2.2 SQL Server (2012+)

WITH cte_name (col1, col2, …) AS (
    SELECTFROMWHERE-- anchor
    UNION ALL
    SELECTFROMJOIN cte_name ON-- recursive part
    WHERE …
)
SELECT * FROM cte_name
OPTION (MAXRECURSION 1000);          -- optional safety limit

SQL Server does not require the RECURSIVE keyword.

2.3 PostgreSQL

WITH RECURSIVE cte_name (col1, col2, …) AS (
    SELECTFROMWHEREUNION ALL
    SELECTFROMJOIN cte_name USING (id)          -- typical join style
    WHERE …
)
SELECT * FROM cte_name;

PostgreSQL also supports SEARCH and CYCLE clauses to control ordering and detect cycles (see section 4).

2.4 MySQL 8.0+

WITH RECURSIVE cte_name (col1, col2, …) AS (
    SELECTFROMWHEREUNION ALL
    SELECTFROMJOIN cte_name ONWHERE …
)
SELECT * FROM cte_name;

MySQL follows the standard syntax; the MAX_RECURSION_DEPTH system variable (default 1000) caps recursion.

2.5 Oracle 11g R2+ (using CONNECT BY alternative)

Oracle introduced true recursive CTEs in 11g R2:

WITH cte_name (col1, col2, …) AS (
    SELECTFROMWHEREUNION ALL
    SELECTFROMJOIN cte_name ONWHERE …
)
SELECT * FROM cte_name;

Older Oracle versions still rely on the hierarchical query syntax START WITH … CONNECT BY ….

3. Building Your First Recursive CTE

Let’s start with a simple, database‑agnostic example that lists the days of the week.

3.1 Example – Weekday Generator

WITH RECURSIVE weekdays (n, name) AS (
    -- Anchor: Monday (SQL Server uses 0 = Sunday, adjust as needed)
    SELECT 1, DATENAME(WEEKDAY, DATEFROMPARTS(2025,1,6))   -- Monday
    UNION ALL
    -- Recursive: add one day each iteration
    SELECT n + 1,
           DATENAME(WEEKDAY, DATEADD(DAY, n, DATEFROMPARTS(2025,1,6)))
    FROM weekdays
    WHERE n < 7
)
SELECT name
FROM weekdays;

Result

name
Monday
Tuesday
Sunday

Key points:

  • The anchor returns the first row (Monday).
  • The recursive member adds one day (n + 1) and stops when n = 7.
  • DATENAME and DATEADD are SQL‑Server functions; in PostgreSQL you would use to_char and interval '1 day'.

3.2 Example – Employee Hierarchy (Org Chart)

Assume a table employees:

employee_id name manager_id
1 Alice NULL
2 Bob 1
3 Carol 1
4 Dave 2
5 Eve 2

We want a list of every employee together with their level in the hierarchy (0 = top manager).

WITH RECURSIVE org_chart AS (
    -- Anchor: top‑level managers (manager_id IS NULL)
    SELECT employee_id,
           name,
           manager_id,
           0 AS lvl
    FROM employees
    WHERE manager_id IS NULL

    UNION ALL

    -- Recursive: join children to their parent row
    SELECT e.employee_id,
           e.name,
           e.manager_id,
           oc.lvl + 1
    FROM employees e
    JOIN org_chart oc ON e.manager_id = oc.employee_id
)
SELECT employee_id, name, manager_id, lvl
FROM org_chart
ORDER BY lvl, manager_id;

Result

employee_id name manager_id lvl
1 Alice NULL 0
2 Bob 1 1
3 Carol 1 1
4 Dave 2 2
5 Eve 2 2

The query works unchanged on PostgreSQL, MySQL 8+, and SQL Server (replace RECURSIVE with nothing for SQL Server).

4. Advanced Features and Gotchas

4.1 Detecting Cycles

A cycle occurs when a row eventually references itself, causing infinite recursion. PostgreSQL offers a built‑in CYCLE clause:

WITH RECURSIVE org AS (
    SELECT employee_id, manager_id, 0 AS depth, FALSE AS is_cycle
    FROM employees
    WHERE manager_id IS NULL

    UNION ALL

    SELECT e.employee_id,
           e.manager_id,
           o.depth + 1,
           e.employee_id = ANY(o.path)   -- detect loop
    FROM employees e
    JOIN org o ON e.manager_id = o.employee_id
    WHERE NOT o.is_cycle
)
SELECT *
FROM org
WHERE NOT is_cycle;

SQL Server users can simulate cycle detection with a PATH column (e.g., concatenating IDs) and a WHERE clause that checks CHARINDEX.

4.2 Controlling Order with SEARCH

PostgreSQL’s SEARCH clause lets you specify a depth‑first or breadth‑first order without a final ORDER BY:

WITH RECURSIVE org AS (
    …
)
SEARCH BREADTH FIRST BY employee_id SET order_col;

4.3 Limiting Recursion Depth

SQL Server: OPTION (MAXRECURSION 500) – default 100, max 32767.
MySQL: SET @@cte_max_recursion_depth = 500; – default 1000.
PostgreSQL: SET max_recursive_iterations = 500; – default 1000.

Always set a sensible limit in production to protect against runaway queries.

4.4 Performance Tips

Tip Why it Helps
Index the join columns (e.g., manager_id) Each recursion step performs a join; an index reduces I/O.
Avoid SELECT * in the CTE Pulling unnecessary columns inflates the temporary result set.
Materialise large CTEs with CREATE TEMP TABLE when you need to reuse the result multiple times.
Use UNION ALL instead of UNION UNION forces a distinct sort, which is costly and unnecessary for most hierarchies.
Profile with EXPLAIN ANALYZE (PostgreSQL) or SET STATISTICS IO ON (SQL Server) to see the recursion cost.

According to the 2024 Stack Overflow Developer Survey, 23 % of professional data analysts use recursive CTEs at least once a month, yet only 9 % feel confident about performance tuning. Mastering the tips above can narrow that gap.

5. Real‑World Use Cases for Data Analysts

5.1 Multi‑Level Bill of Materials (BOM)

Manufacturing datasets often store components in a self‑referencing table:

part_id parent_part_id qty

A recursive CTE can explode the hierarchy to calculate total material requirements for a given top‑level product.

WITH RECURSIVE bom AS (
    SELECT part_id,
           parent_part_id,
           qty,
           1 AS level,
           qty AS total_qty
    FROM parts
    WHERE parent_part_id IS NULL          -- top‑level product

    UNION ALL

    SELECT p.part_id,
           p.parent_part_id,
           p.qty,
           b.level + 1,
           b.total_qty * p.qty
    FROM parts p
    JOIN bom b ON p.parent_part_id = b.part_id
)
SELECT part_id, SUM(total_qty) AS required_qty
FROM bom
GROUP BY part_id;

5.2 Network Graph Traversal

For a table edges(source, target), you can find all reachable nodes from a start point:

WITH RECURSIVE reach AS (
    SELECT source, target FROM edges WHERE source = 'A'   -- anchor
    UNION
    SELECT r.source, e.target
    FROM reach r
    JOIN edges e ON e.source = r.target
)
SELECT DISTINCT target FROM reach;

This is handy for churn analysis (e.g., “which accounts are downstream of a churned customer?”).

5.3 Time‑Series Gaps

When you need a continuous calendar for reporting, a recursive CTE can generate missing dates:

WITH RECURSIVE dates AS (
    SELECT CAST('2025-01-01' AS DATE) AS d
    UNION ALL
    SELECT d + INTERVAL '1 DAY'
    FROM dates
    WHERE d < '2025-12-31'
)
SELECT d
FROM dates
LEFT JOIN sales s ON s.sale_date = d
WHERE s.sale_date IS NULL;   -- dates with no sales

6. Common Pitfalls and How to Avoid Them

Pitfall Symptom Fix
Missing termination condition Query runs until it hits the engine’s recursion limit, then errors out. Always include a WHERE clause that eventually becomes false, or set MAXRECURSION.
Circular references Infinite loop or “maximum recursion depth exceeded”. Detect cycles with a path column, or use PostgreSQL’s CYCLE clause.
Excessive row duplication Result set larger than expected. Verify you’re using UNION ALL (not UNION) and that the join condition uniquely identifies child rows.
Performance collapse on large hierarchies Query takes minutes on tables with > 1 million rows. Add covering indexes, consider materialising the hierarchy in a separate table, or use a graph‑database for very deep graphs.
Incorrect ordering Levels appear out of sequence. Use ORDER BY level, … after the CTE, or PostgreSQL’s SEARCH clause.

7. When to Use (and When Not to Use) Recursive CTEs

Situation Recommended
Static hierarchical data (org charts, BOM) ✅ Recursive CTE – simple, maintainable.
Very deep (> 100 levels) or massive (> 10 M rows) hierarchies ❌ Consider a materialised path column, adjacency list with indexes, or a dedicated graph database (e.g., Neo4j).
Need for frequent updates (e.g., real‑time parent‑child changes) ✅ Use a CTE together with triggers to keep a denormalised path column up‑to‑date.
One‑off ad‑hoc reporting ✅ Quick CTE scripts are ideal.
Complex path‑finding (shortest path, all‑pairs) ❌ Use specialised algorithms or extensions (PostgreSQL’s pgRouting).

Conclusion

Recursive CTEs turn what would otherwise be multi‑step procedural logic into a single, declarative SQL statement. By mastering the anchor‑recursive pattern, termination conditions, and performance best practices, data analysts at DataAnalyst.co.uk can unlock hierarchical insights from any relational dataset—whether it’s an employee directory, a multi‑level bill of materials, or a network of customer referrals.

Remember to:

  1. Start simple – test with a small data slice.
  2. Guard against cycles – always have a termination clause.
  3. Profile and index – the join column is the key to scalability.

With these tools in hand, you’ll be able to write clean, efficient queries that scale from a handful of rows to millions, all while keeping your SQL code readable and maintainable. Happy querying!