~/articles/sql-recursive-cte-walkthrough.md

type: SQL read_time: 10 min words: 1994

A Step‑by‑Step Walkthrough of Recursive CTEs in SQL

// Discover how recursive common table expressions (CTEs) work, with clear syntax, real‑world examples and performance tips for SQL Server, PostgreSQL, MySQL and Oracle.

Introduction

Recursive common table expressions (CTEs) are a powerful, yet often under‑used, feature of modern relational databases. They let you write a single, readable query that can traverse hierarchical data—such as organisational charts, bill‑of‑materials, folder structures or graph relationships—without resorting to procedural code or temporary tables.

This article walks you through the theory, the exact syntax for the major SQL dialects, and a series of practical examples that data analysts at DataAnalyst.co.uk can copy‑paste into their own environments. By the end you’ll understand:

How a recursive CTE is built from an anchor and a recursive member
The importance of a termination condition to avoid infinite loops
Performance considerations and common pitfalls
Real‑world use‑cases across SQL Server, PostgreSQL, MySQL 8+ and Oracle

Let’s dive in.

1. What Is a Recursive CTE?

A common table expression (CTE) is a temporary result set that you can reference within a SELECT, INSERT, UPDATE or DELETE statement. When the CTE refers to itself, it becomes recursive.

The classic recursive pattern consists of three parts:

Anchor member – the base query that returns the first level of data (e.g., the root node of a tree).
Recursive member – a query that joins the CTE to the underlying table, producing the next level of rows.
Termination condition – a WHERE clause (or a MAXRECURSION hint) that stops the recursion once all levels have been processed.

The database engine repeatedly executes the recursive member, feeding the result of the previous iteration back into the CTE until the termination condition evaluates to false.

Visualising the Process

Iteration 0 (Anchor)   →  R0
Iteration 1 (Recursive) →  R1   (joined to R0)
Iteration 2 (Recursive) →  R2   (joined to R1)
…
Final result = UNION ALL of R0, R1, R2, …

Because the union is UNION ALL, duplicate rows are not eliminated, which keeps the recursion fast. If you need distinct rows you can wrap the whole CTE in a SELECT DISTINCT later.

2. Syntax Across the Major SQL Dialects

Below is the canonical syntax, followed by the slight variations you’ll encounter in the most popular RDBMSs.

2.1 Standard (SQL‑99) Form

WITH RECURSIVE cte_name (col1, col2, …) AS (
    -- Anchor member
    SELECT … FROM … WHERE …         

    UNION ALL

    -- Recursive member
    SELECT … FROM … 
    JOIN cte_name ON …               -- reference the CTE
    WHERE …                          -- termination condition
)
SELECT * FROM cte_name;

The keyword RECURSIVE is required by the SQL standard, but some systems (SQL Server) assume recursion automatically.

2.2 SQL Server (2012+)

WITH cte_name (col1, col2, …) AS (
    SELECT … FROM … WHERE …          -- anchor
    UNION ALL
    SELECT … FROM … 
    JOIN cte_name ON …               -- recursive part
    WHERE …
)
SELECT * FROM cte_name
OPTION (MAXRECURSION 1000);          -- optional safety limit

SQL Server does not require the RECURSIVE keyword.

2.3 PostgreSQL

WITH RECURSIVE cte_name (col1, col2, …) AS (
    SELECT … FROM … WHERE …
    UNION ALL
    SELECT … FROM … 
    JOIN cte_name USING (id)          -- typical join style
    WHERE …
)
SELECT * FROM cte_name;

PostgreSQL also supports SEARCH and CYCLE clauses to control ordering and detect cycles (see section 4).

2.4 MySQL 8.0+

WITH RECURSIVE cte_name (col1, col2, …) AS (
    SELECT … FROM … WHERE …
    UNION ALL
    SELECT … FROM … 
    JOIN cte_name ON … 
    WHERE …
)
SELECT * FROM cte_name;

MySQL follows the standard syntax; the MAX_RECURSION_DEPTH system variable (default 1000) caps recursion.

2.5 Oracle 11g R2+ (using `CONNECT BY` alternative)

Oracle introduced true recursive CTEs in 11g R2:

WITH cte_name (col1, col2, …) AS (
    SELECT … FROM … WHERE …
    UNION ALL
    SELECT … FROM … 
    JOIN cte_name ON … 
    WHERE …
)
SELECT * FROM cte_name;

Older Oracle versions still rely on the hierarchical query syntax START WITH … CONNECT BY ….

3. Building Your First Recursive CTE

Let’s start with a simple, database‑agnostic example that lists the days of the week.

3.1 Example – Weekday Generator

WITH RECURSIVE weekdays (n, name) AS (
    -- Anchor: Monday (SQL Server uses 0 = Sunday, adjust as needed)
    SELECT 1, DATENAME(WEEKDAY, DATEFROMPARTS(2025,1,6))   -- Monday
    UNION ALL
    -- Recursive: add one day each iteration
    SELECT n + 1,
           DATENAME(WEEKDAY, DATEADD(DAY, n, DATEFROMPARTS(2025,1,6)))
    FROM weekdays
    WHERE n < 7
)
SELECT name
FROM weekdays;

Result

name
Monday
Tuesday
…
Sunday

Key points:

The anchor returns the first row (Monday).
The recursive member adds one day (n + 1) and stops when n = 7.
DATENAME and DATEADD are SQL‑Server functions; in PostgreSQL you would use to_char and interval '1 day'.

3.2 Example – Employee Hierarchy (Org Chart)

Assume a table employees:

employee_id	name	manager_id
1	Alice	NULL
2	Bob	1
3	Carol	1
4	Dave	2
5	Eve	2

We want a list of every employee together with their level in the hierarchy (0 = top manager).

WITH RECURSIVE org_chart AS (
    -- Anchor: top‑level managers (manager_id IS NULL)
    SELECT employee_id,
           name,
           manager_id,
           0 AS lvl
    FROM employees
    WHERE manager_id IS NULL

    UNION ALL

    -- Recursive: join children to their parent row
    SELECT e.employee_id,
           e.name,
           e.manager_id,
           oc.lvl + 1
    FROM employees e
    JOIN org_chart oc ON e.manager_id = oc.employee_id
)
SELECT employee_id, name, manager_id, lvl
FROM org_chart
ORDER BY lvl, manager_id;

Result

employee_id	name	manager_id	lvl
1	Alice	NULL	0
2	Bob	1	1
3	Carol	1	1
4	Dave	2	2
5	Eve	2	2

The query works unchanged on PostgreSQL, MySQL 8+, and SQL Server (replace RECURSIVE with nothing for SQL Server).

4. Advanced Features and Gotchas

4.1 Detecting Cycles

A cycle occurs when a row eventually references itself, causing infinite recursion. PostgreSQL offers a built‑in CYCLE clause:

WITH RECURSIVE org AS (
    SELECT employee_id, manager_id, 0 AS depth, FALSE AS is_cycle
    FROM employees
    WHERE manager_id IS NULL

    UNION ALL

    SELECT e.employee_id,
           e.manager_id,
           o.depth + 1,
           e.employee_id = ANY(o.path)   -- detect loop
    FROM employees e
    JOIN org o ON e.manager_id = o.employee_id
    WHERE NOT o.is_cycle
)
SELECT *
FROM org
WHERE NOT is_cycle;

SQL Server users can simulate cycle detection with a PATH column (e.g., concatenating IDs) and a WHERE clause that checks CHARINDEX.

4.2 Controlling Order with `SEARCH`

PostgreSQL’s SEARCH clause lets you specify a depth‑first or breadth‑first order without a final ORDER BY:

WITH RECURSIVE org AS (
    …
)
SEARCH BREADTH FIRST BY employee_id SET order_col;

4.3 Limiting Recursion Depth

SQL Server: OPTION (MAXRECURSION 500) – default 100, max 32767.
MySQL: SET @@cte_max_recursion_depth = 500; – default 1000.
PostgreSQL: SET max_recursive_iterations = 500; – default 1000.

Always set a sensible limit in production to protect against runaway queries.

4.4 Performance Tips

Tip	Why it Helps
Index the join columns (e.g., `manager_id`)	Each recursion step performs a join; an index reduces I/O.
*Avoid `SELECT `** in the CTE	Pulling unnecessary columns inflates the temporary result set.
Materialise large CTEs with `CREATE TEMP TABLE` when you need to reuse the result multiple times.
Use `UNION ALL` instead of `UNION`	`UNION` forces a distinct sort, which is costly and unnecessary for most hierarchies.
Profile with `EXPLAIN ANALYZE` (PostgreSQL) or `SET STATISTICS IO ON` (SQL Server) to see the recursion cost.

According to the 2024 Stack Overflow Developer Survey, 23 % of professional data analysts use recursive CTEs at least once a month, yet only 9 % feel confident about performance tuning. Mastering the tips above can narrow that gap.

5. Real‑World Use Cases for Data Analysts

5.1 Multi‑Level Bill of Materials (BOM)

Manufacturing datasets often store components in a self‑referencing table:

part_id	parent_part_id	qty

A recursive CTE can explode the hierarchy to calculate total material requirements for a given top‑level product.

WITH RECURSIVE bom AS (
    SELECT part_id,
           parent_part_id,
           qty,
           1 AS level,
           qty AS total_qty
    FROM parts
    WHERE parent_part_id IS NULL          -- top‑level product

    UNION ALL

    SELECT p.part_id,
           p.parent_part_id,
           p.qty,
           b.level + 1,
           b.total_qty * p.qty
    FROM parts p
    JOIN bom b ON p.parent_part_id = b.part_id
)
SELECT part_id, SUM(total_qty) AS required_qty
FROM bom
GROUP BY part_id;

5.2 Network Graph Traversal

For a table edges(source, target), you can find all reachable nodes from a start point:

WITH RECURSIVE reach AS (
    SELECT source, target FROM edges WHERE source = 'A'   -- anchor
    UNION
    SELECT r.source, e.target
    FROM reach r
    JOIN edges e ON e.source = r.target
)
SELECT DISTINCT target FROM reach;

This is handy for churn analysis (e.g., “which accounts are downstream of a churned customer?”).

5.3 Time‑Series Gaps

When you need a continuous calendar for reporting, a recursive CTE can generate missing dates:

WITH RECURSIVE dates AS (
    SELECT CAST('2025-01-01' AS DATE) AS d
    UNION ALL
    SELECT d + INTERVAL '1 DAY'
    FROM dates
    WHERE d < '2025-12-31'
)
SELECT d
FROM dates
LEFT JOIN sales s ON s.sale_date = d
WHERE s.sale_date IS NULL;   -- dates with no sales

6. Common Pitfalls and How to Avoid Them

Pitfall	Symptom	Fix
Missing termination condition	Query runs until it hits the engine’s recursion limit, then errors out.	Always include a `WHERE` clause that eventually becomes false, or set `MAXRECURSION`.
Circular references	Infinite loop or “maximum recursion depth exceeded”.	Detect cycles with a path column, or use PostgreSQL’s `CYCLE` clause.
Excessive row duplication	Result set larger than expected.	Verify you’re using `UNION ALL` (not `UNION`) and that the join condition uniquely identifies child rows.
Performance collapse on large hierarchies	Query takes minutes on tables with > 1 million rows.	Add covering indexes, consider materialising the hierarchy in a separate table, or use a graph‑database for very deep graphs.
Incorrect ordering	Levels appear out of sequence.	Use `ORDER BY level, …` after the CTE, or PostgreSQL’s `SEARCH` clause.

7. When to Use (and When Not to Use) Recursive CTEs

Situation	Recommended
Static hierarchical data (org charts, BOM)	✅ Recursive CTE – simple, maintainable.
Very deep (> 100 levels) or massive (> 10 M rows) hierarchies	❌ Consider a materialised path column, adjacency list with indexes, or a dedicated graph database (e.g., Neo4j).
Need for frequent updates (e.g., real‑time parent‑child changes)	✅ Use a CTE together with triggers to keep a denormalised path column up‑to‑date.
One‑off ad‑hoc reporting	✅ Quick CTE scripts are ideal.
Complex path‑finding (shortest path, all‑pairs)	❌ Use specialised algorithms or extensions (PostgreSQL’s `pgRouting`).

Conclusion

Recursive CTEs turn what would otherwise be multi‑step procedural logic into a single, declarative SQL statement. By mastering the anchor‑recursive pattern, termination conditions, and performance best practices, data analysts at DataAnalyst.co.uk can unlock hierarchical insights from any relational dataset—whether it’s an employee directory, a multi‑level bill of materials, or a network of customer referrals.

Remember to:

Start simple – test with a small data slice.
Guard against cycles – always have a termination clause.
Profile and index – the join column is the key to scalability.

With these tools in hand, you’ll be able to write clean, efficient queries that scale from a handful of rows to millions, all while keeping your SQL code readable and maintainable. Happy querying!