~/articles/sql-set-operators-union-intersect.md

type: SQL read_time: 9 min words: 1664

SQL Set Operators Demystified: UNION, INTERSECT and EXCEPT Explained for Data Analysts

// Discover how UNION, INTERSECT and EXCEPT work in SQL, their performance impact, cross‑database syntax, and practical tips for data analysts in 2025.

Introduction

Set operators are a powerful yet often misunderstood feature of SQL. They let you combine the results of two (or more) SELECT statements into a single result set, without using joins. For data analysts, mastering UNION, INTERSECT and EXCEPT (or MINUS in Oracle) can simplify reporting, data cleaning and ad‑hoc analysis.

This article breaks down each operator, highlights differences between major RDBMSs, and offers optimisation advice that reflects the latest releases (PostgreSQL 17, SQL Server 2022, Oracle 23c, MySQL 8.4). By the end you’ll know when to use each operator, how they affect query performance, and how to avoid common pitfalls.

What Are SQL Set Operators?

A set operator combines the output of two SELECT statements that have the same number of columns and compatible data types. The three core operators covered here are:

Operator	Meaning	Typical RDBMS name
`UNION`	Rows from both queries, duplicates removed	`UNION` (all vendors)
`INTERSECT`	Rows that appear in both queries	`INTERSECT` (SQL Server, PostgreSQL, Oracle, MySQL 8.4)
`EXCEPT` / `MINUS`	Rows from the first query that are not in the second	`EXCEPT` (SQL Server, PostgreSQL, MySQL 8.4) / `MINUS` (Oracle)

All three operators follow the same syntactic pattern:

SELECT …               -- Query A
<set_operator>
SELECT …               -- Query B
[ORDER BY …]           -- Optional, placed after the final SELECT

Column and Data‑type Rules

Column count must match – if Query A returns three columns, Query B must also return three.
Data‑type compatibility – the database will attempt implicit conversion, but it is safest to align the data types explicitly (e.g., cast INT to BIGINT if needed).
Column names – the final result set inherits column names from the first SELECT. Use column aliases in the first query to control naming.

UNION vs UNION ALL

UNION – Removing Duplicates

UNION performs a DISTINCT on the combined rows. Internally the engine typically:

Executes both queries.
Concatenates the result sets.
Sorts the combined rows (or uses a hash aggregate) to eliminate duplicates.

Because of the extra sorting/aggregation step, UNION can be 30‑70 % slower than UNION ALL on large data sets, especially when the result set is millions of rows. Benchmarks on PostgreSQL 17 show an average runtime increase of 0.45 seconds per 1 million rows when switching from UNION ALL to UNION.

UNION ALL – Keep Everything

UNION ALL simply appends the rows from the second query to the first, preserving duplicates. No sorting or aggregation occurs, making it the fastest way to stitch together results.

When to use UNION ALL:

You know the two queries cannot produce overlapping rows (e.g., non‑overlapping date ranges or partition keys).
You need a complete audit trail where duplicate records are meaningful.
You plan to de‑duplicate later using ROW_NUMBER() or a temporary table.

Practical Example (PostgreSQL)

-- Customers (active) + Employees (current) – keep duplicates for audit
SELECT 'Customer' AS source, id, name
FROM customers
WHERE status = 'active'

UNION ALL

SELECT 'Employee' AS source, emp_id AS id, emp_name AS name
FROM employees
WHERE emp_status = 'current';

If you later discover some names appear in both tables and you only want unique rows, wrap the whole query in a CTE and apply DISTINCT:

WITH combined AS (
    … previous UNION ALL query …
)
SELECT DISTINCT source, id, name
FROM combined;

INTERSECT – Finding Common Rows

INTERSECT returns only rows that exist in both query results. Like UNION, it removes duplicates automatically.

Use Cases for Data Analysts

Identify customers who are also suppliers.
Find overlapping time periods in two event logs.
Validate data migration by comparing source and target tables.

Performance Tips

Index both sides – if the columns involved are indexed, the optimiser can use a hash join style intersection, which is far faster than sorting.
Limit early – applying WHERE clauses before the INTERSECT reduces the amount of data the engine must compare.
In PostgreSQL 17, INTERSECT now benefits from parallel hash execution for large sets, offering up to 2× speed‑up on 8‑core machines.

Example (SQL Server)

SELECT customer_id
FROM dbo.customers
WHERE country = 'UK'

INTERSECT

SELECT customer_id
FROM dbo.purchases
WHERE purchase_date >= '2024-01-01';

The result is the list of UK customers who made a purchase in 2024.

EXCEPT (or MINUS) – Excluding Rows

EXCEPT (or MINUS in Oracle) returns rows from the first query that do not appear in the second query. It is essentially the set‑theoretic difference operation.

Typical Scenarios

Find records that need cleaning – e.g., orders without a matching invoice.
Identify “orphan” rows after a data migration.
Generate a “missing data” report for a data quality audit.

Example (MySQL 8.4)

SELECT order_id, customer_id
FROM orders
WHERE order_date >= '2024-01-01'

EXCEPT   -- MySQL 8.4 supports EXCEPT directly
SELECT order_id, customer_id
FROM invoices
WHERE invoice_date IS NOT NULL;

Only orders without an invoice will be returned.

Performance Considerations

Like UNION, EXCEPT removes duplicates, so it incurs a sorting/aggregation cost.
Ensure the second query is as selective as possible – filtering early reduces the work needed to compare rows.
In Oracle 23c, MINUS now supports parallel execution for large data sets, dramatically reducing runtime on multi‑CPU systems.

Cross‑Database Syntax Differences

Feature	PostgreSQL	SQL Server	Oracle	MySQL
`UNION ALL`	✅	✅	✅	✅
`INTERSECT`	✅ (since 9.5)	✅	✅	✅ (8.4)
`EXCEPT`	✅	✅	❌ (uses `MINUS`)	✅ (8.4)
`MINUS`	❌ (use `EXCEPT`)	❌ (use `EXCEPT`)	✅	❌
Bracketed precedence	✅	✅	✅	✅
`ORDER BY` placement	Only after final SELECT	Same	Same	Same

Tip: When writing portable code, favour UNION ALL and EXCEPT (or MINUS with a conditional compile) and avoid INTERSECT if you need to support older MySQL versions (< 8.0).

Performance Optimisation Tips for Data Analysts

Prefer UNION ALL when possible – only switch to UNION if you truly need duplicate removal.
Push predicates down – filter rows in each sub‑query before the set operator.
Use CTEs for readability – but be aware that in PostgreSQL 17 and SQL Server 2022, CTEs are inlined, so they do not add overhead.
Leverage indexes – especially for INTERSECT and EXCEPT. An index on the columns used for comparison speeds up the hash‑based implementation.
Parallelise – modern engines (PostgreSQL 17, Oracle 23c, SQL Server 2022) automatically parallelise set operations when the estimated row count exceeds a threshold (default ~100 k rows). Ensure max_parallel_workers_per_gather (Postgres) or MAXDOP (SQL Server) is configured for your hardware.
Avoid unnecessary ORDER BY – ordering after a set operation forces an extra sort. If you only need ordered output for presentation, apply ORDER BY in the client layer or a final SELECT that wraps the set operation.
Materialise large intermediate results – for very large unions, consider inserting the first query into a temporary table, then appending the second with INSERT … SELECT. This can reduce memory pressure and allow you to create indexes on the temp table before the final SELECT.

Practical Use Cases for Data Analysts

Business Question	Set Operator	Example Query
Which products sold in Q1 were never sold in Q2?	`EXCEPT`	`SELECT product_id FROM sales_q1 EXCEPT SELECT product_id FROM sales_q2;`
List all unique customers across three regions.	`UNION`	`SELECT customer_id FROM region_a UNION SELECT customer_id FROM region_b UNION SELECT customer_id FROM region_c;`
Find customers who appear in both the loyalty program and the newsletter list.	`INTERSECT`	`SELECT email FROM loyalty_program INTERSECT SELECT email FROM newsletter_subscribers;`
Create a master list of transactions, preserving duplicates for audit.	`UNION ALL`	`SELECT * FROM online_transactions UNION ALL SELECT * FROM store_transactions;`

These patterns save time compared with writing complex joins or multiple sub‑queries.

Common Pitfalls and How to Avoid Them

Pitfall	Symptom	Fix
Mismatched column types	“ERROR: column 1 is of type integer but expression is of type text”	Cast explicitly: `SELECT CAST(col AS TEXT) …`
Forgetting `ORDER BY` placement	“ORDER BY clause not allowed here”	Place `ORDER BY` after the final SELECT, not inside each sub‑query.
Unintended duplicates with `UNION ALL`	Duplicate rows appear in report	Use `DISTINCT` or switch to `UNION` if duplicates must be removed.
Performance hit on large `EXCEPT`	Query runs for minutes on 10 M rows	Add selective `WHERE` clauses, ensure indexes, or break into temp table steps.
Using `MINUS` in SQL Server	“Incorrect syntax near ‘MINUS’”	Replace with `EXCEPT` or use a compatibility shim.

Conclusion

Set operators—UNION, INTERSECT and EXCEPT (or MINUS)—are indispensable tools for data analysts who need to merge, compare, or exclude result sets without resorting to complex joins. Understanding how each operator works, the performance implications of duplicate removal, and the subtle syntax differences across PostgreSQL, SQL Server, Oracle and MySQL will enable you to write cleaner, faster queries.

Remember these key take‑aways:

Prefer UNION ALL for speed; only use UNION when duplicates truly matter.
Push filters down and index the comparison columns for INTERSECT and EXCEPT.
Leverage modern engine features such as parallel hash execution (PostgreSQL 17, Oracle 23c) to handle large data sets efficiently.
Test on real data – performance can vary dramatically between databases and data volumes.

Armed with this knowledge, you can now demystify set operations and apply them confidently in your analytical workflows. Happy querying!