# Extracting a domain or subdomain from a url in Bigquery

Often when working with web or traffic data you have a list of urls across many different websites. You may then want to map these into domains or subdomains, such as:

https://news.mydomain.co.uk/inflation-is-so-high.html -> mydomain.co.uk
https://mobile.mydomain.co.uk/inflation-is-so-high.html -> mydomain.co.uk
https://news.sky.com/inflation-9-percent.html -> sky.com

If you are anything like me, your first thought would be to search stackoverflow for some pre-made regex and hope it works... but no, fortunately bigquery has some built in NET functions that work really well!

# Extract subdomain from a url in Bigquery

You want to go from:

https://news.mydomain.co.uk/inflation-is-so-high.html -> news.mydomain.co.uk

To do this use NET.HOST(), for instance:

SELECT NET.HOST(https://news.mydomain.co.uk/inflation-is-so-high.html)

gives news.mydomain.co.uk

# Extract domain from a url in Bigquery

You want to go from:

https://news.mydomain.co.uk/inflation-is-so-high.html -> mydomain.co.uk

To do this use NET.REG_DOMAIN(), for instance:

SELECT NET.REG_DOMAIN(https://news.mydomain.co.uk/inflation-is-so-high.html)

gives mydomain.co.uk

# Further info

I use these NET functions all the time... it can be such a headache working with edge-cases on a regex you copy/pasted from a comment. You can read the full documentation from Google here. Thanks GOOGLE!