# Extracting a domain or subdomain from a url in Bigquery
Often when working with web or traffic data you have a list of urls across many different websites. You may then want to map these into domains or subdomains, such as:
https://news.mydomain.co.uk/inflation-is-so-high.html -> mydomain.co.uk
https://mobile.mydomain.co.uk/inflation-is-so-high.html -> mydomain.co.uk
https://news.sky.com/inflation-9-percent.html -> sky.com
If you are anything like me, your first thought would be to search stackoverflow for some pre-made regex and hope it works... but no, fortunately bigquery has some built in NET functions that work really well!
# Extract subdomain from a url in Bigquery
You want to go from:
https://news.mydomain.co.uk/inflation-is-so-high.html -> news.mydomain.co.uk
To do this use NET.HOST(), for instance:
SELECT NET.HOST(https://news.mydomain.co.uk/inflation-is-so-high.html)
gives news.mydomain.co.uk
# Extract domain from a url in Bigquery
You want to go from:
https://news.mydomain.co.uk/inflation-is-so-high.html -> mydomain.co.uk
To do this use NET.REG_DOMAIN(), for instance:
SELECT NET.REG_DOMAIN(https://news.mydomain.co.uk/inflation-is-so-high.html)
gives mydomain.co.uk
# Further info
I use these NET functions all the time... it can be such a headache working with edge-cases on a regex you copy/pasted from a comment. You can read the full documentation from Google here. Thanks GOOGLE!