# Regex in Bigquery

Regex (Regular Expression) functions allow you to perform complex string manipulations on your data, making it easier to extract relevant information and prepare it for analysis. It is a lot more powerful than a WHERE LIKE '%xyz' type statement

You can use an online tool like regexr.com to help create the regex pattern itself. I've also used Github Copilot in VS-Code, where I've given it a prompt in a comment and asked it to write the regex expression. Once you hvae that, how do you use it in your Bigquery SQL?

# REGEXP_EXTRACT

# StandardSQL
SELECT 
  name, 
  REGEXP_EXTRACT(name, r'(\d+)') AS extracted_numbers
FROM 
  `mydataset.mytable`

In this example, the REGEXP_EXTRACT function is used to extract a portion of the name column that matches the specified regular expression r'(\d+)'. The regular expression (\d+) matches one or more consecutive digits. The extracted numbers are returned in the extracted_numbers column.

# REGEXP_REPLACE

Replaces parts of a string that match a specified pattern with a specified string.

Example:

# StandardSQL
SELECT 
  name, 
  REGEXP_REPLACE(name, r'(\d+)', '##') AS replaced_numbers
FROM 
  `mydataset.mytable`

# REGEXP_MATCH

Returns true or false depending on whether a specified pattern matches a part of a string.

Example:

# StandardSQL
SELECT 
  name, 
  REGEXP_MATCH(name, r'\d+') AS has_numbers
FROM 
  `mydataset.mytable`

# REGEXP_CONTAINS

Returns true or false depending on whether a specified pattern is found anywhere in a string.

Example:

# StandardSQL
SELECT 
  name, 
  REGEXP_CONTAINS(name, r'\d+') AS has_numbers
FROM 
  `mydataset.mytable`

# REGEXP_EXTRACT_ALL

Extracts all occurrences of a pattern in a string and returns the extracted substrings in an array. Example:

# StandardSQL
SELECT 
  name, 
  REGEXP_EXTRACT_ALL(name, r'(\d+)') AS extracted_numbers
FROM 
  `mydataset.mytable`