DEV Community

Cover image for Regex Operations in BigQuery
Smit Thakkar
Smit Thakkar

Posted on • Originally published at thefellowcoder.com

Regex Operations in BigQuery

BigQuery is taking the analytics industry by a storm. One of the most important use-cases is to extract and transform parts of a string. Below are some of the powerful constructs provided by BigQuery.

Regex Constructs

BigQuery provides the following constructs for regex operations

Construct Arguments Use
REGEXP_CONTAINS REGEXP_CONTAINS(value, regexp) Checks if the string contains the regex
REGEXP_EXTRACT REGEXP_EXTRACT(value, regexp[, position[, occurrence]]) Extract the substring that matches the regex.
REGEXP_EXTRACT_ALL REGEXP_EXTRACT_ALL(value, regexp) Return all substrings that match the regex
REGEXP_INSTR REGEXP_INSTR(source_value, regexp [, position[, occurrence, [occurrence_position]]]) Return 1-based index for the first occurrence of the regex
REGEXP_REPLACE REGEXP_REPLACE(value, regexp, replacement) Replace all substrings in value

Source : Google Cloud Platform

Table Setup

Let's create a simple table with some string data to play around.

-- Create Store
CREATE OR REPLACE TABLE `gcp-tour-276710.thefellowcoder.store`
(
    date DATE,
    details STRING,
    address STRING
)

-- Add values
INSERT INTO `gcp-tour-276710.thefellowcoder.store`
VALUES
('2021-07-14', '<h1>Nike Shoes</h1><p>Base: $450 Addon: $20</p><p>Support email: help@nike.com</p>','Los Angeles, California 90001 '),
('2021-06-13', '<h1>Rebook Shoes</h1><p>Base: $150 Addon: $10</p><p>Support email: help@rebook.com</p>','Sacramento, California 94203 '),
('2021-07-11', '<h1>Adidas Shoes</h1><p>Base: $350 Addon: $30</p><p>Support email: help@adidas.com</p>','Beverly Hills, California 90209 '),
('2021-07-11', '<h1>Carnival All Rounder Shoes</h1><p>Base: $250 Addon: $0</p>','Beverly Hills, California 90209 ')

Enter fullscreen mode Exit fullscreen mode

Sample Queries

Lets see each of the constructs in action

  • REGEXP_CONTAINS example
-- All stores that have a support email
select * from `gcp-tour-276710.thefellowcoder.store` where REGEXP_CONTAINS(details, r'\S+@\S+\.\S+')
Enter fullscreen mode Exit fullscreen mode
  • REGEXP_EXTRACT example
-- Extract Product Names
select REGEXP_EXTRACT(details, r'<h1>([^<]+)<\/h1>') from `gcp-tour-276710.thefellowcoder.store`
Enter fullscreen mode Exit fullscreen mode
  • REGEXP_EXTRACT_ALL example
-- Extract all prices
select REGEXP_EXTRACT_ALL(details, r'\$(.\d+)') from `gcp-tour-276710.thefellowcoder.store`
Enter fullscreen mode Exit fullscreen mode
  • REGEXP_INSTR example
-- Get index of email
select details, REGEXP_INSTR(details, r'\S+@\S+\.\S+') from `gcp-tour-276710.thefellowcoder.store`
Enter fullscreen mode Exit fullscreen mode
  • REGEXP_REPLACE example
-- Replace all email-ids
select REGEXP_REPLACE(details, r'\S+@\S+\.\S+', 'admin@thefellowcoder.com') from `gcp-tour-276710.thefellowcoder.store`
Enter fullscreen mode Exit fullscreen mode

Common Use-cases:

Extract part of the string with regex

This is one of the most common use-cases wherein we must extract substring that matches a regex.

-- Extract product names from all details
select REGEXP_EXTRACT(details, r'<h1>([^<]+)<\/h1>') from `gcp-tour-276710.thefellowcoder.store`
Enter fullscreen mode Exit fullscreen mode

Replace regex with string

Let's say you want to sanitize the string and remove all special characters from the string.

-- remove special characters
select details,REGEXP_REPLACE(details, r'[^0-9a-zA-Z]+',' ') from `gcp-tour-276710.thefellowcoder.store`
Enter fullscreen mode Exit fullscreen mode

Matches any of the regex patterns

check for multiple regex patterns for a string

-- check for pincodes starting with 90/94
select address,REGEXP_EXTRACT(address, r'9[0|4]\d+') as pincode from `gcp-tour-276710.thefellowcoder.store`
Enter fullscreen mode Exit fullscreen mode

Useful regex examples for data cleaning.

String Regex Output
<h1>Hello World!</h1> <h1>([^<]+)<\/h1> Hello World!

Support email: help@rebook.com

\w+@\w+.\w+ help@rebook.com
Rebook ShOes (?i)shoes ShOes
This,contains$special\characters [^0-9a-zA-Z]+ This contains special characters
String needs to be trimmed. ^(\s+).*?(\s+)$ String needs to be trimmed.

Hope you found this post useful, do check out my blog for more such content.

Top comments (0)