Google introduced support for dynamic SQL in BigQuery. Developers working particularly in Oracle must have some liking for EXECUTE IMMEDIATE, the way you execute dynamic SQL queries. Such a feature in BigQuery was missing for a long time, and now that it is here, I can't wait to use it.
Choosing a problem statement
Let's choose a problem that easily resonates with every developer working with the Google BigQuery world. Who isn't noticing the large volume of temporary tables churned by the client drivers and large datasets. This is particularly true where downstream products implement a version of BigQuery driver and fail to leverage nice features like auto expiration of tables. Not so good part is the hygiene of the dataset, these tables stay forever until explicitly cleared. What is important for this blog is a problem statement to demonstrate the utility of dynamic SQL.
Lets address it using Dynamic SQL
Temporary tables do offer the convenience of caching large result sets. With data rapidly changing on BigQuery dataset, let us target the old temporary tables and remove those from the datasets.
- Our primary goal is to clear all temporary tables older than 24 hours.
- Achieving this goal needs some more information. We need to identify when a table was created. This is when INFORMATION_SCHEMA of BigQuery is helpful.
- Last step is that I want this to be scheduled every day, without my intervention. Yes, you can schedule SQL statements using the BigQuery scheduled query feature.
To clear temporary tables across all datasets, let's write code employing dynamic SQL, iterate all the dataset using the INFORMATION_SCHEMA and delete the temp table using the timestamp and the name starting with temp_table_. And schedule the SQL code using the BigQuery scheduled query option. With this, all the temp tables that are older than 1 day should get automatically cleared at a daily cadence.
Top comments (0)