Guest blog by Shashank Mishra, Data Engineer @ Expedia
TLDR
This article outlines the integration between Mage and Google BigQuery, a serverless data warehousing service. We’ll discuss the integration process, its benefits, and how it aids businesses in making data-driven decisions.
Outline
- Introduction to Mage
- Overview of Google BigQuery
- Step by step process to integrate Google BigQuery with Mage
- Conclusion
Introduction to Mage
In an age where data is the new oil, efficient and reliable data management tools are essential. Mage is a platform committed to simplifying data integration and analytics. Designed for seamless data transformation and loading, Mage is transforming how businesses approach data management. Here are its key features:
- Automated Data Pipeline: Mage automates data extraction, transformation, and loading (ETL) processes. It can extract data from multiple sources, transform it to a desirable format, and load it into a data warehouse.
- Data Connectors: Mage offers various data connectors to widely-used data sources like Shopify, Facebook Ads, Google Ads, Google Analytics, etc. This makes it easier to import data from these platforms.
- Easy Integration: Mage provides easy integration with popular data warehouses including Google BigQuery, Amazon Redshift, and Snowflake.
- Pre-built SQL Models: Mage comes with pre-built SQL models for popular e-commerce platforms like Shopify and WooCommerce. These models simplify the process of data analysis.
- Incremental Loading: Mage supports incremental loading, which means only new or updated data is loaded into the data warehouse. This saves storage space and improves efficiency.
- Data Transformations: Mage performs automatic data transformations, converting raw data into a more usable format. This process makes the data ready for analysis and reporting.
- Scheduled Refresh: Data refreshes can be scheduled in Mage, ensuring that the data in the warehouse is always up-to-date.
- Data Security: Mage places a high emphasis on data security, ensuring data privacy and compliance with GDPR and other data protection regulations.
Overview of Google BigQuery
Google BigQuery is a highly scalable, serverless data warehouse offered by Google as part of its Google Cloud Platform (GCP). It is designed to streamline and simplify the processing of big data.
- Serverless Architecture: BigQuery operates on a serverless model, which means users don’t need to manage any servers or infrastructure. This means you can focus more on analysis and less on maintenance. It allows you to query massive datasets in seconds and get insights in real-time, without needing to worry about resource provision.
- Real-Time Analytics: BigQuery is engineered for real-time analytics. It allows users to analyze real-time data streams instantly. With its ability to run SQL queries on petabytes of data, it delivers speedy results on real-time data analytics, enabling businesses to make timely decisions.
Google BigQuery, with its serverless architecture and real-time analytics, serves as a robust platform to handle, analyze, and draw insights from massive datasets with ease.
Step by step process to migrate Google BigQuery with Mage
Before we begin, we’ll need to create a service account key. Please read Google Cloud’s documentation on how to create that.
Once we are finished, following these steps:
- Create a new pipeline or open an existing pipeline.
- Expand the left side of the screen to view the file browser.
- Scroll down and click on a file named io_config.yaml
- Enter the following keys and values under the key named default (we can have multiple profiles, add it under whichever is relevant for us)
- Note: we only need to add the keys under GOOGLE_SERVICE_ACC_KEY or the value for key GOOGLE_SERVICE_ACC_KEY_FILEPATH (both are not simultaneously required).
version: 0.1.1
default:
GOOGLE_SERVICE_ACC_KEY:
type: service_account
project_id: project-id
private_key_id: key-id
private_key:
"-----BEGIN PRIVATE KEY-----\nyour_private_key\n-----END_PRIVATE_KEY"
client_email: your_service_account_email
auth_uri: "https://accounts.google.com/o/oauth2/auth"
token_uri: "https://accounts.google.com/o/oauth2/token"
auth_provider_x509_cert_url: "https://www.googleapis.com/oauth2/v1/certs"
client_x509_cert_url:
"https://www.googleapis.com/robot/v1/metadata/x509/your_service_account_email"
GOOGLE_SERVICE_ACC_KEY_FILEPATH: "/path/to/your/service/account/key.json"
Using SQL block
- Create a new pipeline or open an existing pipeline.
- Add a data loader, transformer, or data exporter block.
- Select SQL.
- Under the Data provider dropdown, select BigQuery
- Under the Profile dropdown, select default (or the profile we added credentials underneath).
- Next to the Database label, enter the database name we want this block to save data to.
- Next to the Save to schema label, enter the schema name we want this block to save data to.
- Under the Write policy dropdown, select Replace or Append (please see SQL blocks guide for more information on write policies).
- Enter in this test query: SELECT 1
- Run the block.
Using Python block
- Create a new pipeline or open an existing pipeline.
- Add a data loader, transformer, or data exporter block (the code snippet below is for a data loader).
- Select Generic (no template).
- Enter this code snippet (note: change the config_profile from default if we have a different profile):
from mage_ai.data_preparation.repo_manager import get_repo_path
from mage_ai.io.bigquery import BigQuery
from mage_ai.io.config import ConfigFileLoader
from os import path
from pandas import DataFrame
if 'data_loader' not in globals():
from mage_ai.data_preparation.decorators import data_loader
@data_loader
def load_data_from_big_query(**kwargs) -> DataFrame:
query = 'SELECT 1'
config_path = path.join(get_repo_path(), 'io_config.yaml')
config_profile = 'default'
return BigQuery.with_config(ConfigFileLoader(config_path, config_profile)).load(query)
- Run the block.
Conclusion
Integrating Mage with Google BigQuery provides your team with a potent combination of automated data pipeline management and robust data warehousing. This partnership not only simplifies data extraction, transformation, and loading but also provides a seamless pathway for data analysis and insight generation. As we’ve demonstrated in this step-by-step guide, the integration process is straightforward, making it an accessible option for businesses of all sizes. By leveraging this integration, you can unlock the full potential of your data, streamline operations, and drive data-informed decisions.
Link to the original blog: https://www.mage.ai/blog/data-integration-google-bigquery-with-mage
Top comments (0)