Materialized Views helps improve performance of analytical workloads such as dashboarding, queries from BI (Business Intelligence) tools, and ELT (Extract, Load, Transform) data processing. Materialized Views store the pre-computed results of queries and maintain them by incrementally processing latest changes from base tables. Future queries referencing these Materialized Views can then use the pre-computed results to run🏃🏻♀️much faster.
You can create Materialized Views based on one or more source tables by using filters, projections, inner joins, aggregations, grouping, functions, etc. (👉🏽Read more in the docs, here.)
In a data warehouse (system used for reporting and data analysis) environment, applications often perform complex queries on large tables. A common example would be using a SELECT statement to perform multiple-table joins and aggregations (process where data is collected and presented in summarized format) on tables that contain billions of rows. Due to the complexity and large volume of data, processing these queries can be very time-consuming!
Enter Materialized Views in Amazon Redshift.🙌🏽
A Materialized View stores the result of the SELECT statement that defines the Materialized View. You can then issue a SELECT statement to query the Materialized View, in the same way that you query other tables or views in the database. When you query the Materialized View, you’re now querying that pre-computed result, that was based on an SQL query over one or more base tables. The difference is that now Amazon Redshift can process the query based on the pre-computed data stored in the Materialized View, without having to process the base tables at all!😅 This is a win🏆, because now query results are returned much faster compared to when retrieving the same data from the base tables.
First, let me point you to the docs that detail SQL commands used to create and manage Materialized Views...
Second, let’s walk through a basic example on how to CREATE Materialized Views and REFRESH it after data ingestion. For the purpose of this blog post, I'm going to pretend you’ve already created and connected to your cluster. Let’s run some queries on the AWS Management Console with the Redshift query editor...
I CREATE a simple table with the following SQL command, making sure I hit the run button for each individual query. (To see my results afterwards, I select the table base_table.)
CREATE table base_table (a int, b int);
Now I want to INSERT more values into our base_table.
INSERT into base_table values (1,2);
Time⏳ to CREATE our first Materialized View!😅
CREATE MATERIALIZED VIEW mv_test as (select a from base_table);
Ok, let’s INSERT a few more values into our base_table...
INSERT into base_table values (2,3);
Let's SELECT all our content from base_table.
SELECT * FROM base_table;
Now let's SELECT all our content from mv_test. (This command will retrieve our pre-computed results, which at this point are stale. The base_table has changed, but the changes have not yet been reflected.)
SELECT * FROM mv_test;
Let’s get this up to date. (A Materialized View stores the query result. When you query the Materialized View, you’re querying that pre-computed result.)
refresh materialized view mv_test;
Now let’s SELECT all our content from mv_test again...
SELECT * FROM mv_test;
And now we see it’s up to date! And this is what we call incremental maintenance.💁🏻♀️
Want to replicate a more detailed scenario to better appreciate how this new feature comes in handy?
Check out this example from our documentation. It walks you through the process of creating tables, uploading sample data, and querying the database in your Amazon Redshift cluster.
This new feature is available at no additional cost, in all regions where Amazon Redshift is available.☁️
¡Gracias por tu tiempo!
~Alejandra💁🏻♀️ y Canela🐾