In e-commerce scenarios, merchants need to understand their store's operational dynamics and make informed decisions grounded in various data indicators. Among these indicators, order data stands out as a pivotal element. By analyzing order data, merchants can gain insights into their sales performance, customer behavior, and the efficacy of marketing campaigns, among other factors.
For example, the following is some order data provided by an app:
Indicator | Data | Change compared to yesterday |
---|---|---|
Order payment amount | 787,500 | +230.86% |
Number of buyers | 65,080,000 | +129.89% |
Average spend per buyer | 121.00 | +43.92% |
This app also provides a variety of data comparison schemes:
Scheme | Time range |
---|---|
1 | Today's real-time data/Compared to yesterday’s data |
2 | Yesterday's data/Compared to the day before |
3 | Last 7 days data/7 days rolling comparison |
4 | Last 30 days data/30 days rolling comparison |
Taking the information provided by this app as an example, this article will introduce how to implement a dashboard showcasing the order payment amount, number of buyers, and average spend per buyer through RisingWave's materialized views.
Let’s initiate an analysis on the two tables. From the first table, for each indicator, we can see the real-time data for the day, as well as the comparison with yesterday's data. In addition, users can choose from multiple comparison schemes as delineated in the second table, such as: comparison of yesterday's data with the day before, comparison of the last 7 days data with a 7-day rolling comparison, comparison of the last 30 days data with a 30-day rolling comparison, or analysis of a custom time period.
Here we see two scenarios, the real-time scenario and the historical data scenario. The real-time scenario focuses on the display of real-time data for the day, while the historical data scenario covers all data viewing except for the present day.
Order data ingestion
Now let’s begin with ingesting order data. Assuming all order data is already in an upstream PostgreSQL database, we first use RisingWave's CDC feature to import the order table from the upstream database into RisingWave.
The example below only lists key fields of the order table, such as order ID (o_orderkey), buyer ID (o_custkey), order price (o_totalprice), and order time (o_orderdate).
CREATE TABLE orders (
o_orderkey BIGINT,
o_custkey INTEGER,
o_totalprice NUMERIC,
o_orderdate timestamp with time zone,
...
PRIMARY KEY (o_orderkey)
) WITH (
connector = 'postgres-cdc',
hostname = '127.0.0.1',
port = '5432',
username = 'xxx',
password = 'xxx',
database.name = 'store',
schema.name = 'public',
table.name = 'orders'
);
Calculating payment amount
First, let's see how we achieve to calculate the real-time payment amount for the current day without materialized views.
We can do this by filtering the order table's time to include orders from midnight to the current time and summing the order prices. The query for the real-time scenario is as follows. Note that date_trunc('day', now())
means truncating the current time to the day level.
select sum(o_totalprice) from orders where o_orderdate between date_trunc('day', now()) and now();
For the historical data scenario, let's take the past 7 days as an example:
select sum(o_totalprice) from orders where o_orderdate between date_trunc('day', now() - INTERVAL '7 days') and date_trunc('day', now());
From the SQL query above, you can see the real-time query and historical data query only differ in their filter conditions. However, as the volume of order data increases, the time range extends, or the query concurrency rises, the SQL query may experience performance slowdowns.
To address this issue, we often use materialized views to speed up queries, but this solution comes with certain prerequisites: the materialized query itself needs to be predefined and unchanged. However, since the time range input by users in the example is arbitrary, it’s impractical to create a materialized view for every range query.
Therefore, the implementation of materialized views in calculating payment amounts involves two crucial steps:
- First, pre-aggregate the order table with a certain reusable time granularity, we can pre-aggregate the order table's payment amount by minute, hour, or day level, and materialize the results through a materialized view.
- Second, change the query above to query the pre-aggregated materialized view.
In addition, when designing materialized views to accelerate queries, we should focus on the business's query logic and materialize the computationally reusable results as much as possible.
Pre-aggregation of data
Now let's proceed with the first step, pre-aggregating the order table's payment amount by minute, hour, and day level, and materializing the results through a materialized view.
- Minute-level pre-aggregation
Suppose the user's query necessitates a time granularity of minutes, we can create a minute-level pre-aggregated materialized view. The following SQL query aggregates the order table by minute dimensions to calculate the payment amount.
create materialized view orders_total_price_per_min as
select date_trunc('minute', o_orderdate) minute, sum(o_totalprice) totalprice
from orders group by date_trunc('minute', o_orderdate);
- Hour-level pre-aggregation
RisingWave supports the hierarchical construction of materialized views. On top of the minute-level materialized view, we can further aggregate an hour-level one.
create materialized view orders_total_price_per_hour as
select date_trunc('hour', minute) hour, sum(totalprice) totalprice
from orders_total_price_per_min group by date_trunc('hour', minute);
- Day-level pre-aggregation
Similarly, we can construct a day-level pre-aggregated materialized view.
create materialized view orders_total_price_per_day as
select date_trunc('day', hour) date, sum(totalprice) totalprice
from orders_total_price_per_hour group by date_trunc('day', hour);
Querying materialized views
With the above pre-aggregated materialized views, let's see how to calculate the payment amount for 7 days. In contrast to the previous query, we now direct the query to the orders_total_price_per_day
materialized view. This adjustment significantly reduces the amount of data required for the new query.
select sum(totalprice) from orders_total_price_per_day where date between date_trunc('day', now() - INTERVAL '7 days') and date_trunc('day', now());
Moreover, the query for the real-time payment amount on the current day will become very simple, because RisingWave's materialized views are updated in real time. Thus, we just need to directly query the orders_total_price_per_day
for this indicator. Here, there's actually only one row of data.
select sum(totalprice) from orders_total_price_per_day where date between date_trunc('day', now()) and now();
During calculating the payment amount, in summary, we efficiently meet the need for real-time calculation of payment amounts and the analysis of arbitrary time ranges (by minute, hour, or day granularity) through RisingWave's real-time materialized views.
Calculating number of buyers
Through the example of calculating payment amounts, you should be familiar with how to use materialized views to build data dashboards. Let's look at another indicator: the number of buyers. In e-commerce, the number of buyers is generally de-duplicated by day, meaning a user who places many orders in the same day is counted only once. Let's directly write the corresponding materialized view.
create materialized view orders_unique_customer_per_day as
select date_trunc('day', o_orderdate) date, count(distinct o_custkey) custnum
from orders group by date_trunc('day', o_orderdate);
Similarly, by querying orders_unique_customer_per_day
, we can calculate the number of buyers for any range of days. Below is the number of buyers in the last 7 days.
select sum(custnum) from orders_unique_customer_per_day where date between date_trunc('day', now() - INTERVAL '7 days') and date_trunc('day', now())
Calculating average spend per buyer
We know that average spend per buyer = payment amount / number of buyers. Actually, with the above two materialized views, orders_total_price_per_day
and orders_unique_customer_per_day
, we can directly get the average spend per buyer for the required time range.
Below calculates the average spend per buyer for the last 7 days:
select
(select sum(totalprice) from orders_total_price_per_day where date between date_trunc('day', now() - INTERVAL '7 days') and date_trunc('day', now()))
/
(select sum(custnum) from orders_unique_customer_per_day where date between date_trunc('day', now() - INTERVAL '7 days') and date_trunc('day', now()));
Please note, for indicators with an average meaning, such as the average spend per buyer here, if we want to use materialized views to support their queries for different time ranges, we can not just maintain one indicator(average spend per buyer). Instead, we need to maintain the payment amount and the number of buyers too. (Similar aggregate functions include variance, standard deviation: stddev_pop
, stddev_samp
, var_pop
, var_samp
)
Real-time trend chart
We have created a minute and hour-level payment amount materialized view above, which can come in handy for real-time trend charts. We can draw minute-level or hour-level data line charts, then compare the data for any two days. In real-time scenarios, we can also calculate the current day's growth rate compared to yesterday, which is also applicable to e-commerce promotion days when there's a high interest in real-time data.
Thanks to RisingWave's compatibility with the PostgreSQL protocol, we can directly use Grafana to visualize the data in RisingWave. Below, we use Grafana to select RisingWave as a compatible data source under the PostgreSQL protocol, querying the time series data in the hourly granularity materialized view orders_total_price_per_hour
. In the example, we select 2024-01-11
as the comparison day and 2024-01-12
as the current day.
SQL for comparison day:
SELECT hour, totalprice FROM orders_total_price_per_hour where hour between '2024-01-11 00:00:00+00:00' and ' 2024-01-11 23:59:59+00:00' order by hour;
SQL for current day:
Since we want to align the times for comparison in Grafana, we shift the current day's time one day back to the comparison day's time.
SELECT hour - interval '1' day, totalprice as totalprice2 FROM orders_total_price_per_hour where hour between ' 2024-01-12 00:00:00+00:00' and ' 2024-01-12 23:59:59+00:00' order by hour;
Then, we can compare trends in Grafana.
CONCLUSION
After reading this article, you now have the knowledge of using RisingWave to cater to your data dashboard needs. We first synchronize order data from upstream into RisingWave through CDC, then calculate the payment amount and number of buyers indicators for minute, hour, and day-level order tables through hierarchical materialized views. Next, we construct the appropriate time range based on the user’s selected time range and query the materialized views. We recommend that when designing materialized views to accelerate queries, one should combine the business’s query characteristics to materialize the computationally reusable results as much as possible. Finally, we use Grafana to visualize the data within RisingWave to compare trends.
About RisingWave Labs
RisingWave is an open-source distributed SQL database for stream processing. It is designed to reduce the complexity and cost of building real-time applications. RisingWave offers users a PostgreSQL-like experience specifically tailored for distributed stream processing.
Official Website: https://www.risingwave.com/
Documentation: https://docs.risingwave.com/docs/current/intro/
GitHub: https://github.com/risingwavelabs/risingwave
LinkedIn: linkedin.com/company/risingwave-labs
Slack: https://risingwave.com/slack
Top comments (0)