DEV Community

hexfloor
hexfloor

Posted on

SQL Pro tips : AWS Athena SQL UNPIVOT : CROSS JOIN UNNEST

Introduction

TL;DR : Why to use UNPIVOT over UNION ALL ? Performance, UNION ALL scans all the data for each subquery, with UNPIVOT the scan happens only once.
We shall perform now the same operation with AWS Athena
Feel free to check the Oracle XE CROSS JOIN UNPIVOT tips or a AWS documentation.
I'm writing this guide to provide a simple step-by-step manual easy to understand.

UNPIVOT

If you are new to Athena you may wish to go through the quick start guide

Let's start straight away with something trivial :

select 'SUNDAY' as day0, 'MONDAY' as day1;

Enter fullscreen mode Exit fullscreen mode

Image description

and use array

select array['SUNDAY', 'MONDAY'] as day;

Enter fullscreen mode Exit fullscreen mode

Image description

now, let's flatten the array with UNNEST :

select * from unnest(array['SUNDAY', 'MONDAY']) t(day);

Enter fullscreen mode Exit fullscreen mode

Image description

and add another column into the output :

with 
    in_data as (select * from unnest(array['SUNDAY', 'MONDAY'], array[0, 1]) t(day, day_order))
select * from in_data;


Enter fullscreen mode Exit fullscreen mode

Image description

That was fast and straight to the point! Let's explore CROSS JOIN

CROSS JOIN

In a nutshell CROSS JOIN
operation is simple = it's just a Cartesian product of the two tables, take all the possible combinations from the values on the left with the values on the right.
Few examples :

with 
    week_data as (select '2024W01' as week_ref),
    day_data as (select 'SUNDAY' as day_ref)
select * from week_data cross join day_data;

Enter fullscreen mode Exit fullscreen mode

Image description

with UNION ALL :

with 
    week_data as (select '2024W01' as week_ref),
    day_data as (select 'SUNDAY' as day_ref 
    union all select 'MONDAY' as day_ref)
select * from week_data cross join day_data;

Enter fullscreen mode Exit fullscreen mode

Image description

and with array :

with 
    week_data as (select '2024W01' as week_ref),
    day_data as (select array['SUNDAY','MONDAY'] as day_ref)
select * from week_data cross join day_data;
Enter fullscreen mode Exit fullscreen mode

Image description

let's flatten the array :

with 
    week_data as (select '2024W01' as week_ref),
    day_data as (select * from unnest(array['SUNDAY','MONDAY']) r(day_ref))
select * from week_data cross join day_data;

Enter fullscreen mode Exit fullscreen mode

Image description

and add another one to show the full power of the CROSS JOIN :

with 
    week_data as (select * from unnest(array['2024W01', '2024W02']) l(week_ref)),
    day_data as (select * from unnest(array['SUNDAY','MONDAY']) r(day_ref))
select * from week_data cross join day_data;

Enter fullscreen mode Exit fullscreen mode

Image description

Finally we are ready to combine both CROSS JOIN and UNNEST

CROSS JOIN UNNEST

The operation I'm going to perform is named UNPIVOT in Oracle XE, see here at the same time BigQuery has both UNPIVOT and UNNEST, you may check here

Let's have some test data :

with 
    in_data as (select '2024W01' as week_ref, 'SUNDAY' as day0, 0 as day_order0, 'MONDAY' as day1, 1 as day_order1)
select * from in_data;
Enter fullscreen mode Exit fullscreen mode

Image description

Firstly, let's generate the result with UNION ALL :

with 
    in_data as (select '2024W01' as week_ref, 'SUNDAY' as day0, 0 as day_order0, 'MONDAY' as day1, 1 as day_order1)
select week_ref, day0 as day, day_order0 as day_order from in_data
union all 
select week_ref, day1 as day, day_order1 as day_order from in_data;
Enter fullscreen mode Exit fullscreen mode

The query above will scan the in_data as many times as many subqueries you have, in this example we have 2 subqueries.

Image description

and finally the performance optimization with CROSS JOIN UNNEST :

with 
    in_data as (select '2024W01' as week_ref, 'SUNDAY' as day0, 0 as day_order0, 'MONDAY' as day1, 1 as day_order1)
select week_ref, day, day_order
from in_data
cross join unnest(array[day0, day1], array[day_order0, day_order1]) t(day, day_order);
Enter fullscreen mode Exit fullscreen mode

Image description

Using UNPIVOT increases performance of your data analytics requests and saves the energy.
Enjoy !

Top comments (0)