Welcome to the world of window functions.
A window function is closely related to an aggregate function. However, instead of collapsing all rows into one, we keep them all. A new column is then added with a running total, ranking, or moving average. This is our 'window frame.'
If you're interested in using this powerful tool, keep reading for examples and pictures.
Window functions come in three main types. These are:
These use aggregate functions like SUM, COUNT, MAX, MIN over a set of rows and return a single result from the query.
These assign a ‘rank’ to a set of rows and using RANK, DENSE_RANK, ROW_NUMBER, NTILE
These use LAG, LEAD, FIRST_VALUE, LAST_VALUE to access a previous row without having to do a self-join.
Let's start off by exploring aggregate window functions and how they work. Using a few real-life examples, I will simplify the syntax and explain the use cases.
You will need:
- The function you want to perform: AVG, SUM, COUNT
- An indication you want to use this Function over multiple rows: OVER
- How you want to group your rows – PARTITION
- How you want to order your rows – ORDER BY
- To make sense of the syntax and show the use cases I’m going to tackle some real-life problems faced by a Sales Analyst.
In this case, the sales manager sent us a request. In order to set her team's targets for next year, she needs historic data. The data is in the sales database, but we need it in a format that is easier to use than the raw table.
She would like to be able to see daily sales totals as well as individual order IDs so she can 'drill down' as needed.
We could use SUM to total all the rows in the Orders table. However, this collapses the order details. A window function will allow the Sales Manager to see each order with a running total.
select sale_date, salesorderid, subtotal, sum(subtotal) over(partition by sale_date order by salesorderid) as total_sales from sales.salesorderheader where orderdate between '2018-01-01 00:00:00:000' and '2018-12-31 00:00:00:000' order by sale_date
By using a window function, we can see each order for each day, and its total, with a running total along the side. Other aggregate functions work the same way, so you can use COUNT, AVG, MIN or MAX, or in combination.
The Sales Manager has returned.
Although she was happy with the table created for setting targets, she needs strategies for increasing sales. She wants to see the sales by customer and dollar value for 2018. Perhaps she could go knock on their door again if she can remember when their big sales were in the past?
We need a window function with a RANK to provide the information she needs.
select sale_date, salesorderid, subtotal, rank() over(order by subtotal desc) as sales_rank from sales.salesorderheader where orderdate between '2018-01-01 00:00:00:000' and '2018-12-31 00:00:00:000' order by sale_date
Using RANK() in a window function ranks each row, in this case, by subtotal. The Sales Manager can now decide if she wants to target those customers again in the coming year.
For rows with identical numbers, as in the example above, the two and three positions are skipped. If you use DENSE_RANK, then you would keep the second and third positions before moving on to the fourth.
Check out part two to learn even more and to see how you can use window functions to your advantage.
This post originally appeared on helenanderson.co.nz