DEV Community

loading...
Cover image for Why you need SQL window functions (part 1)

Why you need SQL window functions (part 1)

Helen Anderson
Data wrangler at Kiwibank, AWS Data Hero, and tag moderator on Dev.to.
Originally published at helenanderson.co.nz Updated on ・3 min read

Welcome to the world of window functions.

A window function is closely related to an aggregate function. However, instead of collapsing all rows into one, we keep them all. A new column is then added with a running total, ranking, or moving average. This is our 'window frame.'

If you're interested in using this powerful tool, keep reading for examples and pictures.


1 – Introduction
2 – Syntax
3 – Setting the scene
4 – Create a running total
5 – RANK rows based on a given criteria


Introduction

Window functions come in three main types. These are:

Aggregate Window Functions

These use aggregate functions like SUM, COUNT, MAX, MIN over a set of rows and return a single result from the query.

Ranking Window Functions

These assign a ‘rank’ to a set of rows and using RANK, DENSE_RANK, ROW_NUMBER, NTILE

Value Window Functions

These use LAG, LEAD, FIRST_VALUE, LAST_VALUE to access a previous row without having to do a self-join.


Syntax

Let's start off by exploring aggregate window functions and how they work. Using a few real-life examples, I will simplify the syntax and explain the use cases.

You will need:

  • The function you want to perform: AVG, SUM, COUNT
  • An indication you want to use this Function over multiple rows: OVER
  • How you want to group your rows – PARTITION
  • How you want to order your rows – ORDER BY
  • To make sense of the syntax and show the use cases I’m going to tackle some real-life problems faced by a Sales Analyst.

Setting the scene

In this case, the sales manager sent us a request. In order to set her team's targets for next year, she needs historic data. The data is in the sales database, but we need it in a format that is easier to use than the raw table.


Alt Text


How to create a running total

She would like to be able to see daily sales totals as well as individual order IDs so she can 'drill down' as needed.

We could use SUM to total all the rows in the Orders table. However, this collapses the order details. A window function will allow the Sales Manager to see each order with a running total.

select
  sale_date,
  salesorderid,
  subtotal,
  sum(subtotal) over(partition by sale_date order by salesorderid) as total_sales
from 
  sales.salesorderheader
where 
  orderdate between 
'2018-01-01 00:00:00:000' and '2018-12-31 00:00:00:000'
order by sale_date
Enter fullscreen mode Exit fullscreen mode

By using a window function, we can see each order for each day, and its total, with a running total along the side. Other aggregate functions work the same way, so you can use COUNT, AVG, MIN or MAX, or in combination.


Alt Text


How to RANK rows based on a given criteria

The Sales Manager has returned.

Although she was happy with the table created for setting targets, she needs strategies for increasing sales. She wants to see the sales by customer and dollar value for 2018. Perhaps she could go knock on their door again if she can remember when their big sales were in the past?

We need a window function with a RANK to provide the information she needs.

select
  sale_date,
  salesorderid,
  subtotal,
  rank() over(order by subtotal desc) as sales_rank
from 
  sales.salesorderheader
where 
  orderdate between 
'2018-01-01 00:00:00:000' and '2018-12-31 00:00:00:000'
order by sale_date
Enter fullscreen mode Exit fullscreen mode

Using RANK() in a window function ranks each row, in this case, by subtotal. The Sales Manager can now decide if she wants to target those customers again in the coming year.


Alt Text


For rows with identical numbers, as in the example above, the two and three positions are skipped. If you use DENSE_RANK, then you would keep the second and third positions before moving on to the fourth.


Check out part two to learn even more and to see how you can use window functions to your advantage.


Read more


This post originally appeared on helenanderson.co.nz

Discussion (4)

Collapse
nickstavrou profile image
NickStavrou • Edited

You explicitly used

between '2018-01-01 00:00:00:000' and '2018-12-31 00:00:00:000'

in your query but it makes it error prone if your datetime isnt as accurate in milliseconds as you shown. You should better use for example:

(Your predicate) >= 2018-01-01 and < 2018-12-31

and this will guaranty that it will only take all the rows from the start and only the rows till the end.
So always use an open-ended range to prevent erroneously including or excluding rows. It’s much less complex to find the beginning of the next period than the end of the current period.

Collapse
katiekodes profile image
Katie

Oh, what would I do without window functions?

Collapse
moopet profile image
Ben Sinclair

There's always more in SQL. I don't use it much these days, but whenever I look at your posts there's something I didn't know.

Collapse
helenanders26 profile image
Helen Anderson Author

Thanks Ben :D