Luca Liu

Posted on Dec 29, 2024 • Edited on Jan 4

A Quick Review of SQL Window Functions with Examples

#sql #database #tutorial #beginners

Introduction

Window functions are a powerful feature in SQL used to perform calculations across a set of rows related to the current row. Unlike aggregate functions, which combine rows into a single output, window functions return a result for each row while allowing calculations to consider other rows in the defined window.

In this article, we’ll explore some commonly used SQL window functions (ROW_NUMBER(), RANK(), DENSE_RANK(), NTILE(), LEAD(), and LAG()) with examples.

Sample Table: Sales Data

We’ll use the following Sales table to demonstrate window functions:

SalesID	CustomerID	Product	Region	Amount	SaleDate
1	101	Laptop	North	1200	2023-01-05
2	102	Tablet	North	800	2023-02-15
3	103	Phone	North	800	2023-03-10
4	104	Tablet	North	500	2023-04-01
5	105	Laptop	South	1300	2023-05-05
6	106	Tablet	South	700	2023-06-20
7	107	Phone	West	900	2023-07-15
8	108	Laptop	East	1300	2023-08-10

1. ROW_NUMBER()

The ROW_NUMBER() function assigns a unique number to each row based on the order defined in the ORDER BY clause within the OVER statement, optionally partitioned by the PARTITION BY clause.

Task: Assign a unique row number to each sale within a region based on the sale amount (highest to lowest).

SELECT SalesID, Region, Amount,
       ROW_NUMBER() OVER (PARTITION BY Region ORDER BY Amount DESC) AS RowNum
FROM Sales;

Result:

SalesID	Region	Amount	RowNum
1	North	1200	1
2	North	800	2
3	North	800	3
4	North	500	4
5	South	1300	1
6	South	700	2
7	West	900	1
8	East	1300	1

2. RANK()

The RANK() function assigns a rank to each row based on the ORDER BY clause in the OVER statement. Rows with the same value receive the same rank, with gaps in the ranking for duplicate values.

Task: Rank sales within each region by amount (highest to lowest).

SELECT SalesID, Region, Amount,
       RANK() OVER (PARTITION BY Region ORDER BY Amount DESC) AS Rank
FROM Sales;

Result:

SalesID	Region	Amount	Rank
1	North	1200	1
2	North	800	2
3	North	800	2
4	North	500	4
5	South	1300	1
6	South	700	2
7	West	900	1
8	East	1300	1

Key Feature:

For North region, both Amount = 800 rows share rank 2.
The next rank is skipped (i.e., rank 3 is missing) and jumps to 4.

3. DENSE_RANK()

The DENSE_RANK() function assigns ranks like RANK(), but it doesn’t skip ranks after ties.

Task: Assign dense ranks to sales within each region by amount (highest to lowest).

SELECT SalesID, Region, Amount,
       DENSE_RANK() OVER (PARTITION BY Region ORDER BY Amount DESC) AS DenseRank
FROM Sales;

Result:

SalesID	Region	Amount	DenseRank
1	North	1200	1
2	North	800	2
3	North	800	2
4	North	500	3
5	South	1300	1
6	South	700	2
7	West	900	1
8	East	1300	1

Key Feature:

For North region, both Amount = 800 rows share rank 2.
The next rank is 3, with no skipping of ranks.

4. NTILE()

NTILE() divides rows into a specified number of approximately equal groups.

Task: Divide all sales into 4 groups based on Amount in descending order.

SELECT SalesID, Amount,
       NTILE(4) OVER (ORDER BY Amount DESC) AS Quartile
FROM Sales;

Result:

SalesID	Amount	Quartile
5	1300	1
8	1300	1
1	1200	2
7	900	2
2	800	3
3	800	3
4	500	4
6	700	4

5. LEAD()

LEAD() retrieves the value from the next row as determined by the ORDER BY clause in the OVER statement, within the same partition if specified.

Task: Compare each sale amount to the next sale amount, ordered by SaleDate.

SELECT SalesID, Amount, 
       LEAD(Amount) OVER (ORDER BY SaleDate) AS NextAmount
FROM Sales;

Result:

SalesID	Amount	NextAmount
1	1200	800
2	800	800
3	800	500
4	500	1300
5	1300	700
6	700	900
7	900	1300
8	1300	NULL

6. LAG()

LAG() retrieves the value from the previous row as determined by the ORDER BY clause in the OVER statement, within the same partition if specified.

Task: Compare each sale amount to the previous sale amount, ordered by SaleDate.

SELECT SalesID, Amount, 
       LAG(Amount) OVER (ORDER BY SaleDate) AS PrevAmount
FROM Sales;

Result:

SalesID	Amount	PrevAmount
1	1200	NULL
2	800	1200
3	800	800
4	500	800
5	1300	500
6	700	1300
7	900	700
8	1300	900

Conclusion

SQL window functions like ROW_NUMBER(), RANK(), DENSE_RANK(), NTILE(), LEAD(), and LAG() provide powerful ways to analyze data by performing calculations across rows within a defined window.

Key Takeaways:

ROW_NUMBER() assigns a unique identifier for each row.
RANK() and DENSE_RANK() differ in how they handle ties (skipping vs. no skipping).
NTILE() is useful for dividing rows into statistic groups.
LEAD() and LAG() allow comparisons with adjacent rows.

By mastering these functions, you can handle complex analytics and ranking tasks effectively!

Explore more

Luca Liu

Hello there! 👋 I'm Luca, a Business Intelligence Developer with passion for all things data. Proficient in Python, SQL, Power BI, Tableau, SAP Business Objects.

Thank you for taking the time to explore data-related insights with me. I appreciate your engagement.

🚀 Connect with me on LinkedIn

Top comments (6)

Charles Roth • Dec 31 '24

I suggest you add a short definition for the terms "window" and "partition". Are they the same thing? It looks like it, but the reader can't tell.

Also, the phrase "while maintaining the context of the dataset" doesn't add any meaning. What is the intent behind your writing it? What are you trying to make clear?

The information itself is very interesting, these are just suggestions that might improve the writing a little.

Luca Liu • Dec 31 '24

Thank you so much for your thoughtful feedback! Your suggestions were spot on, and I’ve made the necessary adjustments to the text.

I really appreciate you taking the time to share your insights—it definitely helped improve the clarity of the content!

Doug Wilson • Jan 2

After 25+ years, I don't often learn something new about SQL. Excellent work. Thanks for sharing this.

Shafayet Hossain Yashfi • Dec 30 '24

Great writing...👏🏼👏🏼

Chirag Aggarwal • Dec 30 '24

Looks awesome, great work

Brandon • Jan 31

A quick review of SQL window functions—because who doesn’t love a refresher on something that’s been around forever? Nice effort, though. Hopefully, this helps those still figuring out the magic behind OVER() and PARTITION BY.

Introduction

Sample Table: Sales Data

1. ROW_NUMBER()

2. RANK()

3. DENSE_RANK()

4. NTILE()

5. LEAD()

6. LAG()

Conclusion

Explore more

Luca LiuFollow

Luca Liu