Introduction
Window functions are a powerful feature in SQL used to perform calculations across a set of rows related to the current row. Unlike aggregate functions, which combine rows into a single output, window functions return a result for each row while allowing calculations to consider other rows in the defined window.
In this article, we’ll explore some commonly used SQL window functions (ROW_NUMBER()
, RANK()
, DENSE_RANK()
, NTILE()
, LEAD()
, and LAG()
) with examples.
Sample Table: Sales Data
We’ll use the following Sales table to demonstrate window functions:
SalesID | CustomerID | Product | Region | Amount | SaleDate |
---|---|---|---|---|---|
1 | 101 | Laptop | North | 1200 | 2023-01-05 |
2 | 102 | Tablet | North | 800 | 2023-02-15 |
3 | 103 | Phone | North | 800 | 2023-03-10 |
4 | 104 | Tablet | North | 500 | 2023-04-01 |
5 | 105 | Laptop | South | 1300 | 2023-05-05 |
6 | 106 | Tablet | South | 700 | 2023-06-20 |
7 | 107 | Phone | West | 900 | 2023-07-15 |
8 | 108 | Laptop | East | 1300 | 2023-08-10 |
1. ROW_NUMBER()
The ROW_NUMBER()
function assigns a unique number to each row based on the order defined in the ORDER BY clause within the OVER statement, optionally partitioned by the PARTITION BY clause.
Task: Assign a unique row number to each sale within a region based on the sale amount (highest to lowest).
SELECT SalesID, Region, Amount,
ROW_NUMBER() OVER (PARTITION BY Region ORDER BY Amount DESC) AS RowNum
FROM Sales;
Result:
SalesID | Region | Amount | RowNum |
---|---|---|---|
1 | North | 1200 | 1 |
2 | North | 800 | 2 |
3 | North | 800 | 3 |
4 | North | 500 | 4 |
5 | South | 1300 | 1 |
6 | South | 700 | 2 |
7 | West | 900 | 1 |
8 | East | 1300 | 1 |
2. RANK()
The RANK()
function assigns a rank to each row based on the ORDER BY clause in the OVER statement. Rows with the same value receive the same rank, with gaps in the ranking for duplicate values.
Task: Rank sales within each region by amount (highest to lowest).
SELECT SalesID, Region, Amount,
RANK() OVER (PARTITION BY Region ORDER BY Amount DESC) AS Rank
FROM Sales;
Result:
SalesID | Region | Amount | Rank |
---|---|---|---|
1 | North | 1200 | 1 |
2 | North | 800 | 2 |
3 | North | 800 | 2 |
4 | North | 500 | 4 |
5 | South | 1300 | 1 |
6 | South | 700 | 2 |
7 | West | 900 | 1 |
8 | East | 1300 | 1 |
Key Feature:
- For North region, both Amount = 800 rows share rank 2.
- The next rank is skipped (i.e., rank 3 is missing) and jumps to 4.
3. DENSE_RANK()
The DENSE_RANK() function assigns ranks like RANK(), but it doesn’t skip ranks after ties.
Task: Assign dense ranks to sales within each region by amount (highest to lowest).
SELECT SalesID, Region, Amount,
DENSE_RANK() OVER (PARTITION BY Region ORDER BY Amount DESC) AS DenseRank
FROM Sales;
Result:
SalesID | Region | Amount | DenseRank |
---|---|---|---|
1 | North | 1200 | 1 |
2 | North | 800 | 2 |
3 | North | 800 | 2 |
4 | North | 500 | 3 |
5 | South | 1300 | 1 |
6 | South | 700 | 2 |
7 | West | 900 | 1 |
8 | East | 1300 | 1 |
Key Feature:
- For North region, both Amount = 800 rows share rank 2.
- The next rank is 3, with no skipping of ranks.
4. NTILE()
NTILE() divides rows into a specified number of approximately equal groups.
Task: Divide all sales into 4 groups based on Amount in descending order.
SELECT SalesID, Amount,
NTILE(4) OVER (ORDER BY Amount DESC) AS Quartile
FROM Sales;
Result:
SalesID | Amount | Quartile |
---|---|---|
5 | 1300 | 1 |
8 | 1300 | 1 |
1 | 1200 | 2 |
7 | 900 | 2 |
2 | 800 | 3 |
3 | 800 | 3 |
4 | 500 | 4 |
6 | 700 | 4 |
5. LEAD()
LEAD()
retrieves the value from the next row as determined by the ORDER BY clause in the OVER statement, within the same partition if specified.
Task: Compare each sale amount to the next sale amount, ordered by SaleDate.
SELECT SalesID, Amount,
LEAD(Amount) OVER (ORDER BY SaleDate) AS NextAmount
FROM Sales;
Result:
SalesID | Amount | NextAmount |
---|---|---|
1 | 1200 | 800 |
2 | 800 | 800 |
3 | 800 | 500 |
4 | 500 | 1300 |
5 | 1300 | 700 |
6 | 700 | 900 |
7 | 900 | 1300 |
8 | 1300 | NULL |
6. LAG()
LAG()
retrieves the value from the previous row as determined by the ORDER BY clause in the OVER statement, within the same partition if specified.
Task: Compare each sale amount to the previous sale amount, ordered by SaleDate.
SELECT SalesID, Amount,
LAG(Amount) OVER (ORDER BY SaleDate) AS PrevAmount
FROM Sales;
Result:
SalesID | Amount | PrevAmount |
---|---|---|
1 | 1200 | NULL |
2 | 800 | 1200 |
3 | 800 | 800 |
4 | 500 | 800 |
5 | 1300 | 500 |
6 | 700 | 1300 |
7 | 900 | 700 |
8 | 1300 | 900 |
Conclusion
SQL window functions like ROW_NUMBER(), RANK(), DENSE_RANK(), NTILE(), LEAD(), and LAG() provide powerful ways to analyze data by performing calculations across rows within a defined window.
Key Takeaways:
-
ROW_NUMBER()
assigns a unique identifier for each row. -
RANK()
andDENSE_RANK()
differ in how they handle ties (skipping vs. no skipping). -
NTILE()
is useful for dividing rows into statistic groups. -
LEAD()
andLAG()
allow comparisons with adjacent rows.
By mastering these functions, you can handle complex analytics and ranking tasks effectively!
Explore more
Thank you for taking the time to explore data-related insights with me. I appreciate your engagement.
Top comments (5)
I suggest you add a short definition for the terms "window" and "partition". Are they the same thing? It looks like it, but the reader can't tell.
Also, the phrase "while maintaining the context of the dataset" doesn't add any meaning. What is the intent behind your writing it? What are you trying to make clear?
The information itself is very interesting, these are just suggestions that might improve the writing a little.
Thank you so much for your thoughtful feedback! Your suggestions were spot on, and I’ve made the necessary adjustments to the text.
I really appreciate you taking the time to share your insights—it definitely helped improve the clarity of the content!
After 25+ years, I don't often learn something new about SQL. Excellent work. Thanks for sharing this.
Great writing...👏🏼👏🏼
Looks awesome, great work