DEV Community

Cover image for When to use window functions over group by in SQL?
Mahmoud Ahmed
Mahmoud Ahmed

Posted on • Updated on

When to use window functions over group by in SQL?

Window functions and GROUP BY have some similarities, but they are used for different purposes.

GROUP BY is used to aggregate data based on specific groupings of data. The result of a GROUP BY operation will be a summary of the data, showing aggregate values for each group. Like doing GROUP BY a category column in a sales dataset to get the total sales for each category.

Window functions, on the other hand, are used to calculate a value for each row based on a calculation that takes into account other rows in the dataset. The result of a window function will be a new column with a value for each row. Like using a window function to calculate the average sales for each product, along with the total sales for each product.

here's an example in SQL that demonstrates the difference between using a window function and a GROUP BY clause.

Consider a table sales with columns product, region, month, and revenue, where each row represents the revenue generated by a product in a specific region during a specific month.

sales table:
+---------+--------+-------+---------+
| product | region | month | revenue |
+---------+--------+-------+---------+
| A       | North  | Jan   | 100     |
| A       | North  | Feb   | 150     |
| A       | South  | Jan   | 200     |
| A       | South  | Feb   | 250     |
| B       | North  | Jan   | 300     |
| B       | North  | Feb   | 350     |
| B       | South  | Jan   | 400     |
| B       | South  | Feb   | 450     |
+---------+--------+-------+---------+
Enter fullscreen mode Exit fullscreen mode

Suppose we want to find the total revenue generated by each product across all regions and months, as well as the percentage of the total revenue that each region and month contributes. We can achieve this using both window functions and a GROUP BY clause, to get this output:

+---------+--------+-------+---------+---------------+--------------+
| product | region | month | revenue | total_revenue | revenue_pct  |
+---------+--------+-------+---------+---------------+--------------+
| A       | North  | Jan   | 100     | 700           | 0.142857143  |
| A       | North  | Feb   | 150     | 700           | 0.214285714  |
| A       | South  | Jan   | 200     | 700           | 0.285714286  |
| A       | South  | Feb   | 250     | 700           | 0.357142857  |
| B       | North  | Jan   | 300     | 1500          | 0.2          |
| B       | North  | Feb   | 350     | 1500          | 0.233333333  |
| B       | South  | Jan   | 400     | 1500          | 0.266666667  |
| B       | South  | Feb   | 450     | 1500          | 0.3          |
+---------+--------+-------+---------+---------------+--------------+
Enter fullscreen mode Exit fullscreen mode

Here's how we would do it using a window function:

SELECT
  product,
  region,
  month,
  revenue,
  SUM(revenue) OVER (PARTITION BY product) AS total_revenue,
  revenue / SUM(revenue) OVER (PARTITION BY product) AS revenue_pct
FROM sales
Enter fullscreen mode Exit fullscreen mode

This query uses the SUM window function to calculate the total revenue for each product, and then divides the revenue for each row by that total to get the percentage of the total revenue contributed by that row.

Here's how we would do it using a GROUP BY clause:

SELECT
  product,
  region,
  month,
  revenue,
  total_revenue,
  revenue / total_revenue AS revenue_pct
FROM (
  SELECT
    product,
    region,
    month,
    revenue,
    SUM(revenue) OVER (PARTITION BY product) AS total_revenue
  FROM sales
) subquery
Enter fullscreen mode Exit fullscreen mode

This query first calculates the total revenue for each product using a GROUP BY clause, and then uses a subquery to join that information back to the original table and calculate the percentage of the total revenue contributed by each row.

Summary

In general, you should use window functions when you need to perform calculations over a set of rows that do not necessarily correspond to distinct groups, such as cumulative sums, rolling averages, or rank functions. You should use a GROUP BY clause when you need to group rows based on a common attribute and perform aggregate calculations over each group, such as summing, averaging, or counting.

Top comments (0)