Consider the following tables in our database:
employee_id is the primary key of this table.
employee_id is a foreign key to Employee table. Each row of this table indicates that the employee with
employee_id is working on the project with
employee_id is the primary key of this table, and
experience_years is not NULL. Each row of this table contains information about one employee.
The challenge is to write an SQL query that reports the average experience years of all the employees for each project, rounded to 2 digits. The result should be returned in any order.
For example, consider the following data:
The average experience years for the first project is (3 + 2 + 1) / 3 = 2.00 and for the second project is (3 + 2) / 2 = 2.50.
There are a few ways to tackle this problem, and we will discuss two major ones in this post.
The first approach leverages SQL's window functions, particularly
This query uses a
JOIN operation to combine the
Employee tables, then computes the average of
experience_years for each
project_id using the
AVG() function in combination with the
PARTITION BY clause.
SELECT DISTINCT p.project_id, ROUND(AVG(e.experience_years * 1.0) OVER (PARTITION BY p.project_id), 2) [average_years] FROM Project p JOIN Employee e ON p.employee_id = e.employee_id
This code runs in 4258ms and beats 26.23% of other submissions.
The second approach uses a
GROUP BY clause to achieve similar results. After joining the
Employee tables, it groups the result by
project_id and then computes the average of
experience_years for each group.
SELECT p.project_id, ROUND(AVG(e.experience_years * 1.0), 2) [average_years] FROM Project p JOIN Employee e ON p.employee_id = e.employee_id GROUP BY p.project_id
This code runs in 4552ms and beats 15.44% of other submissions.
Both of these methods achieve the desired result, but with different performance results. While the window function solution tends to be slower due to the overhead of partitioning, the GROUP BY solution can be faster but may also consume more memory due to grouping.
In this case, Source Code 1, which uses window functions, performs better than Source Code 2, which uses a
GROUP BY clause. However, performance can vary depending on the specific RDBMS and data distribution. It's crucial to understand your data and SQL engine to select the best approach for your needs.
The original problem can be found at LeetCode.
For more insightful solutions and tech-related content, feel free to connect with me on my Beacons page.