The Problem
Consider the following tables in our database:
Table: Project
Column Name | Type |
---|---|
project_id | int |
employee_id | int |
project_id
, employee_id
is the primary key of this table. employee_id
is a foreign key to Employee table. Each row of this table indicates that the employee with employee_id
is working on the project with project_id
.
Table: Employee
Column Name | Type |
---|---|
employee_id | int |
name | varchar |
experience_years | int |
employee_id
is the primary key of this table, and experience_years
is not NULL. Each row of this table contains information about one employee.
The challenge is to write an SQL query that reports the average experience years of all the employees for each project, rounded to 2 digits. The result should be returned in any order.
Explanation
For example, consider the following data:
Project table:
project_id | employee_id |
---|---|
1 | 1 |
1 | 2 |
1 | 3 |
2 | 1 |
2 | 4 |
Employee table:
employee_id | name | experience_years |
---|---|---|
1 | Khaled | 3 |
2 | Ali | 2 |
3 | John | 1 |
4 | Doe | 2 |
The average experience years for the first project is (3 + 2 + 1) / 3 = 2.00 and for the second project is (3 + 2) / 2 = 2.50.
The Solution
There are a few ways to tackle this problem, and we will discuss two major ones in this post.
Source Code 1: Using Window Functions
The first approach leverages SQL's window functions, particularly AVG()
with PARTITION BY
.
This query uses a JOIN
operation to combine the Project
and Employee
tables, then computes the average of experience_years
for each project_id
using the AVG()
function in combination with the PARTITION BY
clause.
SELECT DISTINCT
p.project_id,
ROUND(AVG(e.experience_years * 1.0) OVER (PARTITION BY p.project_id), 2) [average_years]
FROM
Project p JOIN Employee e ON p.employee_id = e.employee_id
This code runs in 4258ms and beats 26.23% of other submissions.
Source Code 2: Using GROUP BY
The second approach uses a GROUP BY
clause to achieve similar results. After joining the Project
and Employee
tables, it groups the result by project_id
and then computes the average of experience_years
for each group.
SELECT
p.project_id,
ROUND(AVG(e.experience_years * 1.0), 2) [average_years]
FROM
Project p JOIN Employee e ON p.employee_id = e.employee_id
GROUP BY p.project_id
This code runs in 4552ms and beats 15.44% of other submissions.
Conclusion
Both of these methods achieve the desired result, but with different performance results. While the window function solution tends to be slower due to the overhead of partitioning, the GROUP BY solution can be faster but may also consume more memory due to grouping.
In this case, Source Code 1, which uses window functions, performs better than Source Code 2, which uses a GROUP BY
clause. However, performance can vary depending on the specific RDBMS and data distribution. It's crucial to understand your data and SQL engine to select the best approach for your needs.
The original problem can be found at LeetCode.
For more insightful solutions and tech-related content, feel free to connect with me on my Beacons page.
Top comments (0)