The Problem
Consider an Employees
table with the following schema:
Column Name | Type |
---|---|
employee_id | int |
name | varchar |
reports_to | int |
age | int |
employee_id
is the primary key for this table. This table contains information about the employees and the id of the manager they report to. Some employees do not report to anyone (reports_to
is null). For this problem, a manager is defined as an employee who has at least one other employee reporting to them. The task is to write an SQL query to report the ids and the names of all managers, the number of employees who report directly to them, and the average age of the reports rounded to the nearest integer.
Explanation
Consider the following Employees
table:
employee_id | name | reports_to | age |
---|---|---|---|
9 | Hercy | null | 43 |
6 | Alice | 9 | 41 |
4 | Bob | 9 | 36 |
2 | Winston | null | 37 |
The expected output would be:
employee_id | name | reports_count | average_age |
---|---|---|---|
9 | Hercy | 2 | 39 |
Here, Hercy has two people report directly to him, Alice and Bob. Their average age is (41+36)/2 = 38.5
, which is 39
after rounding it to the nearest integer.
The Solution
We will discuss four different SQL approaches to solve this problem, explaining their main differences, strengths, weaknesses, and underlying structures.
Source Code 1: Join and Group By
This code uses a JOIN operation to combine rows from two or more tables based on a related column between them.
SELECT
m.employee_id,
m.name,
COUNT(e.reports_to) [reports_count],
ROUND(AVG(e.age * 1.0), 0) [average_age]
FROM Employees e JOIN Employees m ON e.reports_to = m.employee_id
GROUP BY
m.employee_id,
m.name
ORDER BY m.employee_id
The GROUP BY clause groups the employee_id
and name
from the m
alias (managers), calculates the count of reports and the average age. The ROUND
function is used to round the average age to the nearest integer. The result is then ordered by employee_id
.
This query runtime is 1559ms, beating 33.60% of other submissions on LeetCode.
Source Code 2: Subquery, Join, and Group By
This code utilizes subqueries to first isolate the needed columns before performing the join operation, potentially improving performance.
SELECT
m.employee_id,
m.name,
COUNT(e.reports_to) [reports_count],
ROUND(AVG(e.age * 1.0), 0) [average_age]
FROM (
SELECT
reports_to,
age
FROM Employees
) e JOIN (
SELECT
employee_id,
name
FROM Employees
) m ON e.reports_to = m.employee_id
GROUP BY
m.employee_id,
m.name
ORDER BY m.employee_id
By separating the reports_to
and age
in subquery e
, and employee_id
and name
in subquery m
, the join operation may be faster. This query has a runtime of 1446ms, beating 46.6% of other submissions on LeetCode.
Source Code 3: Subquery, Join, Group By, and Calculated Average
This code also uses subqueries for the join operation, but it calculates the average differently by explicitly calculating the sum and dividing by the count.
SELECT
m.employee_id,
m.name,
COUNT(e.reports_to) [reports_count],
ROUND(SUM(e.age) * 1.0 / COUNT(e.reports_to), 0) [average_age]
FROM (
SELECT
reports_to,
age
FROM Employees
) e JOIN (
SELECT
employee_id,
name
FROM Employees
) m ON e.reports_to = m.employee_id
GROUP BY
m.employee_id,
m.name
ORDER BY m.employee_id
By manually calculating the average (sum of ages divided by count), the rounding to the nearest integer can be more precise. This query runtime is 1122ms, beating 93.38% of other submissions on LeetCode.
Source Code 4: Join, Group By, and Calculated Average
This code is similar to Source Code 3, but it does not use subqueries, reverting to a more direct join operation.
SELECT
m.employee_id,
m.name,
COUNT(e.reports_to) [reports_count],
ROUND(SUM(e.age) * 1.0 / COUNT(e.reports_to), 0) [average_age]
FROM Employees e JOIN Employees m ON e.reports_to = m.employee_id
GROUP BY
m.employee_id,
m.name
ORDER BY m.employee_id
This solution combines the direct join approach from Source Code 1 and the explicit average calculation from Source Code 3. This query runtime is 1207ms, beating 79.18% of other submissions on LeetCode.
Conclusion
From these solutions, we learn that the choice of SQL constructs and the way we express computations can impact performance significantly. Interestingly, using subqueries did not always lead to better performance, which emphasizes that SQL optimization can be a complex topic. The top-performing solution was Source Code 3, followed by Source Code 4, Source Code 2, and then Source Code 1. However, it's important to note that LeetCode's performance does not perfectly represent real-world database performance, as many other factors come into play, including indexing, database design, and data distribution.
You can find the original problem at LeetCode.
For more insightful solutions and tech-related content, feel free to connect with me on my Beacons page.
Top comments (1)
Nowadays, there are many similar applications available, and you should choose one that has the advantages that you value. Here you can learn more about the list of the best monitoring programs kickidler.com/info/top-best-employ... with a description of pros and cons. For me personally, Kickidler is the leader, but who is number one for you?