Retiago Drago

Posted on Jul 10, 2023

The Number of Employees Which Report to Each Employee | LeetCode | MSSQL

#leetcode #mssql #tutorial #database

The Problem

Consider an Employees table with the following schema:

Column Name	Type
employee_id	int
name	varchar
reports_to	int
age	int

employee_id is the primary key for this table. This table contains information about the employees and the id of the manager they report to. Some employees do not report to anyone (reports_to is null). For this problem, a manager is defined as an employee who has at least one other employee reporting to them. The task is to write an SQL query to report the ids and the names of all managers, the number of employees who report directly to them, and the average age of the reports rounded to the nearest integer.

Explanation

Consider the following Employees table:

employee_id	name	reports_to	age
9	Hercy	null	43
6	Alice	9	41
4	Bob	9	36
2	Winston	null	37

The expected output would be:

employee_id	name	reports_count	average_age
9	Hercy	2	39

Here, Hercy has two people report directly to him, Alice and Bob. Their average age is (41+36)/2 = 38.5, which is 39 after rounding it to the nearest integer.

The Solution

We will discuss four different SQL approaches to solve this problem, explaining their main differences, strengths, weaknesses, and underlying structures.

Source Code 1: Join and Group By

This code uses a JOIN operation to combine rows from two or more tables based on a related column between them.



SELECT
    m.employee_id,
    m.name,
    COUNT(e.reports_to) [reports_count],
    ROUND(AVG(e.age * 1.0), 0) [average_age]
FROM Employees e JOIN Employees m ON e.reports_to = m.employee_id
GROUP BY
    m.employee_id,
    m.name
ORDER BY m.employee_id

The GROUP BY clause groups the employee_id and name from the m alias (managers), calculates the count of reports and the average age. The ROUND function is used to round the average age to the nearest integer. The result is then ordered by employee_id.

This query runtime is 1559ms, beating 33.60% of other submissions on LeetCode.

Source Code 2: Subquery, Join, and Group By

This code utilizes subqueries to first isolate the needed columns before performing the join operation, potentially improving performance.



SELECT
    m.employee_id,
    m.name,
    COUNT(e.reports_to) [reports_count],
    ROUND(AVG(e.age * 1.0), 0) [average_age]
FROM (
    SELECT
        reports_to,
        age
    FROM Employees
) e JOIN (
    SELECT
        employee_id,
        name
    FROM Employees
) m ON e.reports_to = m.employee_id
GROUP BY
    m.employee_id,
    m.name
ORDER BY m.employee_id

By separating the reports_to and age in subquery e, and employee_id and name in subquery m, the join operation may be faster. This query has a runtime of 1446ms, beating 46.6% of other submissions on LeetCode.

Source Code 3: Subquery, Join, Group By, and Calculated Average

This code also uses subqueries for the join operation, but it calculates the average differently by explicitly calculating the sum and dividing by the count.



SELECT
    m.employee_id,
    m.name,
    COUNT(e.reports_to) [reports_count],
    ROUND(SUM(e.age) * 1.0 / COUNT(e.reports_to), 0) [average_age]
FROM (
    SELECT
        reports_to,
        age
    FROM Employees
) e JOIN (
    SELECT
        employee_id,
        name
    FROM Employees
) m ON e.reports_to = m.employee_id
GROUP BY
    m.employee_id,
    m.name
ORDER BY m.employee_id

By manually calculating the average (sum of ages divided by count), the rounding to the nearest integer can be more precise. This query runtime is 1122ms, beating 93.38% of other submissions on LeetCode.

Source Code 4: Join, Group By, and Calculated Average

This code is similar to Source Code 3, but it does not use subqueries, reverting to a more direct join operation.



SELECT
    m.employee_id,
    m.name,
    COUNT(e.reports_to) [reports_count],
    ROUND(SUM(e.age) * 1.0 / COUNT(e.reports_to), 0) [average_age]
FROM Employees e JOIN Employees m ON e.reports_to = m.employee_id
GROUP BY
    m.employee_id,
    m.name
ORDER BY m.employee_id

This solution combines the direct join approach from Source Code 1 and the explicit average calculation from Source Code 3. This query runtime is 1207ms, beating 79.18% of other submissions on LeetCode.

Conclusion

From these solutions, we learn that the choice of SQL constructs and the way we express computations can impact performance significantly. Interestingly, using subqueries did not always lead to better performance, which emphasizes that SQL optimization can be a complex topic. The top-performing solution was Source Code 3, followed by Source Code 4, Source Code 2, and then Source Code 1. However, it's important to note that LeetCode's performance does not perfectly represent real-world database performance, as many other factors come into play, including indexing, database design, and data distribution.

You can find the original problem at LeetCode.

For more insightful solutions and tech-related content, feel free to connect with me on my Beacons page.

👉 Check out all the links on my beacons.ai page 👈

Top comments (1)

ohmydi • Oct 15 '23

Nowadays, there are many similar applications available, and you should choose one that has the advantages that you value. Here you can learn more about the list of the best monitoring programs kickidler.com/info/top-best-employ... with a description of pros and cons. For me personally, Kickidler is the leader, but who is number one for you?