Average Time of Process per Machine | LeetCode | MSSQL

#leetcode #learning #tutorial #mssql

The Problem

Table: Activity

Column Name	Type
machine_id	int
process_id	int
activity_type	enum
timestamp	float

The table shows the user activities for a factory website. (machine_id, process_id, activity_type) is the primary key of this table. machine_id is the ID of a machine. process_id is the ID of a process running on the machine with ID machine_id. activity_type is an ENUM of type ('start', 'end'). timestamp is a float representing the current time in seconds. 'start' means the machine starts the process at the given timestamp and 'end' means the machine ends the process at the given timestamp. The 'start' timestamp will always be before the 'end' timestamp for every (machine_id, process_id) pair.

Problem: Write an SQL query to find the average time each machine takes to complete a process. The time to complete a process is the 'end' timestamp minus the 'start' timestamp. The average time is calculated by the total time to complete every process on the machine divided by the number of processes that were run. The resulting table should have the machine_id along with the average time as processing_time, which should be rounded to 3 decimal places. Return the result table in any order.

Explanation

Input:
Activity table:

machine_id	process_id	activity_type	timestamp
0	0	start	0.712
0	0	end	1.520
0	1	start	3.140
0	1	end	4.120
1	0	start	0.550
1	0	end	1.550
1	1	start	0.430
1	1	end	1.420
2	0	start	4.100
2	0	end	4.512
2	1	start	2.500
2	1	end	5.000

Output:

machine_id	processing_time
0	0.894
1	0.995
2	1.456

Explanation:
There are 3 machines running 2 processes each. Machine 0's average time is ((1.520 - 0.712) + (4.120 - 3.140)) / 2 = 0.894. Machine 1's average time is ((1.550 - 0.550) + (1.420 - 0.430)) / 2 = 0.995. Machine 2's average time is ((4.512 - 4.100) + (5.000 - 2.500)) / 2 = 1.456.

The Solution

This post presents four distinct solutions to the problem. These solutions vary in their approach, performance, and complexity. Let's explore each one and highlight their strengths, weaknesses, and main structural differences.

Source Code 1

The first solution uses a simple SELECT statement with a CASE WHEN clause to calculate the sum of 'end' and 'start' timestamps for each machine. It then divides this sum by the count of distinct process_ids to find the average processing time.



SELECT
    machine_id,
    ROUND(
        SUM(
            CASE
                WHEN activity_type = 'start' THEN -timestamp
                ELSE timestamp
            END
        ) / COUNT(DISTINCT process_id)    
    , 3) [processing_time]
FROM Activity
GROUP BY machine_id

The runtime of this code is 824ms, beating 11.26% of submissions.

Source Code 2

The second solution also uses a SELECT statement, but replaces the CASE WHEN clause with the IIF function, which simplifies the code slightly. The rest of the code is identical to Source Code 1.



SELECT
    machine_id,
    ROUND(SUM(IIF(activity_type = 'start', -timestamp, timestamp)) / COUNT(DISTINCT process_id), 3) [processing_time]
FROM Activity
GROUP BY machine_id

The runtime of this code is 631ms, beating 27.1% of submissions.

Source Code 3

The third solution differs significantly from the first two. It uses a subquery to calculate the sum of 'start' and 'end' timestamps for each machine. The main SELECT statement then subtracts the total 'start' timestamp from the total 'end' timestamp and divides the result by the count of distinct process_ids.



SELECT
    r.machine_id,
    ROUND((r.total_end-r.total_start) / (SELECT COUNT(DISTINCT process_id) FROM Activity), 3) [processing_time]
FROM (
    SELECT
        machine_id,
        SUM(IIF(activity_type = 'start', timestamp, 0)) [total_start],
        SUM(IIF(activity_type = 'end', timestamp, 0)) [total_end]
    FROM Activity
    GROUP BY machine_id
) r

The runtime of this code is 517ms, beating 46.61% of submissions.

Source Code 4

The final solution uses two CTEs (Common Table Expressions) to calculate the total time taken by each machine and the total number of processes. The main SELECT statement then calculates the average processing time as in Source Code 3.



WITH processes AS (
    SELECT COUNT(DISTINCT process_id) [total_process]
    FROM Activity
), total_time AS (
    SELECT
        machine_id,
        SUM(IIF(activity_type = 'start', timestamp, 0)) [total_start],
        SUM(IIF(activity_type = 'end', timestamp, 0)) [total_end]
    FROM Activity
    GROUP BY machine_id
)
SELECT
    machine_id,
    ROUND((total_end-total_start) / (SELECT total_process FROM processes), 3) [processing_time]
FROM total_time

The runtime of this code is 791ms, beating 12.81% of submissions.

Conclusion

The third solution (Source Code 3) demonstrates the best overall performance on LeetCode. However, it's worth noting that performance can vary significantly depending on the specific characteristics and configurations of the real-world RDMS. Therefore, choosing the best solution for a particular use case would require a comprehensive understanding of the data structure, data volume, and the performance of different SQL constructs within that specific environment.

Here's a ranking of these solutions based on LeetCode performance: