The Problem
Table: Activity
Column Name | Type |
---|---|
machine_id | int |
process_id | int |
activity_type | enum |
timestamp | float |
The table shows the user activities for a factory website. (machine_id, process_id, activity_type)
is the primary key of this table. machine_id
is the ID of a machine. process_id
is the ID of a process running on the machine with ID machine_id
. activity_type
is an ENUM of type ('start', 'end'). timestamp
is a float representing the current time in seconds. 'start' means the machine starts the process at the given timestamp and 'end' means the machine ends the process at the given timestamp. The 'start' timestamp will always be before the 'end' timestamp for every (machine_id, process_id)
pair.
Problem: Write an SQL query to find the average time each machine takes to complete a process. The time to complete a process is the 'end' timestamp minus the 'start' timestamp. The average time is calculated by the total time to complete every process on the machine divided by the number of processes that were run. The resulting table should have the machine_id
along with the average time as processing_time
, which should be rounded to 3 decimal places. Return the result table in any order.
Explanation
Input:
Activity table:
machine_id | process_id | activity_type | timestamp |
---|---|---|---|
0 | 0 | start | 0.712 |
0 | 0 | end | 1.520 |
0 | 1 | start | 3.140 |
0 | 1 | end | 4.120 |
1 | 0 | start | 0.550 |
1 | 0 | end | 1.550 |
1 | 1 | start | 0.430 |
1 | 1 | end | 1.420 |
2 | 0 | start | 4.100 |
2 | 0 | end | 4.512 |
2 | 1 | start | 2.500 |
2 | 1 | end | 5.000 |
Output:
machine_id | processing_time |
---|---|
0 | 0.894 |
1 | 0.995 |
2 | 1.456 |
Explanation:
There are 3 machines running 2 processes each. Machine 0's average time is ((1.520 - 0.712) + (4.120 - 3.140)) / 2 = 0.894. Machine 1's average time is ((1.550 - 0.550) + (1.420 - 0.430)) / 2 = 0.995. Machine 2's average time is ((4.512 - 4.100) + (5.000 - 2.500)) / 2 = 1.456.
The Solution
This post presents four distinct solutions to the problem. These solutions vary in their approach, performance, and complexity. Let's explore each one and highlight their strengths, weaknesses, and main structural differences.
Source Code 1
The first solution uses a simple SELECT statement with a CASE WHEN clause to calculate the sum of 'end' and 'start' timestamps for each machine. It then divides this sum by the count of distinct process_ids to find the average processing time.
SELECT
machine_id,
ROUND(
SUM(
CASE
WHEN activity_type = 'start' THEN -timestamp
ELSE timestamp
END
) / COUNT(DISTINCT process_id)
, 3) [processing_time]
FROM Activity
GROUP BY machine_id
The runtime of this code is 824ms, beating 11.26% of submissions.
Source Code 2
The second solution also uses a SELECT statement, but replaces the CASE WHEN clause with the IIF function, which simplifies the code slightly. The rest of the code is identical to Source Code 1.
SELECT
machine_id,
ROUND(SUM(IIF(activity_type = 'start', -timestamp, timestamp)) / COUNT(DISTINCT process_id), 3) [processing_time]
FROM Activity
GROUP BY machine_id
The runtime of this code is 631ms, beating 27.1% of submissions.
Source Code 3
The third solution differs significantly from the first two. It uses a subquery to calculate the sum of 'start' and 'end' timestamps for each machine. The main SELECT statement then subtracts the total 'start' timestamp from the total 'end' timestamp and divides the result by the count of distinct process_ids.
SELECT
r.machine_id,
ROUND((r.total_end-r.total_start) / (SELECT COUNT(DISTINCT process_id) FROM Activity), 3) [processing_time]
FROM (
SELECT
machine_id,
SUM(IIF(activity_type = 'start', timestamp, 0)) [total_start],
SUM(IIF(activity_type = 'end', timestamp, 0)) [total_end]
FROM Activity
GROUP BY machine_id
) r
The runtime of this code is 517ms, beating 46.61% of submissions.
Source Code 4
The final solution uses two CTEs (Common Table Expressions) to calculate the total time taken by each machine and the total number of processes. The main SELECT statement then calculates the average processing time as in Source Code 3.
WITH processes AS (
SELECT COUNT(DISTINCT process_id) [total_process]
FROM Activity
), total_time AS (
SELECT
machine_id,
SUM(IIF(activity_type = 'start', timestamp, 0)) [total_start],
SUM(IIF(activity_type = 'end', timestamp, 0)) [total_end]
FROM Activity
GROUP BY machine_id
)
SELECT
machine_id,
ROUND((total_end-total_start) / (SELECT total_process FROM processes), 3) [processing_time]
FROM total_time
The runtime of this code is 791ms, beating 12.81% of submissions.
Conclusion
The third solution (Source Code 3) demonstrates the best overall performance on LeetCode. However, it's worth noting that performance can vary significantly depending on the specific characteristics and configurations of the real-world RDMS. Therefore, choosing the best solution for a particular use case would require a comprehensive understanding of the data structure, data volume, and the performance of different SQL constructs within that specific environment.
Here's a ranking of these solutions based on LeetCode performance:
- Source Code 3
- Source Code 2
- Source Code 4
- Source Code 1
You can find the original problem at LeetCode.
For more insightful solutions and tech-related content, feel free to connect with me on my Beacons page.
Top comments (0)