The challenge involves two tables:
Trips table holds information about all taxi trips, including the unique trip id, client id, driver id, city id, status of the trip, and the date of the request. The
Users table maintains records of all users, encompassing the unique user id, user's status (banned or not), and their role (client, driver, partner).
The task is to compute the cancellation rate each day between "2013-10-01" and "2013-10-03". The cancellation rate is defined as the ratio of the number of cancelled requests by unbanned users (both client and driver) to the total number of requests made by unbanned users that day. The cancellation rate should be expressed up to two decimal places. The tables' schemas are as follows:
Consider the following input:
For this data, the expected output is:
- There were 4 requests in total, 2 of which were canceled.
- The request with Id=2 was made by a banned client (User_Id=2), so it is excluded from the calculation.
- Hence, considering unbanned users, there were 3 requests in total, 1 of which was canceled.
- Therefore, the Cancellation Rate is (1 / 3) = 0.33
- There were 3 requests in total, none of which were canceled.
- The request with Id=6 was made by a banned client, so it is excluded from the calculation.
- Thus, considering unbanned users, there were 2 requests in total, none of which were canceled.
- Hence, the Cancellation Rate is (0 / 2) = 0.00
- There were 3 requests in total, 1 of which was canceled.
- The request with Id=8 was made by a banned client, so it is excluded from the calculation.
- Hence, considering unbanned users, there were 2 requests in total, 1 of which was canceled.
- Therefore, the Cancellation Rate is (1 / 2) = 0.50
The cancellation rate for each day is calculated considering only unbanned users (both client and driver) and requests that occurred on the day. If a request was made by a banned user, it is ignored in the calculation. The final cancellation rate is presented as a fraction of the total requests made by unbanned users that were canceled, rounded to two decimal places.
Let's delve into three different approaches to solve this problem. As we progress from Source Code 1 to Source Code 3, the code gets more efficient and easier to read, even though all three of them provide the correct results.
This SQL query makes use of a Common Table Expression (CTE) called 'cancellation_rates'. It generates a temporary result set by joining the
Trips table with the
Users table twice, once for the client and once for the driver. This query uses window functions
COUNT(*) OVER() to get the number of unbanned requests and unbanned statuses for each day. It then calculates the cancellation rate in the main
This query might seem a bit complex due to the use of window functions and CTE. However, it provides a clear separation of calculating the unbanned request count and the actual calculation of cancellation rate.
The performance of this query is good but not the best, with a runtime of 572ms, beating 45.44% of other submissions.
WITH cancellation_rates AS ( SELECT DISTINCT t.request_at AS Day, t.status, COUNT(*) OVER(PARTITION BY t.request_at, t.status) AS unbanned_stat, COUNT(*) OVER(PARTITION BY t.request_at) AS unbanned_req FROM Trips t JOIN Users c ON t.client_id = c.users_id JOIN Users d ON t.driver_id = d.users_id WHERE c.banned = 'No' AND d.banned = 'No' AND t.request_at BETWEEN '2013-10-01' AND '2013-10-03' ) SELECT Day, CASE WHEN status = 'completed' AND unbanned_stat = unbanned_req THEN 0 ELSE ROUND(SUM(unbanned_stat) OVER (PARTITION BY Day, unbanned_stat) * 1.0 / unbanned_req, 2) END AS 'Cancellation Rate' FROM cancellation_rates WHERE status LIKE 'cancelled%' OR (status = 'completed' AND unbanned_stat = unbanned_req)
This query simplifies the approach by removing the use of a CTE. Instead, it calculates the cancellation rate directly within the main
SELECT statement using conditional aggregation. It uses a
CASE statement inside the
SUM() function to count the number of cancellations and then divides that by the total number of requests.
This approach is easier to understand but has a slightly worse performance, with a runtime of 741ms, beating 12.61% of other submissions.
SELECT t.request_at AS Day, ISNULL( ROUND( CAST( SUM( CASE WHEN t.status LIKE 'cancelled%' THEN 1 ELSE 0 END ) AS FLOAT) / CAST( COUNT(*) AS FLOAT), 2), 0) AS 'Cancellation Rate' FROM Trips t JOIN Users c ON t.client_id = c.users_id AND c.banned = 'No' JOIN Users d ON t.driver_id = d.users_id AND d.banned = 'No' WHERE t.request_at BETWEEN '2013-10-01' AND '2013-10-03' GROUP BY t.request_at
The third query is even more simplified and performs better than the previous two queries. It removes the explicit
JOIN operations, replacing them with subqueries in the
WHERE clause to filter out banned users.
This not only improves readability but also enhances performance because the query doesn't need to create large intermediate join tables. Instead, it filters rows directly based on client_id and driver_id, which is more efficient.
This query has the best performance among the three, with a runtime of 435ms, beating 81.43% of other submissions.
SELECT t.request_at AS Day, ISNULL( ROUND( SUM( CASE WHEN t.status LIKE 'cancelled%' THEN 1.0 ELSE 0.0 END ) / COUNT(*), 2), 0) AS 'Cancellation Rate' FROM Trips t WHERE t.request_at BETWEEN '2013-10-01' AND '2013-10-03' AND t.client_id IN (SELECT users_id FROM Users WHERE banned = 'No' AND role = 'client') AND t.driver_id IN (SELECT users_id FROM Users WHERE banned = 'No' AND role = 'driver') GROUP BY t.request_at
From these solutions, we can learn that while different SQL queries can achieve the same result, their performance can vary based on how we leverage the SQL features and functions. Using joins and subqueries strategically can have a significant impact on the query execution speed.
You can find the original problem at LeetCode.
For more insightful solutions and tech-related content, feel free to connect with me on my Beacons page.