DEV Community

Cover image for Trips and Users | LeetCode | MSSQL
Retiago Drago
Retiago Drago

Posted on

Trips and Users | LeetCode | MSSQL

The Problem

The challenge involves two tables: Trips and Users. The Trips table holds information about all taxi trips, including the unique trip id, client id, driver id, city id, status of the trip, and the date of the request. The Users table maintains records of all users, encompassing the unique user id, user's status (banned or not), and their role (client, driver, partner).

The task is to compute the cancellation rate each day between "2013-10-01" and "2013-10-03". The cancellation rate is defined as the ratio of the number of cancelled requests by unbanned users (both client and driver) to the total number of requests made by unbanned users that day. The cancellation rate should be expressed up to two decimal places. The tables' schemas are as follows:

Trips

Column Name Type
id int
client_id int
driver_id int
city_id int
status enum
request_at date

Users

Column Name Type
users_id int
banned enum
role enum

Explanation

Consider the following input:

Trips

id client_id driver_id city_id status request_at
1 1 10 1 completed 2013-10-01
2 2 11 1 cancelled_by_driver 2013-10-01
3 3 12 6 completed 2013-10-01
4 4 13 6 cancelled_by_client 2013-10-01
5 1 10 1 completed 2013-10-02
6 2 11 6 completed 2013-10-02
7 3 12 6 completed 2013-10-02
8 2 12 12 completed 2013-10-03
9 3 10 12 completed 2013-10-03
10 4 13 12 cancelled_by_driver 2013-10-03

Users

users_id banned role
1 No client
2 Yes client
3 No client
4 No client
10 No driver
11 No driver
12 No driver
13 No driver

For this data, the expected output is:

Day Cancellation Rate
2013-10-01 0.33
2013-10-02 0.00
2013-10-03 0.50

Explanation:

On 2013-10-01:

  • There were 4 requests in total, 2 of which were canceled.
  • The request with Id=2 was made by a banned client (User_Id=2), so it is excluded from the calculation.
  • Hence, considering unbanned users, there were 3 requests in total, 1 of which was canceled.
  • Therefore, the Cancellation Rate is (1 / 3) = 0.33

On 2013-10-02:

  • There were 3 requests in total, none of which were canceled.
  • The request with Id=6 was made by a banned client, so it is excluded from the calculation.
  • Thus, considering unbanned users, there were 2 requests in total, none of which were canceled.
  • Hence, the Cancellation Rate is (0 / 2) = 0.00

On 2013-10-03:

  • There were 3 requests in total, 1 of which was canceled.
  • The request with Id=8 was made by a banned client, so it is excluded from the calculation.
  • Hence, considering unbanned users, there were 2 requests in total, 1 of which was canceled.
  • Therefore, the Cancellation Rate is (1 / 2) = 0.50

The cancellation rate for each day is calculated considering only unbanned users (both client and driver) and requests that occurred on the day. If a request was made by a banned user, it is ignored in the calculation. The final cancellation rate is presented as a fraction of the total requests made by unbanned users that were canceled, rounded to two decimal places.

The Solution

Let's delve into three different approaches to solve this problem. As we progress from Source Code 1 to Source Code 3, the code gets more efficient and easier to read, even though all three of them provide the correct results.

Source Code 1

This SQL query makes use of a Common Table Expression (CTE) called 'cancellation_rates'. It generates a temporary result set by joining the Trips table with the Users table twice, once for the client and once for the driver. This query uses window functions COUNT(*) OVER() to get the number of unbanned requests and unbanned statuses for each day. It then calculates the cancellation rate in the main SELECT query.

This query might seem a bit complex due to the use of window functions and CTE. However, it provides a clear separation of calculating the unbanned request count and the actual calculation of cancellation rate.

The performance of this query is good but not the best, with a runtime of 572ms, beating 45.44% of other submissions.

WITH cancellation_rates AS (
    SELECT DISTINCT
        t.request_at AS Day,
        t.status,
        COUNT(*) OVER(PARTITION BY t.request_at, t.status) AS unbanned_stat,
        COUNT(*) OVER(PARTITION BY t.request_at) AS unbanned_req
    FROM Trips t JOIN Users c ON t.client_id = c.users_id JOIN Users d ON t.driver_id = d.users_id
    WHERE
        c.banned = 'No'
        AND d.banned = 'No'
        AND  t.request_at BETWEEN '2013-10-01' AND '2013-10-03'
)
SELECT
    Day,
    CASE
        WHEN status = 'completed' AND unbanned_stat = unbanned_req THEN 0
        ELSE ROUND(SUM(unbanned_stat) OVER (PARTITION BY Day, unbanned_stat) * 1.0 / unbanned_req, 2)
    END AS 'Cancellation Rate'
FROM cancellation_rates
WHERE
    status LIKE 'cancelled%'
    OR (status = 'completed' AND unbanned_stat = unbanned_req)
Enter fullscreen mode Exit fullscreen mode

solution1

Source Code 2

This query simplifies the approach by removing the use of a CTE. Instead, it calculates the cancellation rate directly within the main SELECT statement using conditional aggregation. It uses a CASE statement inside the SUM() function to count the number of cancellations and then divides that by the total number of requests.

This approach is easier to understand but has a slightly worse performance, with a runtime of 741ms, beating 12.61% of other submissions.

SELECT 
    t.request_at AS Day,
    ISNULL(
        ROUND(
            CAST(
                SUM(
                    CASE 
                        WHEN t.status LIKE 'cancelled%' THEN 1
                        ELSE 0 
                    END
                ) AS FLOAT) / 
            CAST(
                COUNT(*) AS FLOAT),
            2), 
        0) AS 'Cancellation Rate'
FROM 
    Trips t 
JOIN 
    Users c ON t.client_id = c.users_id AND c.banned = 'No'
JOIN 
    Users d ON t.driver_id = d.users_id AND d.banned = 'No'
WHERE 
    t.request_at BETWEEN '2013-10-01' AND '2013-10-03'
GROUP BY 
    t.request_at
Enter fullscreen mode Exit fullscreen mode

solution2

Source Code 3

The third query is even more simplified and performs better than the previous two queries. It removes the explicit JOIN operations, replacing them with subqueries in the WHERE clause to filter out banned users.

This not only improves readability but also enhances performance because the query doesn't need to create large intermediate join tables. Instead, it filters rows directly based on client_id and driver_id, which is more efficient.

This query has the best performance among the three, with a runtime of 435ms, beating 81.43% of other submissions.

SELECT 
    t.request_at AS Day,
    ISNULL(
        ROUND(
            SUM(
                CASE 
                    WHEN t.status LIKE 'cancelled%' THEN 1.0
                    ELSE 0.0 
                END
            ) / 
            COUNT(*),
            2), 
        0) AS 'Cancellation Rate'
FROM 
    Trips t 
WHERE 
    t.request_at BETWEEN '2013-10-01' AND '2013-10-03'
    AND t.client_id IN (SELECT users_id FROM Users WHERE banned = 'No' AND role = 'client')
    AND t.driver_id IN (SELECT users_id FROM Users WHERE banned = 'No' AND role = 'driver')
GROUP BY 
    t.request_at
Enter fullscreen mode Exit fullscreen mode

solution3

Conclusion

From these solutions, we can learn that while different SQL queries can achieve the same result, their performance can vary based on how we leverage the SQL features and functions. Using joins and subqueries strategically can have a significant impact on the query execution speed.

You can find the original problem at LeetCode.

For more insightful solutions and tech-related content, feel free to connect with me on my Beacons page.

ranggakd - Link in Bio & Creator Tools | Beacons

@ranggakd | center details summary summary Oh hello there I m a an Programmer AI Tech Writer Data Practitioner Statistics Math Addict Open Source Contributor Quantum Computing Enthusiast details center.

favicon beacons.ai

Top comments (0)