The Problem
The task at hand is to analyze two tables: Visits
and Transactions
.
Visits
contains information about customers who visited a mall. The structure of the Visits
table is as follows:
Column Name | Type |
---|---|
visit_id | int |
customer_id | int |
visit_id
is the primary key for this table.
Transactions
contains information about transactions made during a particular visit. The structure of the Transactions
table is:
Column Name | Type |
---|---|
transaction_id | int |
visit_id | int |
amount | int |
transaction_id
is the primary key for this table.
The objective is to write a SQL query to find the customer IDs of users who visited the mall without making any transactions, as well as the number of times they made such visits. The result should be returned in any order.
For instance, consider the following input:
Visits
visit_id | customer_id |
---|---|
1 | 23 |
2 | 9 |
4 | 30 |
5 | 54 |
6 | 96 |
7 | 54 |
8 | 54 |
Transactions
transaction_id | visit_id | amount |
---|---|---|
2 | 5 | 310 |
3 | 5 | 300 |
9 | 5 | 200 |
12 | 1 | 910 |
13 | 2 | 970 |
The expected output is:
customer_id | count_no_trans |
---|---|
54 | 2 |
30 | 1 |
96 | 1 |
The explanation for this output is as follows:
- Customer 23 visited the mall once and made one transaction during visit 1.
- Customer 9 visited the mall once and made one transaction during visit 2.
- Customer 30 visited the mall once but did not make any transactions.
- Customer 54 visited the mall three times. During two of these visits, they did not make any transactions, but during one visit, they made three transactions.
- Customer 96 visited the mall once but did not make any transactions.
Hence, customers 30 and 96 visited the mall once without making any transactions. Customer 54 visited twice without making any transactions.
The Solution
We have six source codes to address this problem. All six solutions use different SQL join types and methods for filtering records. The solutions use SQL functions such as LEFT JOIN, RIGHT JOIN, DISTINCT, COUNT, GROUP BY, and NOT EXISTS in different combinations. The different strategies employed by these codes include joining tables and filtering the result to only include visits without transactions, using the NOT IN clause to exclude visits that have corresponding transactions, and using the NOT EXISTS clause to exclude visits that have corresponding transactions.
Source Code 1
This code uses a LEFT JOIN to combine the Visits
and Transactions
tables and then filters the result to only include visits that don't have a corresponding transaction. The COUNT
function is then used to find the number of such visits for each customer.
SELECT
v.customer_id,
COUNT(v.visit_id) [count_no_trans]
FROM Visits v LEFT JOIN Transactions t ON v.visit_id = t.visit_id
WHERE t.visit_id IS NULL
GROUP BY v.customer_id
This code has a runtime of 2934ms, beating 13.82% of submissions.
Source Code 2
This code is similar to Source Code 1, but it uses a RIGHT JOIN instead of a LEFT JOIN.
SELECT
v.customer_id,
COUNT(v.visit_id) [count_no_trans]
FROM Transactions t RIGHT JOIN Visits v ON v.visit_id = t.visit_id
WHERE t.visit_id IS NULL
GROUP BY v.customer_id
This code has a runtime of 2931ms, also beating 13.82% of submissions.
Source Code 3
Source Code 3 uses a subquery in the WHERE clause to filter out visits that have corresponding transactions. The NOT IN
clause excludes visit IDs that are in the Transactions
table.
SELECT
customer_id,
COUNT(visit_id) [count_no_trans]
FROM Visits
WHERE visit_id NOT IN (
SELECT DISTINCT visit_id
FROM Transactions
)
GROUP BY customer_id
This code has a runtime of 2848ms, beating 16.79% of submissions.
Source Code 4
This code is similar to Source Code 3, but it also uses the DISTINCT
keyword in the main query, not just in the subquery. Additionally, it uses the OVER
clause with the PARTITION BY
clause to compute the count of visits for each customer.
SELECT DISTINCT
customer_id,
COUNT(visit_id) OVER(PARTITION BY customer_id) [count_no_trans]
FROM Visits
WHERE visit_id NOT IN (
SELECT DISTINCT visit_id
FROM Transactions
)
This code has a runtime of 2444ms, beating 39.85% of submissions.
Source Code 5
Source Code 5 uses the NOT EXISTS
clause with a subquery to exclude visits that have corresponding transactions. It also uses the OVER
clause with the PARTITION BY
clause to compute the count of visits for each customer.
SELECT DISTINCT
customer_id,
COUNT(visit_id) OVER(PARTITION BY customer_id) [count_no_trans]
FROM Visits v
WHERE NOT EXISTS (
SELECT 1
FROM Transactions t
WHERE t.visit_id = v.visit_id
)
This code has a runtime of 2464ms, beating 38.63% of submissions.
Source Code 6
Source Code 6 is similar to Source Code 5, but it uses the GROUP BY
clause instead of the OVER
clause with the PARTITION BY
clause to compute the count of visits for each customer.
SELECT
customer_id,
COUNT(visit_id) [count_no_trans]
FROM Visits v
WHERE NOT EXISTS (
SELECT 1
FROM Transactions t
WHERE t.visit_id = v.visit_id
)
GROUP BY customer_id
This code has a runtime of 3669ms, beating 5.2% of submissions.
Conclusion
These solutions demonstrate different ways to solve the problem using SQL. Some methods are more efficient than others in terms of runtime. Based on these examples, it appears that using NOT EXISTS
with a subquery, and using the OVER
clause with the PARTITION BY
clause for counting, provides the most efficient solution. However, the efficiency of a solution can depend on the specifics of the database management system being used, as well as the data distribution. Hence, different solutions may be more efficient in different circumstances.
Ranking the solutions from best to worst in terms of LeetCode performance, we get:
- Source Code 4
- Source Code 5
- Source Code 3
- Source Code 1 and 2
- Source Code 6
Keep in mind that performance on LeetCode might not always translate directly to performance in a real-world RDBMS. LeetCode measures how quickly a solution runs in its specific environment, which may differ from the environment of a production database. Factors like indexing, query optimization by the database engine, the hardware the database is running on, and more can all affect the real-world performance of a solution.
You can find the original problem at LeetCode.
For more insightful solutions and tech-related content, feel free to connect with me on my Beacons page.
Top comments (0)