Consider a table
Followers with the following schema:
(user_id, follower_id) is the primary key for this table. This table contains the IDs of a user and a follower in a social media app where the follower follows the user. The task is to write an SQL query that will, for each user, return the number of followers. The result table should be ordered by
user_id in ascending order.
Consider the following
For this input, the expected output would be:
Here, each row represents a distinct
user_id, and the count of followers for each user.
We will discuss two different approaches to solve this problem, highlight their strengths and weaknesses, and explain their underlying structures.
This approach directly counts the
follower_id for each
user_id using GROUP BY clause.
SELECT user_id, COUNT(follower_id) [followers_count] FROM Followers GROUP BY user_id
This query is fairly straightforward and easy to understand. It works by grouping the data by
user_id, and then it counts the followers within each group. The advantage of this method is its simplicity and directness. However, GROUP BY clause can be slow on large data sets. This query runtime is 994ms, beating 86.63% of other submissions on LeetCode.
This approach counts
follower_id for each
user_id using window function.
SELECT DISTINCT user_id, COUNT(follower_id) OVER(PARTITION BY user_id) [followers_count] FROM Followers
This query, while similar to the first, leverages the power of window functions to achieve the same result. The PARTITION BY clause essentially creates a "window" of rows with the same
user_id, and the COUNT function then operates on this window. This approach can sometimes offer performance improvements over the GROUP BY method, especially when dealing with larger data sets. However, in this case, the performance was slightly less, with a runtime of 1416ms, beating 24.50% of other submissions on LeetCode.
Both methods solve the problem, but their performances can vary based on the size and distribution of the data set. While the first solution using GROUP BY clause performed better on LeetCode, in a real-world scenario with a larger data set and a well-tuned database, the second method using window function might perform better. This example serves to illustrate the importance of understanding different methods and when each can be advantageous. Performance in LeetCode and real-world applications may not align perfectly, but practicing with various techniques helps us better prepare for different scenarios.
You can find the original problem at LeetCode.
For more insightful solutions and tech-related content, feel free to connect with me on my Beacons page.