The Problem
The challenge involves the table World
.
World
table:
name (PK) | continent | area | population | gdp |
---|---|---|---|---|
varchar | varchar | int | int | bigint |
Each row of this table provides information about a country, including its continent, area, population, and GDP. We define a country as "big" if it has an area of at least 3 million km2 or a population of at least 25 million.
The task is to write an SQL query that reports the name, population, and area of the "big" countries.
Explanation
Here's an example for better understanding:
Input:
World
table:
name | continent | area | population | gdp |
---|---|---|---|---|
Afghanistan | Asia | 652230 | 25500100 | 20343000000 |
Albania | Europe | 28748 | 2831741 | 12960000000 |
Algeria | Africa | 2381741 | 37100000 | 188681000000 |
Andorra | Europe | 468 | 78115 | 3712000000 |
Angola | Africa | 1246700 | 20609294 | 100990000000 |
Output:
name | population | area |
---|---|---|
Afghanistan | 25500100 | 652230 |
Algeria | 37100000 | 2381741 |
Afghanistan and Algeria are considered "big" countries as their population or area exceeds the defined thresholds.
The Solution
Let's explore two SQL solutions to this problem, examining their differences, strengths, and weaknesses.
Source Code 1
The first solution retrieves countries from the table where the area is at least 3 million or the population is at least 25 million.
SELECT
name,
population,
area
FROM
World
WHERE
area >= 3000000
OR
population >= 25000000
This query achieves a runtime of 904ms, outperforming 92.98% of other submissions.
Source Code 2
The second solution is very similar to the first, but it first checks for the population condition before the area condition.
SELECT
name,
population,
area
FROM
World
WHERE
population >= 25000000
OR
area >= 3000000
This query has a runtime of 983ms, beating 78.89% of other submissions.
Conclusion
Both solutions yield the same output but have minor performance differences due to the order of conditions in the WHERE clause. Thus, the ranking of solutions based on overall performance, from best to worst, is: Source Code 1 > Source Code 2.
You should choose the solution that best fits your specific performance requirements and expectations.
You can find the original problem at LeetCode.
For more insightful solutions and tech-related content, connect with me on my Beacons page.
Top comments (3)
Just curious... Why doest it show a performance difference? I've just checked it in SQL Server 2014. I do not see any difference there. And it makes sense because we have only one key - Primary, but we do not use it in the WHERE clause. Therefore we get Clustered Index Scan for both SELECT statements.
Hi there, you've made an excellent point. In a real-world RDBMS, such as SQL Server, the order of conditions in the WHERE clause generally shouldn't affect the performance. The query optimizer will evaluate the conditions and decide the best execution plan based on the data statistics it maintains, regardless of the order we write these conditions in.
The difference in runtime we're observing here on LeetCode is likely due to a variety of factors unrelated to the database itself, like the current server load or other infrastructure aspects. Minor fluctuations in runtime can occur even between logically equivalent queries.
I agree with your point that the time reported by LeetCode might not be an accurate reflection of the actual performance of a query in a production database. The differences we're seeing between the two queries are probably more about LeetCode's platform than the inherent performance characteristics of SQL Server or other RDBMSs.
Thank you for bringing up this insightful point. It's a great reminder that while practicing on platforms like LeetCode is valuable, we also have to be aware of the differences between these platforms and real-world databases.
Hi, thanks for the clarification! It would be great if LeetCode could show an execution plan for further analysis.