When I did my presentation at APEX Connect, Erik van Roon explained another pitfall of my MINUS approach to compare the content of two views to me – something some people already tried to show me and failed (well, I failed getting the point).
Let’s start with the same situation as in the last example, a small list of Star Wars characters and the movie episodes in which they appear:
ID | Character | Episodes |
---|---|---|
1 | Darth Vader | 3, 4, 5, 6 |
2 | Luke Skywalker | 4, 5, 6, 7, 8 |
3 | Rey | 7, 8 |
Now let’s assume we are forced to do overwork by the sithlord in charge, it’s 3 am, we are terribly tired and create a new view but miss the group by
statement:
create or replace view all_movie_characters_3
as
select
sw_char.id,
sw_char.name
from
star_wars_characters sw_char
inner join appearance_in_episode ep
on sw_char.id = ep.character_fk;
Or course this will not return the expected result:
ID | Character |
---|---|
1 | Darth Vader |
1 | Darth Vader |
1 | Darth Vader |
1 | Darth Vader |
2 | Luke Skywalker |
2 | Luke Skywalker |
2 | Luke Skywalker |
2 | Luke Skywalker |
3 | Rey |
3 | Rey |
But if we apply our MINUS comparison, we will get no results, indicating that both of the views are equal:
(
select * from all_movie_characters
minus
select * from all_movie_characters_3
)
union all
(
select * from all_movie_characters_3
minus
select * from all_movie_characters
);
-- No results
The problem is, that MINUS removes all appearances of equal rows. Therefore, it’s only useful when we can be sure that each row is exactly identifiable by its contents.
You could put it like this: MINUS comparison works if select * from ...
of a query is equal to select distinct * from ...
– which is not the case in our example:
select 'distinct' type, count(*)
from (
select distinct *
from all_movie_characters_3
)
union all
select 'all', count(*) from all_movie_characters_3;
TYPE | COUNT(*) |
---|---|
distinct | 3 |
all | 10 |
The solution Erik suggested, was to add the count over all columns to the comparison:
with old as (
select id, name, count(*) number_of_equals
from all_movie_characters
group by id, name
),
new as (
select id, name, count(*) number_of_equals
from all_movie_characters_3
group by id, name
)
(
select * from old
minus
select * from new
)
union all
(
select * from new
minus
select * from old
);
The disadvantage here is, that we can’t use *
notation anymore, something which I always found to be very convenient for quick comparisons, especially when comparing views with many columns. If you want to be sure, though, there’s no way to avoid investing the additional time.
Alternative approaches
There’s a pretty famous approach to comparing the contents of two queries by Tom Kyte:
select id, name, sum(old_cnt), sum(new_cnt)
from (
select id, name, 1 old_cnt, 0 new_cnt
from all_movie_characters source
union all
select id, name, 0 old_cnt, 1 new_cnt
from all_movie_characters_3 target
)
group by id, name
having sum(old_cnt) != sum(new_cnt);
ID | NAME | SUM(OLD_CNT) | SUM(NEW_CNT) |
---|---|---|---|
2 | Luke Skywalker | 1 | 4 |
3 | Rey | 1 | 2 |
1 | Darth Vader | 1 | 4 |
Besides being very elegant and pretty performant, one really cool thing about this approach is that you can easily see which rows are different and where the difference lies.
The downside for me is that it’s quite some amount of typing and that what’s happening is not so obvious for the untrained reader (like me). Both of these can be worked around, though.
Stewart Ashton wrote a lot about that topic and even came up with a helper package that makes it much more convenient to compare two query results. You should check out his list of blog posts.
The complete example is available on LiveSQL and github.
So, can the simple MINUS comparison still be useful?
I think it can, for quick comparisons of familiar data. However, it’s good to know the limitations and the alternatives. As a general rule I’d say: Better go with a solution that requires more effort but contains less pitfalls. But there are always exceptions to the rules.
Top comments (0)