When I did my presentation at APEX Connect, Erik van Roon explained another pitfall of my MINUS approach to compare the content of two views to me – something some people already tried to show me and failed (well, I failed getting the point).
Let’s start with the same situation as in the last example, a small list of Star Wars characters and the movie episodes in which they appear:
|1||Darth Vader||3, 4, 5, 6|
|2||Luke Skywalker||4, 5, 6, 7, 8|
Now let’s assume we are forced to do overwork by the sithlord in charge, it’s 3 am, we are terribly tired and create a new view but miss the
group by statement:
create or replace view all_movie_characters_3 as select sw_char.id, sw_char.name from star_wars_characters sw_char inner join appearance_in_episode ep on sw_char.id = ep.character_fk;
Or course this will not return the expected result:
But if we apply our MINUS comparison, we will get no results, indicating that both of the views are equal:
( select * from all_movie_characters minus select * from all_movie_characters_3 ) union all ( select * from all_movie_characters_3 minus select * from all_movie_characters ); -- No results
The problem is, that MINUS removes all appearances of equal rows. Therefore, it’s only useful when we can be sure that each row is exactly identifiable by its contents.
You could put it like this: MINUS comparison works if
select * from ... of a query is equal to
select distinct * from ... – which is not the case in our example:
select 'distinct' type, count(*) from ( select distinct * from all_movie_characters_3 ) union all select 'all', count(*) from all_movie_characters_3;
The solution Erik suggested, was to add the count over all columns to the comparison:
with old as ( select id, name, count(*) number_of_equals from all_movie_characters group by id, name ), new as ( select id, name, count(*) number_of_equals from all_movie_characters_3 group by id, name ) ( select * from old minus select * from new ) union all ( select * from new minus select * from old );
The disadvantage here is, that we can’t use
* notation anymore, something which I always found to be very convenient for quick comparisons, especially when comparing views with many columns. If you want to be sure, though, there’s no way to avoid investing the additional time.
There’s a pretty famous approach to comparing the contents of two queries by Tom Kyte:
select id, name, sum(old_cnt), sum(new_cnt) from ( select id, name, 1 old_cnt, 0 new_cnt from all_movie_characters source union all select id, name, 0 old_cnt, 1 new_cnt from all_movie_characters_3 target ) group by id, name having sum(old_cnt) != sum(new_cnt);
Besides being very elegant and pretty performant, one really cool thing about this approach is that you can easily see which rows are different and where the difference lies.
The downside for me is that it’s quite some amount of typing and that what’s happening is not so obvious for the untrained reader (like me). Both of these can be worked around, though.
I think it can, for quick comparisons of familiar data. However, it’s good to know the limitations and the alternatives. As a general rule I’d say: Better go with a solution that requires more effort but contains less pitfalls. But there are always exceptions to the rules.