Pitfalls of the MINUS-comparison

#100codeexamples #sql #testing #database

My work includes a lot of database views because they work as a kind of public API in our case. Since I started talking about self-testing and utPLSQL, the MINUS-comparison has always had its place when assuring that a new, reconstructed view has the exact same content as the old one.

However, I did it wrong until recently when I discovered a major pitfall.

For I like to tell stories let’s start with a Star Wars scenario again: We have a table which contains all the Star Wars characters and another table which holds information in which of the “Episode”-movies they appeared.

We have the following sample data:

ID	Character	Episodes
1	Darth Vader	3, 4, 5, 6
2	Luke Skywalker	4, 5, 6, 7, 8
3	Rey	7, 8

Our current view is this:

create or replace view all_movie_characters
  as
  select
    sw_char.id,
    sw_char.name
  from
    star_wars_characters sw_char
    inner join appearance_in_episode ep
      on sw_char.id = ep.character_fk
  group by sw_char.id, sw_char.name;

However we just found a neat little function called appears_in_movie(i_character_id) – it looks like we could make our view more readable by using it (don’t start an argument about performance, this is just a made-up example!):

create or replace view all_movie_character_2
  as
  select
    sw_char.id,
    sw_char.name
  from
    star_wars_characters sw_char
  where appears_in_movie(sw_char.id) = 1;

Because we are very careful we didn’t just replace our initial view but gave it a new (very creative) name so we can compare both views and make sure they have the same content:

select * from all_movie_characters
minus
select * from all_movie_character_2
union all
select * from all_movie_character_2
minus
select * from all_movie_characters;

0 Rows returned. Great! We can ship this neat little improvement to production with confidence!

Never trust yourself. Let’s look at the appears_in_movie-function:

create or replace function appears_in_movie(
    i_character_id integer
  ) return integer result_cache
  as
    l_count integer;
  begin
    select count(*)
      into l_count
      from appearance_in_episode
      where character_fk = i_character_id
        and episode_no between 4 and 6;
    if l_count > 0 then
      return 1;
    else
      return 2;
    end if;
  end;

Ooops. Seems like the person who created this function had a different context and did only care about the characters who appear in the original 3 episodes 4 to 6! (I want to claim hereby that I completely endorse the episodes 7+ so far)

Results of selecting the new view:

| 1 | Darth Vader |
| 2 | Luke Skywalker |

But what happened? Why didn’t we notice that Rey is completely missing in the new view?

The problem is that UNION ALL and MINUS have the same precedence, which means that they are applied in the order of occurence. Here’s what happened:

  Darth Vader
  Luke Skywalker
  Rey
MINUS
  Darth Vader
  Luke Skywalker
----------------------
  Rey
UNION ALL
  Darth Vader
  Luke Skywalker
---------------------
  Rey
  Darth Vader
  Luke Skywalker
MINUS
  Darth Vader
  Luke Skywalker
  Rey
--------------------
0 Rows.

To avoid this, we need to wrap the two MINUS comparisons in parenthesis:

(
  select * from all_movie_characters
  minus
  select * from all_movie_character_2
)
union all
(
  select * from all_movie_character_2
  minus
  select * from all_movie_characters
);

1 Row returned: 3 Rey

You can find the whole example on LiveSQL and the commented version on my github repository.

Is this a #100CodeExample?

Yes, the version on github is a completely commented, runnable example, as the previous ones.

But I want to experiment a bit with making the examples on the blog more enjoyable and approachable by adding some story and only highlighting the crucial parts.

DEV Community

Pitfalls of the MINUS-comparison

Is this a #100CodeExample?

Top comments (0)