Recently I wondered whether the usage of different equality predicates ('=' and 'in') in a query would differently costed, and thus could lead to different plans.
Example table:
create table eq ( id text primary key, f1 text );
create index eq_f1_i on eq(f1);
Fill table with arbitrary data:
insert into eq
select id::text, id::text from generate_series(1,10000) id;
Now let's see what '=' predicate does:
yugabyte=# explain select * from eq where id = '42';
QUERY PLAN
-------------------------------------------------------------------
Index Scan using eq_pkey on eq (cost=0.00..4.11 rows=1 width=64)
Index Cond: (id = '42'::text)
And let's see what the 'in' predicate does:
yugabyte=# explain select * from eq where id in ('42');
QUERY PLAN
-------------------------------------------------------------------
Index Scan using eq_pkey on eq (cost=0.00..4.11 rows=1 width=64)
Index Cond: (id = '42'::text)
The cost is identical, and actually the condition for the usage of the index is transformed to be identical in both cases (for both '=' and 'in', the condition becomes '='). There is no difference here.
I cannot see any reason how that could be, but maybe there's a difference when it's not on a primary key, and thus on YugabyteDB needs a 'secondary index'?
'=' predicate:
explain select * from eq where f1 = '42';
QUERY PLAN
--------------------------------------------------------------------
Index Scan using eq_f1_i on eq (cost=0.00..5.22 rows=10 width=64)
Index Cond: (f1 = '42'::text)
'in' predicate:
explain select * from eq where f1 in ('42');
QUERY PLAN
--------------------------------------------------------------------
Index Scan using eq_f1_i on eq (cost=0.00..5.22 rows=10 width=64)
Index Cond: (f1 = '42'::text)
There is no difference between using the '=' or the 'in' predicate for equality. Because SQL is a declarative language, the predicate is "translated" to what it's supposed to be doing, and costed. And besides being executed the same, it's also costed the same, so switching between '=' and 'in' will not flip plans.
The tests were conducted on a YugabyteDB version 2.13.2.0b135 database. Testing on PostgreSQL version 13 also shows completely identical cost for both equality predicates.
Top comments (2)
DB should parse your SQL into a valid execution plan.
Advanced RDBMS should use CBO and stats for that. Depending on the RDBMS used this could lead to exactly same execution path or something different, especially when you take partitioning into account. You should know that data histogram could impact the cost of every plan as well as the estimated amount of rows returned.
Please, be extremely careful about the partitioning and long list of arguments (ex. Oracle prior to 10.0.5.2.1 had problems with "in" having too many values).
Your empirical test proved that the execution plan was the same in your case but it doesn't mean that it would be always the case (even if you could point in the DB sources that currently this has exactly same implementation).
Anyway - good article to show that "in() should be the same as ="
You raise an important point: in case of doubt, test and validate.
Different databases might evaluate differently, and different/new versions might change things, and lastly limits might be hit causing changes.