DEV Community

Tomas@PawSQL
Tomas@PawSQL

Posted on

Join Elimination - Advanced SQL tuning

Channel of advanced SQL tuning

Definition

Join Elimination is a rewriting optimization in SQL that simplifies queries and improves query performance by removing joins from the query without affecting the final result. Typically, this optimization is used when a query contains a primary-foreign key join and only references the primary key columns of the main table.

Consider the following example:

select o.* from orders o inner join customer c on c.c_custkey=o.o_custkey
Enter fullscreen mode Exit fullscreen mode

The orders table is joined with the customer table, and c_custkey is the primary key of the customer table. In this case, the customer table can be eliminated, and the rewritten SQL would be:

select * from orders
Enter fullscreen mode Exit fullscreen mode

Types of Join Elimination

  • Inner Join Elimination
  • Outer Join Elimination

Inner Join Elimination

Conditions for Inner join elimination:

  • Fact-based primary-foreign key equality join (parent table's join column is non-null and unique)
  • The parent table's primary key is the only column of the parent table referenced in the query.

Inner join elimination works in the following way:

  • The parent table and the primary-foreign key join condition are eliminated.
  • Other references to the parent table's primary key are replaced with the foreign key of the external table.
  • If the foreign key can be null and there are no other NFC[^1] conditions, a new non-null foreign key condition needs to be added.

Example:

  • Original SQL:
select c_custkey from customer,orders where c_custkey=o_custkey
Enter fullscreen mode Exit fullscreen mode
  • Rewritten SQL:
  select orders.o_custkey from orders where orders.o_custkey is not null
Enter fullscreen mode Exit fullscreen mode

Outer Join Elimination

Conditions for outer join elimination:

  • The outer join to be eliminated must be a left or right outer join.
  • The join condition must have a primary-foreign key equality join connected by AND.
  • The primary key of the inner table (non-null and unique) is the only column of the inner table referenced in the query.

Outer join elimination works in the following way:

  • The inner table and all join conditions are eliminated.
  • Other references to the inner table's primary key are replaced with the outer table's foreign key.

Example 1: PK only appears in the join condition

  • Original SQL:
select o_custkey from orders left join customer on c_custkey=o_custkey
Enter fullscreen mode Exit fullscreen mode
  • Rewritten SQL:
select orders.o_custkey from orders
Enter fullscreen mode Exit fullscreen mode

Example 2: PK appears elsewhere

  • Original SQL:
select orders.* from customer right join orders on c_custkey=o_custkey and c_custkey>20 where o_orderstatus='T'
Enter fullscreen mode Exit fullscreen mode
  • Rewritten SQL:
select orders.* from orders where orders.o_orderstatus = 'T'
Enter fullscreen mode Exit fullscreen mode

Join Elimination in DBMS

Join Elimination is advanced optimization technique and provided by some advanced commercial database management systems, such as Oracle, DB2. But for MySQL and PostgreSQL, Join Elimination is not supported yet.

For the SQL in the first section:

select o.* from orders o inner join customer c on c.c_custkey=o.o_custkey
Enter fullscreen mode Exit fullscreen mode

Execution plan in MySQL is:

-> Inner hash join (o.O_CUSTKEY = c.C_CUSTKEY)  (cost=20541.07 rows=20013)
    -> Table scan on o  (cost=2529.20 rows=200128)
    -> Hash
        -> Index scan on c using key_idx  (cost=0.35 rows=1)
Enter fullscreen mode Exit fullscreen mode

Execution plan in PostgreSQL is:

Hash Join  (cost=401.29..711.56 rows=10001 width=115)
  Hash Cond: (o.o_custkey = c.c_custkey)
  ->  Seq Scan on orders o  (cost=0.00..284.01 rows=10001 width=115)
  ->  Hash  (cost=276.29..276.29 rows=10000 width=4)
        ->  Index Only Scan using customer_pkey on customer c  (cost=0.29..276.29 rows=10000 width=4)
Enter fullscreen mode Exit fullscreen mode

As can be seen, neither MySQL nor PostgreSQL support join elimination rewrite optimization.

Join Elimination in PawSQL

PawSQL provides a comprehensive join elimination optimization through the [JoinEliminationRewrite].

  • Input SQL statement:
 select orders.* from customer right join orders on c_custkey=o_custkey and c_custkey>20
Enter fullscreen mode Exit fullscreen mode
  • Rewritten SQL statement after Join Elimination
select orders.* from orders
Enter fullscreen mode Exit fullscreen mode
  • Execution plan before optimization(MySQL)
  -> Nested loop left join  (cost=90585.51 rows=200128)
    -> Table scan on orders  (cost=20540.71 rows=200128)
    -> Filter: (orders.O_CUSTKEY > 20)  (cost=0.25 rows=1)
        -> Single-row covering index lookup on customer using key_idx (C_CUSTKEY=orders.O_CUSTKEY)  (cost=0.25 rows=1)
Enter fullscreen mode Exit fullscreen mode
  • Execution plan after optimization:
-> Table scan on orders  (cost=20540.71 rows=200128)
Enter fullscreen mode Exit fullscreen mode

As shown, PawSQL provides excellent support for join elimination rewrite optimization, resulting in a performance improvement of 441.01% through the elimination of meaningless table joins.

Conclusion

Since MySQL and PostgreSQL do not support join elimination, join elimination optimization provided by PawSQL is a meaningful supplement for them. By eliminating meaningless table joins before SQL is deployed in a production environment, PawSQL can prevent the database from wasting resources on unnecessary table join operations.

Top comments (0)