loading...

Improving SQL Query by Adding conditions in Joins

akashkava profile image Akash Kava ・1 min read

Recently I was trying to optimize query, this was the query,

SELECT ... 
FROM A
INNER JOIN B
   ON B.ID = A.ID
WHERE B.Status = 'Done' 
   AND B.DateCreated BETWEEN @Start and @End
   AND NOT EXISTS (...)
   AND A. ... other conditions

Table B is huge table, and query took couple of seconds. Even though there was an index of (ID, Status, DateCreated).

When I changed query to,

SELECT ... 
FROM A
INNER JOIN B
   ON B.ID = A.ID AND B.Status = 'Done'
   AND B.DateCreated BETWEEN @Start and @End
   AND NOT EXISTS (...)
WHERE A. ... other conditions

Surprisingly query only took few milliseconds. Upon further investigation I found that moving NOT EXISTS inside JOIN improved the speed.

Posted on by:

Discussion

markdown guide
 

You were just lucky. Query optimizer has chosen a non cached version of plan to execute this query.

 

I don't think that was the case, Execution plans for both queries are different, Also these queries were executing frequently and I saw the difference even executing them in parallel (half queries in old way and half in new way simultaneously).

 

Plans have obviously have to be different.

I see this is SQL Server and it will cache plan for an exact query you run. By moving date time clause to join condition you force a new plan generation.

You should be able to get same results by forcing plan recompilation

I looked at my query again and I found out that I had an extra clause NOT EXISTS which was causing difference in speed and plans, omitting not EXISTS didn't make difference in speed in both cases.

 

That's not luck, it would absolutely produce a new plan because the text changed. Even adding a space or making a letter upper case will generate a new plan.

 

Yes, this is correct.

I should've been more clear that moving clause from where to join does not add any performance on its own, new forcefully generated plan did. Thanks for bringing this up!

 

In any decent optimizer, INNER JOIN and WHERE conditions are basically interchangeable.

Also, I'm suspicious about any index that starts on ID because they are highly selective from the start.

Would like to see the execution plans of the two different queries.

 

You also have to look at the possible 'reason for early termination' in the plan. If the plan is not able to be fully optimized and has a reason for early termination, it may have executed serially and in these cases you may get a better plan with the filter in the join. However, your problem lies in the fact that the optimizer isn't able to complete. Potential causes there are complex queries or auto updating statistics, to name a couple.

 

Can you share the execution plans?

 

That's just a very useless observation and doesn't prove anything.
Have you figured out why exactly did this happen? What's in NOT EXISTS condition? What are the execution plans?