Avoid SQL Server scalar-valued functions in WHERE statements at all cost!

#sql #5days5blogposts

This one requires a little intro.

First, you have to understand what a scalar value does: It returns a single value. It can be an INT, a BIT or even a whole NVARCHAR with hundreds of characters, but it will return only one full value, and never a set of them. If you need a set of values, use table functions.

Secondly, there is a certain order to SQL queries that you have to know about. More info about this order and exceptions from it can be found here, but the short lesson is that if you have a SELECT FROM WHERE query, the order of operations will be:

FROM
WHERE
SELECT

That means that SQL Server will first grab all possible records from chosen tables, then filter them using your conditions, and then show you the result.

Third and final intro information Scalar-valued functions in WHERE statements work ONCE for every row.

They work ONCE per ROW. Let that sink in, understand what it means.

If your FROM statement grabs, say, 200 records, it will run a function 200 times. If it'll grab 500,000 records, that's how many times a function will run, for every single row of data you grabbed. And remember: Not just those you SELECTed, but every single cell possible.

You might say "but my function only runs for a millisecond!", and you might even be right. But 500,000 times one millisecond is 500 seconds, so give or take 8 minutes.

Let that sink in.

My advice? Avoid them at all cost. But what if you have to use a scalar-valued function in WHERE statement? What if you think you can't live without it? Say, for example, that your user can only see some records on your table, while others shouldn't show up on the interface, and your function is something along the lines of Has_Permition(User_Id, Item_Id).

--Assume that @User_Id is declared eariler, as a procedure argument or variable
SELECT 
    Item_Id,
    Item_Name
    Item_Description
FROM 
    Items
WHERE
    Has_Permission(@User_Id, Item_Id) = 1

What then? I've identified at least two possible solutions.

One: Inline your function content into query

This isn't really intuitive, but if you do it, you make sure that your records are processed in bulk.

SELECT 
    Item_Id,
    Item_Name
    Item_Description
FROM 
    Items AS I
WHERE
    EXISTS 
    (
            SELECT TOP 1 1
            FROM 
                ItemCategoryPermissions AS ICP
                JOIN Permissions AS P ON P.Id = ICP.Permission_Id
                JOIN UserPermissions AS UP ON UP.Permission_Id = P.Id
            WHERE
                ICP.Category_Id = I.Category_Id
                AND UP.User_Id = @User_Id
    )

Let's stop here for a second and analyze this query.

Instead of checking every single Item_Id in Has_Permission, which, depending on the number of items, could take forever, and check every item at once in this kind of bulk query.

Now, I get why one would want to avoid that, because it defies the basic idea of using functions in programming - to avoid code repetitions. However, in SQL sometimes it's better to write code like this in the name of performance. Of course, it depends on the specific case: If it's part of a job running once a day, you might care less for performance, and more for nice, maintainable code. However, if you're running a stored procedure straight from user-facing interface, to show someone a table filled with data, performance is a top priority - and this is how you can achieve it.

Two: Minimize number of records processed through function.

You can achieve this in many ways, but there are two main ones I lean to.

First, and the one I prefer, is to use a Common Table Expression, or CTE for short, and get only records that are relevant for this specific search. For example, if your search window only look for a specific category of item, you could restrict the number of items to it:

;WITH ExampleCte AS
(
    SELECT
        Id,
        Name,
        Description
FROM
    Items
WHERE
    Category = @Category_Id
)
SELECT
    ID, 
    Name, 
    Description
FROM
    ExampleCte
WHERE
    Has_Permission(@User_Id, Id)

This way records going into Has_Permission function will be limited to a specific category of items, so it won't be called as many times.

Another version of this query would be using tempdb table, but that's just an implementation and preference choice.

From my experience, if you need performance, the first option is always better, as it processes records in bulk, instead of doing them one after another. You should still analyze what is the best option in your specific case. That being said, be wary of scalar-valued functions, as they can make a big mess if you strive for the most optimal code!

SQL Server 2019 User-Defined Functions Inlining

For the sake of being thorough, I have to touch this subject.

What is UDF Inlining? Microsoft added it to the SQL Server 2019 to get rid of performance problems caused by scalar functions in queries. It's supposed to do our solution one: Inline the body of the function to main query, and make the whole thing perform better. However, there is a big problem.

It rarely works.

I've seen in it action, and it broke most of stored procedures in a freshly migrated project. It could destroy something in your project or not, but be warned: It can make a big mess.

Summary

Scalar-Valued Functions in WHERE statements are a big performance hit
You can easily inline them by hand or minimize the problem by restricting the amount of data processed by those functions
You can try to use UDF Inlining in SQL Server 2019, but you risk breaking every single stored procedure in your database.

Did you have the same experience with scalar functions in SQL Server? What are your ways of optimizing it? Please, share in the comments!