loading...

Using C# LINQ API to perform SQL IN logic at WHERE clause

alexruzenhack profile image Alex Ruzenhack ・1 min read

I was working on a project with User and Costumer identificação. The first is generated by the web app, and the second by the business department.

So I had to develop a logic to make those match between identifications.

The User entity needs to know the Costumer identification and, in order to make the match, first I selected the users without a costumer and extracted a list of emails. Then, I selected the customers based on this email list. This statement can be transcribed to SQL syntax, like this:

SELECT *
FROM Customers
WHERE email IN (SELECT email FROM Users WHERE CustomerId IS NULL)

But how to translate the SQL to C# LINQ API?

  1. First, make the selection from Users, extracting the list of emails
  2. Select the Customers with those emails

This way:

var emails = _userManager.Users
    .Where(user => user.CustomerId == null)
    .Select(user => user.Email) // extract the emails from users
    .ToList();

var customers = _applicationRepository.GetCustomers()
    .Where(customer => emails.Contains(customer.Email)) // the Contains method carry the IN logic when translated to SQL script
    .ToList();

Now we have the match programmatically.

That's it!

Reference

Discussion

markdown guide
 

Thank Alex,
the translation looked easy to understand as the LINQ version matches with SQL statement 👍.


And I have a question regarding the conversion.

Would the LINQ version fetch all data from database first and filter (using Where clause) in memory?

or

would the query create an optimal syntax to send to the database and fetch only filtered records?

 

Calling ToList() results in execution of the query. So the first statement will make an in memory list of email addresses.

To get a better feel for what's happening you can intercept the actual queries. Assuming Entity framework is used: Log Queries executed by Entity Framework DbContext.

 

I stopped writing C# right as LINQ got popular so I don't know it too well but I believe this does store the list of emails in memory.

In SQL you could do this:

SELECT c.*
FROM customers AS c
JOIN users AS u ON u.email = c.email
WHERE u.customer_id IS NULL

and achieve the same result without a subquery. I don't know how you'd go about translating that to LINQ though.

 

In my opinion, this is a better approach, which doesn't require an additional call to the database and doesn't require to store entities in the memory (which could have terrible results in case of a huge collection).
And of course, it is available in LINQ.

 

It's true! But the tables are in different contexts, which means they are in different databases. So, it's not possible to make a JOIN between them.

 

Here is the EF SQL of "Users":

Microsoft.EntityFrameworkCore.Database.Command:Information: Executed DbCommand (110ms) [Parameters=[], CommandType='Text', CommandTimeout='30']
SELECT [user].[Id], [user].[Email], [user].[CustomerId], [user].[UserName]
FROM [AspNetUsers] AS [user]
WHERE [user].[CustomerId] IS NULL

And here is the EF SQL of "Customers":

Microsoft.EntityFrameworkCore.Database.Command:Information: Executed DbCommand (16ms) [Parameters=[], CommandType='Text', CommandTimeout='30']
SELECT [t].[Id], [t].[Email], [t].[FullName], [t].[RegisteredAt],  [t].[CreatedAt], [t].[UpdatedAt]
FROM [Customers] AS [t]

As we can see, the IN logic really occurs in memory.

 

Thanks for the clarification Alex 😎

 

I prefer to use stored procedures and table variables for this kind of query when it gets complex or is working against a large amount of data. I prefer to let the database do the job its been optimized for, parsing and sorting data, and use C# and LINQ for stuff SQL isn't optimized to do well, like string manipulation, or to work with smaller amounts of data, like lookup tables.

Of course, using techniques like in this article are good when there's a "turf war" between DBA's and application developers or when the application team has limited SQL skills and a DBA isn't available to write optimal procs.