I now know that I suck at writing performant queries. I started this week off by my supervisor commenting that the code I pushed took 5s to load, cause it called upwards of 1000 queries. 1000? Impossible. Not my code. Did it really??(spoiler: it did) Let me explain.
Active Record
Rails' active record is a gem to behold.(Pun intended) but with great power... Consider these two code snippets
users = User.all
users.count
users = User.all
users.size
Which do you think will run faster? They're both preloaded, so the second line in either should run in constant O(1) time regardless, right?? WRONG
Count will always trigger a query to the database because it counts number of elements using SQL query (SELECT COUNT(*) FROM...)
Size on the other hand will trigger the query to the DB only if it is called on something that has not been loaded. If it has, then size will operate in memory.
Honorable mention here is length that will only return a value if it is called on a preloaded element. Find out more here. This is a small blip but these queries add up if you use them in say, a loop.
N + 1
This is a common error many people make. People like me especially. It occurs when you intend to make a query to fetch data, but because of the nature of the associations within the database, the request triggers an additional N queries.
Think of it like this. You have never baked a day in your life, but you decide that today, you will bake a cake. So you get a recipe online it tells you that you need the regular ingredients, so you go to the store and buy them and come back home. Once you're home you realise, you need an oven to bake a cake, so you go back to the store, buy an oven and then come back. Then you realise you need baking trays, so you get back in your uber and off you go to buy baking tins. 'This would be so much easier if I had just included everything I needed on the first trip' you say to yourself. This is the N+1 problem. You just wanted to bake a cake, but you ended up spending a lot more time fetching stuff than you did baking.
So how do you get everything on the first trip to the store? ActiveRecord to the rescue again.
Include, Preload and EagerLoad
I will be referencing the Active Record Query Interface here.
consider this example
books = Book.limit(10)
books.each do |book|
puts book.author.last_name
end
for each book within the loop, there is a query to get the associated author and grab their last name. For 10 books, that's 11 queries. (10 + 1 ... N + 1). The problem here is really size complexity of this query, O(N). Why is this bad? well, memory is prime real estate in production. If your server is busy grabbing a couple million records, chances are it is not serving anyone else.
Includes, Preload and eager load are solutions to this problem.
Includes => Active Record ensures that all of the specified associations are loaded using the minimum possible number of queries.
books = Book.includes(:author).limit(10)
books.each do |book|
puts book.author.last_name
end
This will now only run 2 queries as opposed to 11 because the authors have also been loaded in memory.
Preload => Active Record loads each specified association using one query per association.
Eager Load => Active Record loads all specified associations using a LEFT OUTER JOIN.
Why not just load it all in memory then?
What I notice is that loading into memory is often more performant than pinging the database. But then again, there's the issue of how much do you load into memory? Consider this
# we have a million users and limited memory
User.all.each do |user|
This would cause us to run out of memory pretty fast cause it will try to load all users in memory. Especially if we are loading it included with any associations
User.find_each() do |user|
This is easier on memory as it will load the users in batches of 1000, reducing the load on memory, thus giving the server some breathing space.
Conclusion
There's still a lot for me to work on with regards to my DB skills. Hopefully by adding these to my skill-set will help me write more performant code. Got any tips on how to do better Queries? Share with the class in the comments below.
Top comments (2)
I am also on a quest to improve my database queries knowledge. Getting to know the SQL query being executed by some of the ORM "magic" methods helps me decide if I should exit from the ORM for a more performant query.
While developing a feature, I use a debugging tool that helps me monitor the queries executed. This has helped me catch the N+1 query problem a couple of times actually.
I've also been trying to better understand the foundations of SQL like indexes and how they help with optimization. For this I have been following the SQLForDevs newsletter which has been helpful with nuggets of knowledge so far.
This is amazing. Will definitely checkout the newsletter and get myself better setup.