We usually try to solve performance problems by using #includes
to get rid of N + 1 queries but this doesn't
always fix the issue; in fact, it can even create more queries under some circumstances. This post
shows a few examples of when that happens and how to deal with it.
Your typical N + 1 query problem
We learn how to deal with ActiveRecord performance issues with very plain examples, most blog posts show models that
look something like this:
class Post < ApplicationRecord
has_many :comments
end
class Comment < ApplicationRecord
belongs_to :post
end
Followed by your standard irb
example:
irb> posts = Post.all
irb> all_comments = posts.map { |p| p.comments }
Which generates the following queries, where we can clearly see the N + 1 query problem: Rails is doing one query
to retrieve the posts and N additional queries to retrieve the comments, where N is the number of posts.
Post Load (0.5ms) SELECT "posts".* FROM "posts"
Comment Load (0.8ms) SELECT "comments".* FROM "comments" WHERE "comments"."post_id" = $1 [["post_id", 1]]
Comment Load (0.2ms) SELECT "comments".* FROM "comments" WHERE "comments"."post_id" = $1 [["post_id", 2]]
Comment Load (0.2ms) SELECT "comments".* FROM "comments" WHERE "comments"."post_id" = $1 [["post_id", 3]]
Comment Load (0.2ms) SELECT "comments".* FROM "comments" WHERE "comments"."post_id" = $1 [["post_id", 4]]
Comment Load (0.6ms) SELECT "comments".* FROM "comments" WHERE "comments"."post_id" = $1 [["post_id", 5]]
And that the solution is to preload the data by calling the #includes
method over the posts collection:
irb> posts = Post.includes(:comments)
irb> all_comments = posts.map { |p| p.comments }
Which in turn produces the following optimized queries:
Post Load (0.5ms) SELECT "posts".* FROM "posts"
Comment Load (1.4ms) SELECT "comments".* FROM "comments" WHERE "comments"."post_id" IN ($1, $2, $3, $4, $5) [["post_id", 1], ["post_id", 2], ["post_id", 3], ["post_id", 4], ["post_id", 5]]
And so we start using #includes
on all our ActiveRecord queries and move on.
A little ways down the road: changes to queries invalidate data preloading
Time passes and requirements change, it's only normal. Now we need to only show posts that are uncensored and we change
our query to reflect that:
irb> posts = Post.includes(:comments)
irb> all_comments = posts.map { |p| p.comments.where(censored: false) }
Can you guess the amount of queries this is going to generate? I'll give you a hint: it's on the name of the post.
Post Load (0.5ms) SELECT "posts".* FROM "posts"
Comment Load (1.4ms) SELECT "comments".* FROM "comments" WHERE "comments"."post_id" IN ($1, $2, $3, $4, $5) [["post_id", 1], ["post_id", 2], ["post_id", 3], ["post_id", 4], ["post_id", 5]]
Comment Load (6.4ms) SELECT "comments".* FROM "comments" WHERE "comments"."post_id" = $1 AND "comments"."censored" = $2 [["post_id", 1], ["censored", false]]
Comment Load (0.5ms) SELECT "comments".* FROM "comments" WHERE "comments"."post_id" = $1 AND "comments"."censored" = $2 [["post_id", 2], ["censored", false]]
Comment Load (0.2ms) SELECT "comments".* FROM "comments" WHERE "comments"."post_id" = $1 AND "comments"."censored" = $2 [["post_id", 3], ["censored", false]]
Comment Load (0.2ms) SELECT "comments".* FROM "comments" WHERE "comments"."post_id" = $1 AND "comments"."censored" = $2 [["post_id", 4], ["censored", false]]
Comment Load (0.2ms) SELECT "comments".* FROM "comments" WHERE "comments"."post_id" = $1 AND "comments"."censored" = $2 [["post_id", 5], ["censored", false]]
Now we have the N + 1 queries we had in the beginning and also an additional query to preload data we are not going to use
(which will increase our application's memory footprint but that's a story for another day). You may wonder why doesn't Rails
just iterate the comments collection and select posts that are not censored. Well, in this simple example, it could. But if we start doing
some more complex queries and adding raw SQL fragments it's going to get a lot more difficult.
Adding Bullet to log N + 1 queries
I agree that Bullet is a must have in any project, no matter how big or small. It's
very difficult to catch every N + 1 in every query we build and it's even harder to detect cases like this where we should no longer
preload the data.
After installing bullet what we get is the following warning:
AVOID eager loading detected
Post => [:comments]
Remove from your query: .includes([:comments])
Awesome! It knows about our unused preload and it tells us to remove it; let's do that.
irb> posts = Post.all
irb> all_comments = posts.map { |p| p.comments.where(censored: false) }
And let's also check our logs for the generated queries and watch out for warnings from bullet.
Post Load (0.5ms) SELECT "posts".* FROM "posts"
Comment Load (6.4ms) SELECT "comments".* FROM "comments" WHERE "comments"."post_id" = $1 AND "comments"."censored" = $2 [["post_id", 1], ["censored", false]]
Comment Load (0.5ms) SELECT "comments".* FROM "comments" WHERE "comments"."post_id" = $1 AND "comments"."censored" = $2 [["post_id", 2], ["censored", false]]
Comment Load (0.2ms) SELECT "comments".* FROM "comments" WHERE "comments"."post_id" = $1 AND "comments"."censored" = $2 [["post_id", 3], ["censored", false]]
Comment Load (0.2ms) SELECT "comments".* FROM "comments" WHERE "comments"."post_id" = $1 AND "comments"."censored" = $2 [["post_id", 4], ["censored", false]]
Comment Load (0.2ms) SELECT "comments".* FROM "comments" WHERE "comments"."post_id" = $1 AND "comments"."censored" = $2 [["post_id", 5], ["censored", false]]
So we are back to square one but this time we have no bullet warnings, so what should we do?
What are the solutions to preloading custom queries?
There are two solutions (at least that I know of) to this performance problem. The first one is to use Rails' preloader
but as you may guess from the :nodoc:
directive that's a private class not meant to be used outside the framework. I'm not going
to even talk about how to preload using that class but if you are curious here's a nice post
on how to deal with N + 1 queries on GraphQL using Rails' preloader.
The second solution, and the one I'm going to explain here, is using a scoped association and preloading it instead of the
comments
association. This requires us to add one more line to our Posts model:
class Post < ApplicationRecord
has_many :comments
# we need to specify a new name, a lambda to filter the comments and the model class name
has_many :uncensored_comments, -> { where(censored: false) }, class_name: 'Comment'
end
And change our code to get the comments using the association:
irb> posts = Post.all
irb> all_comments = posts.map { |p| p.uncensored_comments }
Bingo! We get the following warning from bullet:
GET /posts
USE eager loading detected
Post => [:uncensored_comments]
Add to your query: .includes([:uncensored_comments])
And sure enough if we add that preload
irb> posts = Post.includes(:uncensored_comments)
irb> all_comments = posts.map { |p| p.uncensored_comments }
We no longer get the warning and our queries are optimized:
Post Load (0.5ms) SELECT "posts".* FROM "posts"
Comment Load (0.9ms) SELECT "comments".* FROM "comments" WHERE "comments"."censored" = $1 AND "comments"."post_id" IN ($2, $3, $4, $5, $6) [["censored", false], ["post_id", 1], ["post_id", 2], ["post_id", 3], ["post_id", 4], ["post_id", 5]]
Caveats: Evaluate performance optimizations
As with most performance optimizations you should really measure and evaluate the changes you are about to make. It doesn't
really make sense to add an association to your models every time you want to preload, sometimes it is better to have a small
performance penalty rather than a model full of associations.
But if your queries are taking too long I really encourage you to add the corresponding associations and preload the data you need.
Bonus: how to preload belongs_to associations
This method also works for cases when you need to just fetch one record. Let's use the blog example and add a use case where
we need the most liked comment from each Post:
irb> posts = Post.includes(:comments)
irb> most_liked_comments = posts.map { |p| p.comments.order(likes: :desc).first }
Once again we have an N + 1 and no warning from bullet
Post Load (0.4ms) SELECT "posts".* FROM "posts"
Comment Load (0.9ms) SELECT "comments".* FROM "comments" WHERE "comments"."post_id" = $1 ORDER BY "comments"."likes" DESC LIMIT $2 [["post_id", 1], ["LIMIT", 1]]
Comment Load (0.7ms) SELECT "comments".* FROM "comments" WHERE "comments"."post_id" = $1 ORDER BY "comments"."likes" DESC LIMIT $2 [["post_id", 2], ["LIMIT", 1]]
Comment Load (0.6ms) SELECT "comments".* FROM "comments" WHERE "comments"."post_id" = $1 ORDER BY "comments"."likes" DESC LIMIT $2 [["post_id", 3], ["LIMIT", 1]]
Comment Load (0.3ms) SELECT "comments".* FROM "comments" WHERE "comments"."post_id" = $1 ORDER BY "comments"."likes" DESC LIMIT $2 [["post_id", 4], ["LIMIT", 1]]
Comment Load (0.4ms) SELECT "comments".* FROM "comments" WHERE "comments"."post_id" = $1 ORDER BY "comments"."likes" DESC LIMIT $2 [["post_id", 5], ["LIMIT", 1]]
And here is where has_one
comes to the rescue, let's modify our Posts model one last time
class Post < ApplicationRecord
has_many :comments
has_many :uncensored_comments, -> { where(censored: false) }, class_name: 'Comment'
# rails will automatically limit the number of records for us
has_one :most_liked_comment, -> { order(likes: :desc) }, class_name: 'Comment'
end
Bullet now complains about data not being preload us and gives us the solution to our problems
irb> posts = Post.includes(:most_liked_comment)
irb> most_liked_comments = posts.map { |p| p.most_liked_comment }
And we get neat SQL queries once more
Post Load (0.4ms) SELECT "posts".* FROM "posts"
Comment Load (0.4ms) SELECT "comments".* FROM "comments" WHERE "comments"."post_id" IN ($1, $2, $3, $4, $5) ORDER BY "comments"."likes" DESC [["post_id", 1], ["post_id", 2], ["post_id", 3], ["post_id", 4], ["post_id", 5]]
Notice though that in this case we don't get the LIMIT
clause on the SQL query and so Rails loads all these comments on memory and then
loads them on each post, that's also something to consider: memory usage vs SQL query time. As in most cases there's not a clear
answer and you should really measure to see your specific case.
If you like this post be sure to check out the original post and checkout other related goodies.
Top comments (6)
I think you can also do something like this (unless I have missed the point):
Hey, thanks for commenting! The
#includes
method does not accept filtering AFAIK, the where clause would be applied to the query to fetch the posts but not to the loaded comments. So when I paste that query on the console I get the following:Then if I add a method on the
Post
model to get uncensored comments:And run the same example I did in the post:
We go back the N + 2 query problem
I think the code you suggested would be a nice way of getting posts with comments, but it does not help preload custom queries.
Thanks, I didn't think this through. Or at all, with this weird
#includes
;)But still, you can do it without association in the model.
Yeah, you can always do the queries straight somewhere else, but that wouldn't help you preload the records. The example was very simple I hope it doesn't miss the point, which is preloading and not doing extra queries and not just being able to get the data. Maybe something like this paints a clearer picture:
This creates the same N + 2 queries as before since the
.includes(:comments)
it's actually useless here. Sure you can always find a way around not having associations in the model, one of the ways to do so and that I didn't mention in the post is to do the following:But that, IMO, it's very ugly and does not follow OOP at all.
Dude this post was awesome. Thanks!
Thank you for reading! I'm glad you liked it