Bruno Vezoli for Rootstrap

Posted on Jun 6, 2020 • Edited on Sep 25, 2020

Rails N + 2 queries

#rails #performance #optimization #activerecord

We usually try to solve performance problems by using #includes to get rid of N + 1 queries but this doesn't
always fix the issue; in fact, it can even create more queries under some circumstances. This post
shows a few examples of when that happens and how to deal with it.

Your typical N + 1 query problem

We learn how to deal with ActiveRecord performance issues with very plain examples, most blog posts show models that
look something like this:

class Post < ApplicationRecord
  has_many :comments
end

class Comment < ApplicationRecord
  belongs_to :post
end

Followed by your standard irb example:

irb> posts = Post.all
irb> all_comments = posts.map { |p| p.comments }

Which generates the following queries, where we can clearly see the N + 1 query problem: Rails is doing one query
to retrieve the posts and N additional queries to retrieve the comments, where N is the number of posts.

Post Load (0.5ms)  SELECT  "posts".* FROM "posts"
Comment Load (0.8ms)  SELECT  "comments".* FROM "comments" WHERE "comments"."post_id" = $1  [["post_id", 1]]
Comment Load (0.2ms)  SELECT  "comments".* FROM "comments" WHERE "comments"."post_id" = $1  [["post_id", 2]]
Comment Load (0.2ms)  SELECT  "comments".* FROM "comments" WHERE "comments"."post_id" = $1  [["post_id", 3]]
Comment Load (0.2ms)  SELECT  "comments".* FROM "comments" WHERE "comments"."post_id" = $1  [["post_id", 4]]
Comment Load (0.6ms)  SELECT  "comments".* FROM "comments" WHERE "comments"."post_id" = $1  [["post_id", 5]]

And that the solution is to preload the data by calling the #includes method over the posts collection:

irb> posts = Post.includes(:comments)
irb> all_comments = posts.map { |p| p.comments }

Which in turn produces the following optimized queries:

Post Load (0.5ms)  SELECT  "posts".* FROM "posts"
Comment Load (1.4ms)  SELECT "comments".* FROM "comments" WHERE "comments"."post_id" IN ($1, $2, $3, $4, $5)  [["post_id", 1], ["post_id", 2], ["post_id", 3], ["post_id", 4], ["post_id", 5]]

And so we start using #includes on all our ActiveRecord queries and move on.

A little ways down the road: changes to queries invalidate data preloading

Time passes and requirements change, it's only normal. Now we need to only show posts that are uncensored and we change
our query to reflect that:

irb> posts = Post.includes(:comments)
irb> all_comments = posts.map { |p| p.comments.where(censored: false) }

Can you guess the amount of queries this is going to generate? I'll give you a hint: it's on the name of the post.

Post Load (0.5ms)  SELECT  "posts".* FROM "posts"
Comment Load (1.4ms)  SELECT "comments".* FROM "comments" WHERE "comments"."post_id" IN ($1, $2, $3, $4, $5)  [["post_id", 1], ["post_id", 2], ["post_id", 3], ["post_id", 4], ["post_id", 5]]
Comment Load (6.4ms)  SELECT  "comments".* FROM "comments" WHERE "comments"."post_id" = $1 AND "comments"."censored" = $2  [["post_id", 1], ["censored", false]]
Comment Load (0.5ms)  SELECT  "comments".* FROM "comments" WHERE "comments"."post_id" = $1 AND "comments"."censored" = $2  [["post_id", 2], ["censored", false]]
Comment Load (0.2ms)  SELECT  "comments".* FROM "comments" WHERE "comments"."post_id" = $1 AND "comments"."censored" = $2  [["post_id", 3], ["censored", false]]
Comment Load (0.2ms)  SELECT  "comments".* FROM "comments" WHERE "comments"."post_id" = $1 AND "comments"."censored" = $2  [["post_id", 4], ["censored", false]]
Comment Load (0.2ms)  SELECT  "comments".* FROM "comments" WHERE "comments"."post_id" = $1 AND "comments"."censored" = $2  [["post_id", 5], ["censored", false]]

Now we have the N + 1 queries we had in the beginning and also an additional query to preload data we are not going to use
(which will increase our application's memory footprint but that's a story for another day). You may wonder why doesn't Rails
just iterate the comments collection and select posts that are not censored. Well, in this simple example, it could. But if we start doing
some more complex queries and adding raw SQL fragments it's going to get a lot more difficult.

Adding Bullet to log N + 1 queries

I agree that Bullet is a must have in any project, no matter how big or small. It's
very difficult to catch every N + 1 in every query we build and it's even harder to detect cases like this where we should no longer
preload the data.

After installing bullet what we get is the following warning:

AVOID eager loading detected
  Post => [:comments]
  Remove from your query: .includes([:comments])

Awesome! It knows about our unused preload and it tells us to remove it; let's do that.

irb> posts = Post.all
irb> all_comments = posts.map { |p| p.comments.where(censored: false) }

And let's also check our logs for the generated queries and watch out for warnings from bullet.

Post Load (0.5ms)  SELECT  "posts".* FROM "posts"
Comment Load (6.4ms)  SELECT  "comments".* FROM "comments" WHERE "comments"."post_id" = $1 AND "comments"."censored" = $2  [["post_id", 1], ["censored", false]]
Comment Load (0.5ms)  SELECT  "comments".* FROM "comments" WHERE "comments"."post_id" = $1 AND "comments"."censored" = $2  [["post_id", 2], ["censored", false]]
Comment Load (0.2ms)  SELECT  "comments".* FROM "comments" WHERE "comments"."post_id" = $1 AND "comments"."censored" = $2  [["post_id", 3], ["censored", false]]
Comment Load (0.2ms)  SELECT  "comments".* FROM "comments" WHERE "comments"."post_id" = $1 AND "comments"."censored" = $2  [["post_id", 4], ["censored", false]]
Comment Load (0.2ms)  SELECT  "comments".* FROM "comments" WHERE "comments"."post_id" = $1 AND "comments"."censored" = $2  [["post_id", 5], ["censored", false]]

So we are back to square one but this time we have no bullet warnings, so what should we do?

What are the solutions to preloading custom queries?

There are two solutions (at least that I know of) to this performance problem. The first one is to use Rails' preloader
but as you may guess from the :nodoc: directive that's a private class not meant to be used outside the framework. I'm not going
to even talk about how to preload using that class but if you are curious here's a nice post
on how to deal with N + 1 queries on GraphQL using Rails' preloader.

The second solution, and the one I'm going to explain here, is using a scoped association and preloading it instead of the
comments association. This requires us to add one more line to our Posts model:

class Post < ApplicationRecord
  has_many :comments
  # we need to specify a new name, a lambda to filter the comments and the model class name
  has_many :uncensored_comments, -> { where(censored: false) }, class_name: 'Comment'
end

And change our code to get the comments using the association:

irb> posts = Post.all
irb> all_comments = posts.map { |p| p.uncensored_comments }

Bingo! We get the following warning from bullet:

GET /posts
USE eager loading detected
  Post => [:uncensored_comments]
  Add to your query: .includes([:uncensored_comments])

And sure enough if we add that preload

irb> posts = Post.includes(:uncensored_comments)
irb> all_comments = posts.map { |p| p.uncensored_comments }

We no longer get the warning and our queries are optimized:

Post Load (0.5ms)  SELECT  "posts".* FROM "posts"
Comment Load (0.9ms)  SELECT "comments".* FROM "comments" WHERE "comments"."censored" = $1 AND "comments"."post_id" IN ($2, $3, $4, $5, $6)  [["censored", false], ["post_id", 1], ["post_id", 2], ["post_id", 3], ["post_id", 4], ["post_id", 5]]

Caveats: Evaluate performance optimizations

As with most performance optimizations you should really measure and evaluate the changes you are about to make. It doesn't
really make sense to add an association to your models every time you want to preload, sometimes it is better to have a small
performance penalty rather than a model full of associations.

But if your queries are taking too long I really encourage you to add the corresponding associations and preload the data you need.

Bonus: how to preload belongs_to associations

This method also works for cases when you need to just fetch one record. Let's use the blog example and add a use case where
we need the most liked comment from each Post:

irb> posts = Post.includes(:comments)
irb> most_liked_comments = posts.map { |p| p.comments.order(likes: :desc).first }

Once again we have an N + 1 and no warning from bullet

Post Load (0.4ms)  SELECT "posts".* FROM "posts"
Comment Load (0.9ms)  SELECT  "comments".* FROM "comments" WHERE "comments"."post_id" = $1 ORDER BY "comments"."likes" DESC LIMIT $2  [["post_id", 1], ["LIMIT", 1]]
Comment Load (0.7ms)  SELECT  "comments".* FROM "comments" WHERE "comments"."post_id" = $1 ORDER BY "comments"."likes" DESC LIMIT $2  [["post_id", 2], ["LIMIT", 1]]
Comment Load (0.6ms)  SELECT  "comments".* FROM "comments" WHERE "comments"."post_id" = $1 ORDER BY "comments"."likes" DESC LIMIT $2  [["post_id", 3], ["LIMIT", 1]]
Comment Load (0.3ms)  SELECT  "comments".* FROM "comments" WHERE "comments"."post_id" = $1 ORDER BY "comments"."likes" DESC LIMIT $2  [["post_id", 4], ["LIMIT", 1]]
Comment Load (0.4ms)  SELECT  "comments".* FROM "comments" WHERE "comments"."post_id" = $1 ORDER BY "comments"."likes" DESC LIMIT $2  [["post_id", 5], ["LIMIT", 1]]

And here is where has_one comes to the rescue, let's modify our Posts model one last time

class Post < ApplicationRecord
  has_many :comments
  has_many :uncensored_comments, -> { where(censored: false) }, class_name: 'Comment'
  # rails will automatically limit the number of records for us
  has_one :most_liked_comment, -> { order(likes: :desc) }, class_name: 'Comment'
end

Bullet now complains about data not being preload us and gives us the solution to our problems

irb> posts = Post.includes(:most_liked_comment)
irb> most_liked_comments = posts.map { |p| p.most_liked_comment }

And we get neat SQL queries once more

Post Load (0.4ms)  SELECT "posts".* FROM "posts"
Comment Load (0.4ms)  SELECT "comments".* FROM "comments" WHERE "comments"."post_id" IN ($1, $2, $3, $4, $5) ORDER BY "comments"."likes" DESC  [["post_id", 1], ["post_id", 2], ["post_id", 3], ["post_id", 4], ["post_id", 5]]

Notice though that in this case we don't get the LIMIT clause on the SQL query and so Rails loads all these comments on memory and then
loads them on each post, that's also something to consider: memory usage vs SQL query time. As in most cases there's not a clear
answer and you should really measure to see your specific case.

If you like this post be sure to check out the original post and checkout other related goodies.

Top comments (6)

Tomek Urban • Jun 9 '20

sometimes it is better to have a small performance penalty rather than a model full of associations

I think you can also do something like this (unless I have missed the point):

Pots.includes(:comments).where(comments: Comment.where(censored: false))

Bruno Vezoli • Jun 9 '20

Hey, thanks for commenting! The #includes method does not accept filtering AFAIK, the where clause would be applied to the query to fetch the posts but not to the loaded comments. So when I paste that query on the console I get the following:

irb> Post.includes(:comments).where(comments: Comment.where(censored: false))
  Post Load (2.1ms)  SELECT  "posts".* FROM "posts" WHERE "posts"."id" IN (SELECT "comments"."post_id" FROM "comments" WHERE "comments"."censored" = $1) LIMIT $2  [["censored", false], ["LIMIT", 11]]
  Comment Load (0.4ms)  SELECT "comments".* FROM "comments" WHERE "comments"."post_id" IN ($1, $2, $3, $4)  [["post_id", 2], ["post_id", 3], ["post_id", 4], ["post_id", 5]]

Then if I add a method on the Post model to get uncensored comments:

  def uncensored_comments
    comments.where(censored: false)
  end

And run the same example I did in the post:

irb> posts = Post.all
irb> all_comments = posts.map { |p| p.uncensored_comments }

We go back the N + 2 query problem

  Post Load (2.3ms)  SELECT "posts".* FROM "posts" WHERE "posts"."id" IN (SELECT "comments"."post_id" FROM "comments" WHERE "comments"."censored" = $1)  [["censored", false]]
  Comment Load (0.6ms)  SELECT "comments".* FROM "comments" WHERE "comments"."post_id" IN ($1, $2, $3, $4)  [["post_id", 2], ["post_id", 3], ["post_id", 4], ["post_id", 5]]
  Comment Load (0.3ms)  SELECT "comments".* FROM "comments" WHERE "comments"."post_id" = $1 AND "comments"."censored" = $2  [["post_id", 2], ["censored", false]]
  Comment Load (0.2ms)  SELECT "comments".* FROM "comments" WHERE "comments"."post_id" = $1 AND "comments"."censored" = $2  [["post_id", 3], ["censored", false]]
  Comment Load (0.8ms)  SELECT "comments".* FROM "comments" WHERE "comments"."post_id" = $1 AND "comments"."censored" = $2  [["post_id", 4], ["censored", false]]
  Comment Load (1.2ms)  SELECT "comments".* FROM "comments" WHERE "comments"."post_id" = $1 AND "comments"."censored" = $2  [["post_id", 5], ["censored", false]]

I think the code you suggested would be a nice way of getting posts with comments, but it does not help preload custom queries.

Tomek Urban • Jun 10 '20

Thanks, I didn't think this through. Or at all, with this weird #includes ;)
But still, you can do it without association in the model.

Comment.where(post: Post.all, censored: false)

Bruno Vezoli • Jun 10 '20

Yeah, you can always do the queries straight somewhere else, but that wouldn't help you preload the records. The example was very simple I hope it doesn't miss the point, which is preloading and not doing extra queries and not just being able to get the data. Maybe something like this paints a clearer picture:

posts = Post.all.includes(:comments)
posts.map do |post|
  create_thumbnail(post, post.uncensored_comments)
end

This creates the same N + 2 queries as before since the .includes(:comments) it's actually useless here. Sure you can always find a way around not having associations in the model, one of the ways to do so and that I didn't mention in the post is to do the following:

posts = Post.all
uncensored_comments = Comment.where(post: posts, uncensored: false).group_by(&:post_id)
posts.map do |post|
  create_thumbnail(post, uncensored_comments[post.id])
end