loading...
Cover image for Preventing Useless Database Hits

Preventing Useless Database Hits

molly_struve profile image Molly Struve (she/her) Updated on ・5 min read

After reading the title of this post many of you are probably thinking...

DUH, I already know how to do that. But let’s just hold up a minute, because it might not be as obvious as you think. For example, how many of you have written code like this? I know I have!

User.where(:id => user_ids).each do |user|
  # Lots of user processing 
end
Enter fullscreen mode Exit fullscreen mode

This code looks pretty good, right? If there are no user_ids this block will skip all of the user processing. That seems like a pretty good deal, so it must be fine. Unfortunately, that assumption is not quite right. Let me explain why.

When you execute that where clause even with an empty array it is actually going to hit MySQL.

(pry)> User.where(:id => [])
User Load (1.0ms)  SELECT `users`.* FROM `users` WHERE 1=0
=> []
Enter fullscreen mode Exit fullscreen mode

Notice the where 1=0 statement at the end of the SQL. That is how ActiveRecord ensures no records are returned. Sure, it's a quick 1ms query, but if you are executing this chunk of code millions and millions of times that quick query can easily overwhelm your database and slow you down. So how do you update this code to make it more performant?

You have two options. The first is by not running the MySQL lookup unless you absolutely have to. You can do this by doing an easy peasy array check using Ruby before you execute the block.

return unless user_ids.any?
User.where(:id => user_ids).each do |user|
  # Lots of user processing 
end
Enter fullscreen mode Exit fullscreen mode

By doing this you can save yourself a worthless database hit and ensure that your database is not being overwhelmed by useless calls. In addition, this will also speed up your code. Say you were running this code 10k times. It is going to take you over half a second to run that MySQL lookup 10k times.

(pry)> Benchmark.realtime do                                                           
>   10_000.times { User.where(:id => []) }                                                    
> end                                                                                          
=> 0.5508159045130014
Enter fullscreen mode Exit fullscreen mode

If instead, you skip that MySQL query by checking to see if any user_ids are present first, then running a similar code block 10k times takes less than a hundredth of a second!

(pry)> Benchmark.realtime do                                                           
>   10_000.times do                                                                            
>       next unless ids.any?                                                                     
>       User.where(:id => [])                                                                   
>   end                                                                                        
> end                                                                                          
=> 0.0006368421018123627
Enter fullscreen mode Exit fullscreen mode

As you can see, there's a significant time difference between hitting MySQL unnecessarily 10k times and executing plain old Ruby 10k times. This difference can have a significant impact on the performance of your application. A lot of people will look at this chunk of code

User.where(:id => user_ids).each do |user|
  # Lots of user processing 
end
Enter fullscreen mode Exit fullscreen mode

and the first thing they say is well “Ruby is slow.” But that could not be further from the truth since we just saw that the pure Ruby code is hundreds of times faster! In this case, Ruby is not slow, hitting the database is slow! Keep an eye out for situations like this in your code where you might be making database calls you don't expect.

Now some folks might be looking at this code and thinking I’m not exactly writing code like that. Actually, I chained a bunch of scopes to my where clause

users = User.where(:id => user_ids).active.short.single
Enter fullscreen mode Exit fullscreen mode

so I NEED to pass that array of empty user_ids otherwise the scope chain breaks. Thankfully, even though ActiveRecord doesn't handle empty arrays well, it does give you an option for handling empty scopes and that is the none scope.

none is an ActiveRecord query method that allows you to return a chainable relation with zero records without querying the database. Let’s see this in action. From earlier we know that a where clause with an empty set of IDs is going to hit the database.

(pry)> User.where(:id => []).active.tall.single
  User Load (0.7ms)  SELECT `users`.* FROM `users` WHERE 1=0 AND `users`.`active` = 1 AND `users`.`short` = 0 AND `users`.`single` = 1
=> []
Enter fullscreen mode Exit fullscreen mode

But, if we replace that where clause with the none scope you can see no database call is made and we can still chain our scopes together.

(pry)> User.none.active.tall.single
=> []
Enter fullscreen mode Exit fullscreen mode

Be on the lookout for tools like this in your frameworks that will allow you to work smarter with empty datasets. Even more importantly, NEVER ever assume your framework or gem is not making a database call when asked to process an empty dataset. For more tips on finding calls like these, checkout my blog post on logging.

Real Life Application

This concept of using Ruby to prevent database hits is not just limited to MySQL; it can apply to any type of database! Where we found it useful at Kenna was when it came to building what we call reports. Every night we build these colorful PDF reports for our clients.

These reports start with a reporting object, which holds all the information needed to build that report. Then, to build that pretty report page every night we have to make over 20 Elasticsearch requests along with multiple requests to Redis and MySQL.

We did a lot of work to make sure all these requests were super fast, but it was still taking us hours to build the reports. Eventually, the number of saved reports grew so much that we couldn’t finish them all over night. When my team and I started trying to figure out how to solve this problem, the first thing we did was jump in a console to take a close look at what data our existing reports contained. After some digging, we found that out of the 25k reports in our system over a third of them were blank!!!

(pry)> Report.blank_reports.count
=> 10805
Enter fullscreen mode Exit fullscreen mode

This means they contained no data, and if the reports contained no data, then what was the point of making all of those Elasticsearch, MySQL, and Redis requests when we knew they were going to return nothing?

LIGHTBULB! Don’t hit the databases if the report is empty! By skipping the reports that had no data we took our processing time from over 10 hrs down to 3. By adding a simple line of Ruby

def build(report)
  return if report.blank?
  # Processing
end
Enter fullscreen mode Exit fullscreen mode

we were able to prevent a bunch of worthless database hits, which in turn, sped up our processing tremendously. This strategy of using Ruby to shield your databases from requests, I like to refer to it as using database guards. In practice it’s simple, but I think its one of the easiest things to overlook when you are writing code. Every database hit is using resources so make them all count!

If you are interested in other ways to prevent database hits using Ruby checkout my Cache Is King speech from RubyConf which is what inspired this post.

Discussion

pic
Editor guide
Collapse
joshcheek profile image
Josh Cheek

Nice post 😊 I didn't know about none. I wonder why AR queries the db with 1=0 instead of doing whatever .none does 🤔

Collapse
molly_struve profile image
Molly Struve (she/her) Author

Right? I went into the AR source code one weekend and tried to update it in such a way that none would be used when it saw an empty array. Broke a lot of specs and after 3 hours of trying to fix them I gave up. One day I will go back!

Collapse
joshcheek profile image
Josh Cheek

I feel that, lol. So vindicating when you finally get it after numerous independent attempts!

Collapse
ben profile image
Ben Halpern

Brilliant Rails post.

Collapse
johnbwoodruff profile image
John Woodruff

Would love to see a post from a dev.to team member about performance optimizations you guys have done with this site to make it so fast. I'd imagine there are lots of neat tricks to share!

Collapse
moopet profile image
Ben Sinclair

There're a couple of things I don't understand about this.

Why would ActiveRecord want to ensure that no records are returned with WHERE 1=0? Unless... do you want a "chainable relation" of foos to which you can later add items? That seems like a topsy-turvy way of working. What am I getting mixed up on?

If the service is smart enough to inject a condition if the query is expected to be empty, why isn't it smart enough to not bother running it in the first place?

Why isn't the query cached, since it's identical to the previous one and most databases will have a cheap memory cache for that sort of thing?

In short, I don't understand why Ruby would be asking the database for something it already knows is going to be empty.

Collapse
molly_struve profile image
Molly Struve (she/her) Author

Why would ActiveRecord want to ensure that no records are returned with WHERE 1=0?

Your assumption is correct, they do this so they can return an active record relation. I dove into the ActiveRecord codebase trying to find a way to update it so that it would not hit the database and would return a None Relation instead but I was never able to figure it out and get all the specs passing.

If the service is smart enough to inject a condition if the query is expected to be empty, why isn't it smart enough to not bother running it in the first place?

What we are working with when dealing with these empty arrays is a method that we expect to return an active record relation so that we can chain other things to it. This is why we want to use the none scope, then we don't have to place conditionals all over the code base when using the method since it handles that internally. Here is one of the methods we use it in

def connectors
  @connectors ||= begin
    connector_ids = vulns.select(:connector_id)
    if connector_ids.empty?
     Connector.none
    else
     Connector.with_connector_definition(connector_ids)
    end
  end
end

Then all over the codebase we can call connectors.something and never have to worry about it.

Why isn't the query cached, since it's identical to the previous one and most databases will have a cheap memory cache for that sort of thing?

I tried to simplify the concept for the example but in real life these queries are being executed across many workers so the only layer of cacheing available is at the database level. Even then, we dont want to have to make a round trip call to the database if we dont have too

I don't understand why Ruby would be asking the database for something it already knows is going to be empty.

It is not actually Ruby doing the asking, it is Rails. When we take Ruby frameworks and use them they can hide a lot of what is going on under the hood and if you are not careful you can end up with lots of these types of useless database hits. The frameworks are trying to be as user friendly as possible and in some cases that can cause a loss in performance optimizations

Hopefully that explanation makes sense. Let me know if you have any other questions!

Collapse
cseeman profile image
christine

Love all the code examples and how this is pretty much a broken down white paper but more approachable. This makes it so much easier to learn from your experience! I highly recommend everyone check out Molly's RubyConf talk, she rocked it!

Collapse
pyemkey profile image
Michał Kłoda

@molly I've watched your speech 2 days ago. Great stuff!

Collapse
pyemkey profile image
Michał Kłoda

@molly do you have any resources about caching strategies? What approach did you use for cache invalidation?

Collapse
molly_struve profile image
Molly Struve (she/her) Author

Shoot, I some how totally missed this comment last month! Sorry for the late reply!

I don't really have any good resources to share since most of what we learned was through trial and error on the job. This is a big reason why I wanted to share our experiences so hopefully others can learn from them. Redis docs are also super awesome!

As for cache invalidation, that is always tricky. You really have to do what works best for your situation and use case. You want it short enough that basically no one really notices its cached, yet long enough to provide performance benefit. For somethings a 5 minute cache gives us huge gains while other caches have to be set for 24 hours to see any benefit. Take your best guess at what you think is a good number and then track your performance. Is it helping? How much is it helping? Is it noticeable at all to clients or customers? You will likely have to tweak your original guess but eventually you will find the sweet spot. We have had to tweak more than one invalidation timeframe getting to where we are now and I suspect we will be tweaking many more in the future.

Collapse
leandrogs profile image
Leandro Gomes

Cool! Sometimes we think that making a Rails app perform better is hard, but you proved the opposite!

Collapse
andy profile image
Andy Zhao (he/him)

Awesome post! Learned a lot :)

Collapse
johand profile image
Johan

Nice post and good speech 👏 👏

Collapse
darkain profile image
Vincent Milum Jr

Last time I used WordPress a few years ago, it was making over 80 database calls JUST to render the generic default homepage.. That is just absolutely bonkers!

Collapse
rafaeljesus profile image
Rafael Jesus

Nice post!

I would be even happier to have seen the performance improvements guided by monitoring/tracing data ^