When studying backend development, I learned that using SQL (and other database querying languages) directly will result in much faster code. SQL is built specifically for querying, so it certainly makes sense, but I've never actually verified if this was in fact true and, if it is, how much faster it is. So, I decided to put it to the test and I created a mini-project in Ruby on Rails to validate the hypothesis that my code will run faster, all else being equal, if I use a direct SQL injection rather than a Ruby-ActiveRecord based approach. Here's what happened.
The Setup
First thing, I needed to create my test scenario and then implement a SQL and an ActiveRecord implementation that gets me to the same result. I'm not going to go through the steps to create the project, but here are the highlights:
- Users and associated Orders are stored in the database
- Orders have a payer_name and created_at attribute
- In the Order Model, I have a method to extract the orders associated to a given user, and sort them by created_at
- Optionally, I can filter the results by specifying a payer_name when calling the method
- My demo user will have 10 associated orders
- 6 of these orders have a payer_name equal to "Store1"
After setting up my app and database, I need to code the method described in the above bullets.
Regular Implementation (Ruby & ActiveRecord)
Here's what this code looks like without directly injecting SQL.
def self.sort_orders(user_id, payer_name = '*')
if payer_name == '*'
User.find(user_id).orders.sort {|a, b| a.created_at <=> b.created_at}
else
User.find(user_id).orders.filter{|t| t.payer_name == payer_name}.sort {|a, b| a.created_at <=> b.created_at}
end
end
So, regardless of whether a payer_name is specified, the code starts by finding the User instance based on the passed-in user_id, then it finds all the orders that are associated. IF a payer_name is specified, the results are filtered at this point to only include results where payer_name matches what was provided. Finally, any remaining results are sorted in ascending order by created_at.
Note that, if a payer_name is specified, I filter the results BEFORE sorting. This will help my performance a little bit and make the test more fair.
SQL Injection Implementation
Now, I want to add a method that does exactly the same thing, but this time I'll retrieve the results with SQL instead of relying on Ruby and ActiveRecord.
def self.sort_orders_SQL(user_id, payer_name = '*')
if payer_name == '*'
self.find_by_sql(["SELECT * FROM orders WHERE user_id = ? ORDER BY created_at ASC", user_id].flatten)
else
self.find_by_sql(["SELECT * FROM orders WHERE user_id = ? AND payer_name = ? ORDER BY created_at ASC", user_id, payer_name].flatten)
end
end
As stated above, this does the exact same thing. One major advantage; however, is that I can add the order logic in the SQL query, avoiding an iterator function altogether. Same deal for the filter logic if a payer_name is specified. This is an excellent use case for injecting a database query!
The Test
Now that I'm all set up, let's get to testing. For this, I'll use a ruby gem called benchmark to calculate how much time it takes to run each method. Since these are not huge methods, I'll run them each many times so that my results can be more easily operated on (I don't want to type that many ".000000"s if I don't have to). I'll use the bm
method to run both tests at the same time. I'll start by running each method 1000 times, then just for fun, I'll run the same test method with 5,000 and 10,000 repetitions.
Testing Without 'payer_name' Filter
Benchmark.bm do |benchmark|
benchmark.report("no-inject") do // label denoting no SQL injection
1000.times do
Order.sort_orders(User.first.id)
end
end
benchmark.report("injectSQL") do // label denoting SQL injection used
1000.times do
Order.sort_orders_SQL(User.first.id)
end
end
end
Running the test 1,000 times each results in the following data (Note: Reduction column is calculated, not provided by benchmark gem):
System CPU Time | User CPU Time | Total CPU Time | |
---|---|---|---|
no-inject | 0.421875 | 1.609375 | 2.03125 |
injectSQL | 0.203125 | 0.5 | 0.703125 |
Reduction | 51.85% | 68.93% | 65.38% |
5,000 repetitions resulted in the following data:
System CPU Time | User CPU Time | Total CPU Time | |
---|---|---|---|
no-inject | 0.828125 | 3.21875 | 4.046875 |
injectSQL | 0.6875 | 2.078125 | 2.765625 |
Reduction | 16.98% | 35.44% | 31.66% |
Finally 10,000 repetitions resulted in the following data:
System CPU Time | User CPU Time | Total CPU Time | |
---|---|---|---|
no-inject | 1.890625 | 6.234375 | 8.125000 |
injectSQL | 1.156250 | 2.625000 | 3.781250 |
Reduction | 38.84% | 57.89% | 53.46% |
Testing With 'payer_name' Filter
Now, adding the criteria that I only want to return orders where payer_name is Store1:
Benchmark.bm do |benchmark|
benchmark.report("no-inject") do // label denoting no SQL injection
1000.times do
Order.sort_orders(User.first.id, "Store1")
end
end
benchmark.report("injectSQL") do // label denoting SQL injection used
1000.times do
Order.sort_orders_SQL(User.first.id, "Store1")
end
end
end
Running the test 1,000 times each results in the following data:
System CPU Time | User CPU Time | Total CPU Time | |
---|---|---|---|
no-inject | 0.390625 | 0.9375 | 1.328125 |
injectSQL | 0.09375 | 0.203125 | 0.296875 |
Reduction | 76.00% | 78.33% | 77.65% |
5,000 repetitions resulted in the following data:
System CPU Time | User CPU Time | Total CPU Time | |
---|---|---|---|
no-inject | 1.234375 | 3.59375 | 4.828125 |
injectSQL | 1.0625 | 1.3125 | 2.375 |
Reduction | 13.92% | 63.48% | 50.81% |
Finally 10,000 repetitions resulted in the following data:
System CPU Time | User CPU Time | Total CPU Time | |
---|---|---|---|
no-inject | 3.28125 | 11.359375 | 14.640625 |
injectSQL | 1.96875 | 4.203125 | 6.171875 |
Reduction | 40.00% | 63.00% | 57.84% |
Results
With this data, the hypothesis that direct SQL injection will work faster than Ruby & ActiveRecord is, in fact, supported. Additionally, when looking at the reduction figures, they suggest that it's actually exceedingly faster! It's important to keep in mind that there are other factors effecting the results, including what programs are running on my computer, so running the same test twice is highly unlikely to get the same results. However; with an average reduction of 50.17% for non-filtered, and a whopping 62.10% for filtered results, there is no question that this SQL injection strategy is a huge performance boost in the right circumstance.
Bonus
There was something that was bothering me about my analysis above. The SQL strategy was a huge performance boost, BUT it has the benefit of skipping iterator methods, which will certainly give it the edge. To measure the effect of purely SQL vs ActiveRecord, I need to cut out that mediating factor. So here is the table of results for the same test, but with all sorting and filtering removed. This is simply finding the user based on the user_id, and getting all associated orders in no particular order.
I ran the test with 5,000 iterations.
System CPU Time | User CPU Time | Total CPU Time | |
---|---|---|---|
no-inject | 1.96875 | 4.28125 | 6.25 |
injectSQL | 1.171875 | 2.546875 | 3.71875 |
Reduction | 40.48% | 40.51% | 40.50% |
Okay, so this suggests that there is still a massive, if slightly smaller, performance gain from directly using SQL over ActiveRecord. Hypothesis still supported.
Top comments (0)