When working on large-scale projects, quickly creating test data or dummy data can be crucial. In this article, we will explore different methods to efficiently create 100,000 records in Ruby on Rails.
Data Set Overview
For today's benchmark, we will start with the Postgres database used in my previous article:
# db/schema.rb
create_table "accounts", force: :cascade do |t|
  t.string "first_name"
  t.string "last_name"
  t.string "phone"
  t.string "email"
  t.string "role"
end
By the way, if you haven't read my previous article, I invite you to do so.
To thoroughly test the efficiency of the methods we will discuss, I will generate two variables upfront that we will use:
accounts = FactoryBot.build_list(:account, 100_000)
accounts_attributes = accounts.map(&:attributes)
Thanks to FactoryBot, we have 100,000 unpersisted ActiveRecord objects in the accounts variable and their attributes, as hashes, in the accounts_attributes variable.
Finally, for each method, we will also test a few variations to see how far we can push the performance of Ruby on Rails.
1. Using .save
One of the simplest ways to create a record is the .save method. For a large number of records, you can iterate over the ActiveRecord objects and call .save on each instance.
accounts.each do |account|
  account.save
end
This approach is slow because every .save call runs validations and callbacks and then issues its own INSERT, so 100,000 records mean 100,000 separate round trips to the database.
The first variant of .save that I would like to test is .save!. In theory, .save! should perform the same as .save: both follow the same persistence path, but .save! raises an exception (for example ActiveRecord::RecordInvalid when validation fails) where .save simply returns false.
accounts.each do |account|
  account.save!
end
The last variant I would like to test for .save wraps everything in a single SQL transaction. In our two previous examples, ActiveRecord wraps each save in its own transaction, so we end up with 100,000 transactions. The connection itself is reused, but 100,000 times ActiveRecord sends BEGIN, writes one row, and sends COMMIT, and each commit typically forces the database to make that write durable before moving on. This takes a lot of time!
Account.transaction do
  accounts.each do |account|
    account.save
  end
end
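In this example, all 100,000 INSERT statements still run, but inside a single transaction, so the database commits only once. This is much faster! Note that the benchmark below measures this variant with .save!, which also rolls the whole transaction back if any record fails to save.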
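To get a feel for why one transaction helps, here is a back-of-the-envelope count of the statements sent to the database. It assumes each standalone save is wrapped in its own BEGIN/COMMIT pair, which is ActiveRecord's default behavior. The method names are made up for this illustration, and real statement counts may differ slightly depending on your Rails version and adapter.

```ruby
RECORDS = 100_000

# One transaction per record: BEGIN + INSERT + COMMIT for every row.
def statements_with_one_transaction_per_record(records)
  records * 3
end

# One shared transaction: a single BEGIN, one INSERT per row, a single COMMIT.
def statements_with_single_transaction(records)
  records + 2
end

per_record = statements_with_one_transaction_per_record(RECORDS)
single = statements_with_single_transaction(RECORDS)

puts "per-record transactions: #{per_record} statements"
puts "single transaction:      #{single} statements"
puts "commits avoided:         #{RECORDS - 1}"
```

The number of INSERT statements is identical in both cases; what disappears is 99,999 commits, which are usually the expensive part.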
2. Using .create
The .create method is very similar to .save. The main difference is where it lives: .create is a class method on the ActiveRecord model (Account, in our case) that builds a new object from the given attributes and saves it, while .save is an instance method called on an object you have already built, such as Account.new.
accounts_attributes.each do |account_attributes|
  Account.create(account_attributes)
end
Normally, .create and .save should have the same performance.
The second variant I would like to test passes all the hashes at once. According to the documentation, if you pass an array of hashes to .create, it creates one record per hash. Let's see together if this is faster!
Account.create(accounts_attributes)
To be honest, I think this variant will be as slow as calling .create in a loop. According to the source code of .create, when you pass it an array of hashes, it simply iterates over the hashes and calls itself on each one to persist the data.
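The dispatch described above can be sketched without Rails. FakeAccount below is a hypothetical stand-in written for this illustration, not an ActiveRecord model: its save only counts calls, where a real model would run validations, callbacks, and an INSERT.

```ruby
# Simplified stand-in for an ActiveRecord model (hypothetical, no database).
class FakeAccount
  @save_calls = 0

  class << self
    attr_reader :save_calls

    # Bookkeeping helper for this sketch only.
    def record_save
      @save_calls += 1
    end

    # An Array is mapped element by element back through create;
    # a single Hash builds one object and saves it.
    def create(attributes = nil)
      if attributes.is_a?(Array)
        attributes.map { |attrs| create(attrs) }
      else
        record = new(attributes)
        record.save
        record
      end
    end
  end

  attr_reader :attributes

  def initialize(attributes = nil)
    @attributes = attributes || {}
  end

  # A real model would validate and write to the database here.
  def save
    self.class.record_save
    true
  end
end

records = FakeAccount.create([
  { "first_name" => "Ada" },
  { "first_name" => "Grace" },
  { "first_name" => "Linus" }
])

puts "records returned: #{records.size}"
puts "save calls:       #{FakeAccount.save_calls}"
```

Three hashes lead to three separate save calls, which is why passing an array should not be expected to beat a plain loop.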
While we're at it, let's test the efficiency of .create! as we did for .save and .save!.
accounts_attributes.each do |account_attributes|
  Account.create!(account_attributes)
end
The last variant I'd like to test uses a single transaction. The same reasoning as for .save applies here: we will see what happens when all the records are created inside one SQL transaction.
Account.transaction do
  accounts_attributes.each do |account_attributes|
    Account.create!(account_attributes)
  end
end
3. Using .insert_all
Ruby on Rails (since version 6) provides a method called .insert_all that inserts multiple records in a single SQL statement.
Account.insert_all(accounts_attributes)
As we will see in the performance test, .insert_all is very fast.
This speed comes from two things: all the rows travel in one INSERT statement, and ActiveRecord validations and callbacks are skipped entirely.
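To picture what "a single SQL query" means here, the sketch below builds a multi-row INSERT by hand from an array of hashes. It is not Rails' implementation: the helper name multi_row_insert_sql is made up for this example and the quoting is deliberately naive, so in real code let .insert_all and the database adapter handle quoting.

```ruby
# Builds one multi-row INSERT statement from an array of attribute hashes.
# Illustration only: naive quoting, every row must share the first row's keys.
def multi_row_insert_sql(table, rows)
  raise ArgumentError, "rows must not be empty" if rows.empty?

  columns = rows.first.keys

  tuples = rows.map do |row|
    values = columns.map do |column|
      value = row[column]
      # nil becomes NULL; everything else is single-quoted with quotes doubled.
      value.nil? ? "NULL" : "'#{value.to_s.gsub("'", "''")}'"
    end
    "(#{values.join(', ')})"
  end

  "INSERT INTO #{table} (#{columns.join(', ')}) VALUES #{tuples.join(', ')}"
end

sample_rows = [
  { "first_name" => "Ada", "email" => "ada@example.com" },
  { "first_name" => "O'Neil", "email" => nil }
]

sql = multi_row_insert_sql("accounts", sample_rows)
puts sql
```

However many rows are passed, the database receives one statement, which is where most of the saved round trips come from.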
4. Using .upsert_all
In the same vein as .insert_all, Ruby on Rails provides a method called .upsert_all, which inserts new records and updates the ones that already exist in the database. Existing rows are detected through a unique constraint, such as the primary key.
.upsert_all is very convenient for bulk updates.
Account.upsert_all(accounts_attributes)
This method follows the same logic as .insert_all: it does not invoke ActiveRecord validations or callbacks, which greatly improves performance.
5. Using ActiveRecord-Import
For bulk inserts that still run validations, there is a gem called activerecord-import that adds a class method, .import, to your models.
bundle add activerecord-import
Once the gem is installed, you can use it as follows:
Account.import(accounts_attributes)
.import is, in my opinion, the best approach because activerecord-import reduces network overhead by grouping rows into as few INSERT statements as possible, and it can process records in batches. Unlike .insert_all and .upsert_all, it also runs ActiveRecord validations. It works with the databases supported by Rails, so it integrates well whatever your setup.
Benchmark
Now that we know all these methods, let's find out which one is the fastest!
You can find the benchmark here.
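The real benchmark script lives in the repository; as a rough illustration of where user, system, total, and real figures come from, here is a minimal sketch using Ruby's standard Benchmark module. The two workloads are hypothetical in-memory stand-ins (no database involved), so the numbers it prints say nothing about Rails itself.

```ruby
require "benchmark"

# Hypothetical stand-in data: 10,000 attribute hashes shaped like account rows.
rows = Array.new(10_000) do |i|
  { "first_name" => "First#{i}", "last_name" => "Last#{i}", "email" => "user#{i}@example.com" }
end

timings = {}

# Benchmark.bm prints a header (user, system, total, real) and one line per report.
Benchmark.bm(12) do |x|
  timings["one by one"] = x.report("one by one") do
    store = []
    rows.each { |row| store << row.dup } # stand-in for one save per record
  end

  timings["in bulk"] = x.report("in bulk") do
    store = []
    store.concat(rows.map(&:dup)) # stand-in for a single bulk insert
  end
end

# Each report returns a Benchmark::Tms holding the same four measurements.
timings.each do |label, tms|
  puts format("%-12s real=%.6f total=%.6f", label, tms.real, tms.total)
end
```

User and system are CPU seconds spent in the Ruby process, total is their sum, and real is wall-clock time, which also includes time spent waiting on the database.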
Here is the time it takes to create 100,000 records. Place your bets!
Performance Benchmark
Method | User (s) | System (s) | Total (s) | Real (s) |
---|---|---|---|---|
.save | 98.482027 | 7.339702 | 105.821729 | 174.001099 |
.save! | 82.036858 | 7.221731 | 89.258589 | 145.422204 |
.save! with transaction | 38.410147 | 2.444573 | 40.854720 | 68.257510 |
.create | 105.934837 | 7.278972 | 113.213809 | 185.792927 |
.create with hashes | 118.748599 | 8.459169 | 127.207768 | 204.100991 |
.create! | 121.161595 | 7.396354 | 128.557949 | 203.611114 |
.create! with transaction | 48.788214 | 2.510467 | 51.298681 | 79.932584 |
.insert_all | 1.450411 | 0.143563 | 1.593974 | 3.064136 |
.upsert_all | 1.442954 | 0.116461 | 1.559415 | 2.935700 |
activerecord-import | 3.511353 | 0.082371 | 3.593724 | 4.778761 |
🥇 Upsert All (69 times faster than .create with hashes)
🥈 Insert All (66 times faster than .create with hashes)
🥉 ActiveRecord-Import (42 times faster than .create with hashes)
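These multipliers can be reproduced from the Real column of the table. The short script below divides the slowest time (.create with hashes) by each of the three fastest; the podium figures are the results truncated to whole numbers.

```ruby
# Real (wall-clock) seconds copied from the benchmark table.
REAL_SECONDS = {
  ".create with hashes" => 204.100991,
  ".upsert_all" => 2.935700,
  ".insert_all" => 3.064136,
  "activerecord-import" => 4.778761
}.freeze

baseline = REAL_SECONDS.fetch(".create with hashes")

# Speedup factor = baseline time / method time.
speedups = REAL_SECONDS
  .reject { |label, _seconds| label == ".create with hashes" }
  .transform_values { |seconds| baseline / seconds }

speedups.each do |label, factor|
  puts format("%-20s %.1fx faster", label, factor)
end
```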
I find it very interesting that, overall, .save is slightly faster than .create.
Interpretation
- .save & .save!: The .save and .save! methods are very slow, taking about 2.5 to 3 minutes to process 100,000 records. This is mainly because they perform validations and saves one by one, resulting in frequent database calls.
- .create & .create!: The .create and .create! methods are slightly slower than .save, at a little over 3 minutes.
- .save! with transaction & .create! with transaction: Using a transaction significantly improves performance compared to plain .save and .create, taking roughly 70 to 80 seconds to process the same amount of data. So, it is important to use a single transaction when dealing with a large amount of data.
- .create with hashes: This method is slightly slower than .create, taking about 3 minutes and 24 seconds. It is the slowest of all in the benchmark. Avoid it for large datasets!
- .insert_all & .upsert_all: The .insert_all and .upsert_all methods are much faster than the previous methods. They take about 3 seconds to process 100,000 records. These methods use batch SQL queries to insert or update data. It is crucial to note that validations and callbacks are not invoked with these methods.
- ActiveRecord-Import: The ActiveRecord-Import method is also very performant, taking about 4.8 seconds to process 100,000 records. It is essential to note that validations are taken into account when inserting with activerecord-import.
Recommendations
- For creating a large number of records, I prefer insert_all and upsert_all if validations are not necessary.
- If validations are necessary, I go for activerecord-import.
- The use of .save and .create should be avoided for a large dataset. For a small dataset, however, enclosing them in a transaction block keeps the performance loss to a minimum.
Conclusion
Creating 100,000 records in Ruby on Rails can be a challenge, but with the right methods, you can do it in a matter of seconds.
For quickly creating a large number of records, I strongly recommend using methods such as .insert_all, .upsert_all, or ActiveRecord-Import. Using transactions can also improve the performance of methods like .save! and .create!. For optimal performance, it is crucial to choose the method that suits your needs and avoid slower, sequential methods such as .save and .create.
Learn More
- Repository used in the benchmark: https://github.com/just-the-v/fastest-way-to-insert-in-rails
- ActiveRecord-Import: https://github.com/zdennis/activerecord-import
- insert_all Documentation: https://apidock.com/rails/v6.0.0/ActiveRecord/Persistence/ClassMethods/insert_all
Top comments (5)
If you really want fast imports into PostgreSQL then COPY is your friend. There is a gem called activerecord-copy that should be useful. It again does not go through validations or callbacks but could be up to 3-4 times as fast as .insert_all / .upsert_all.
Hi Ben! Thanks for your feedback.
You're completely right. I have already used this alternative, but I wanted to write a separate article about it, as its performance is well above the other alternatives.
thank you.
Interpretation point 6 is missing some words
Hey!
Thanks, I have updated the article with the missing content!