Today I was working on performance optimisations for import logic. One place was operating on an array of arrays of IDs like so:
id_groups = [
  [1, 2],
  [42, 43],
  # ...
]

id_groups.each do |ids|
  ids.each_slice(5000) do |portion|
    SomeModel.where(id: portion).update_all(some_field => some_value)
  end
end
While doing an .update_all in reasonable batches is a good start, there's a slight inefficiency here: if the groups are many and small, we'll be making an update query for each group while changing only a handful of records per query.
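To make that concrete, here's a quick way to count how many UPDATE statements the loop above ends up issuing. This is an illustrative sketch of mine, not part of the original import code, and it assumes a Rails environment where ActiveSupport::Notifications is available:

# Subscribe to ActiveRecord's SQL instrumentation and count UPDATE statements
update_count = 0
subscriber = ActiveSupport::Notifications.subscribe("sql.active_record") do |*_args, payload|
  update_count += 1 if payload[:sql].to_s.start_with?("UPDATE")
end

id_groups.each do |ids|
  ids.each_slice(5000) do |portion|
    SomeModel.where(id: portion).update_all(some_field => some_value)
  end
end

ActiveSupport::Notifications.unsubscribe(subscriber)
puts "UPDATE queries issued: #{update_count}" # roughly one per group when the groups are small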
We can cut down on the number of update queries, and thus save on DB round-trip overhead, by utilising a common pattern: buffered bulk updates.
We will buffer (collect in some array) the updates we want to make, and perform an update query (with .upsert_all) only once a good number of changes has been collected.
update_buffer = []
batch_size = 1000

# Collect updates into the buffer and execute upsert_all when reaching batch_size
id_groups.each do |ids|
  ids.each do |id|
    # Add each update to the buffer
    update_buffer << { id: id, some_field => some_value }

    # Check if the buffer has reached the batch size, and perform upsert_all if so
    if update_buffer.size >= batch_size
      SomeModel.upsert_all(update_buffer, unique_by: :id, returning: false)
      update_buffer.clear # Clear the buffer after the upsert
    end
  end
end

# Perform a final upsert_all for any remaining updates in the buffer
SomeModel.upsert_all(update_buffer, unique_by: :id, returning: false) unless update_buffer.empty?
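If this pattern shows up in more than one importer, the buffering can also be wrapped in a small helper object. Below is a minimal sketch of that idea; the BufferedUpserter name and its interface are hypothetical, not something from the code above:

# Hypothetical helper that encapsulates the same buffered bulk-update pattern
class BufferedUpserter
  def initialize(model, batch_size: 1000, unique_by: :id)
    @model = model
    @batch_size = batch_size
    @unique_by = unique_by
    @buffer = []
  end

  # Queue one row; flush automatically once the batch is full
  def add(attributes)
    @buffer << attributes
    flush if @buffer.size >= @batch_size
  end

  # Write out whatever is left in the buffer
  def flush
    return if @buffer.empty?

    @model.upsert_all(@buffer, unique_by: @unique_by, returning: false)
    @buffer.clear
  end
end

# Hypothetical usage with the id_groups from above
upserter = BufferedUpserter.new(SomeModel, batch_size: 1000)
id_groups.each do |ids|
  ids.each { |id| upserter.add({ id: id, some_field => some_value }) }
end
upserter.flush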
An additional tweak to the buffered loop above is to avoid checking update_buffer.size on every iteration by introducing a counter, incrementing it on each push, and resetting it alongside the upsert call. Since Ruby's Array#size is constant-time, this is a micro-optimisation rather than a big win, but it does make the flush condition a little more explicit.
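For illustration, a sketch of the same loop with such a counter could look like this:

update_buffer = []
buffered = 0
batch_size = 1000

id_groups.each do |ids|
  ids.each do |id|
    update_buffer << { id: id, some_field => some_value }
    buffered += 1

    # Flush once the counter says the batch is full, then reset both
    if buffered >= batch_size
      SomeModel.upsert_all(update_buffer, unique_by: :id, returning: false)
      update_buffer.clear
      buffered = 0
    end
  end
end

SomeModel.upsert_all(update_buffer, unique_by: :id, returning: false) unless update_buffer.empty?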
Let me know what you think of this technique. Have you used something similar?