ActiveRecord is a great ORM, with a ton of nifty little things that make your life easy when dealing with DB queries. pluck
is one such method. I've been using pluck
for as long as I can remember, and even my second oldest article is on it!
Even after using it for so long, recently I found myself debugging a relatively simple query that used pluck
and realized I missed some key points regarding this sweet little AR method.
The Rails Guides on pluck
is pretty neat, but there are still some things you might miss so I'll try to sum it all up here. If you're just starting out or might have overlooked the documentation, I highly recommend going through this. However, if you're already versed in AR/pluck, feel free to jump to the concluding sections.
What is pluck and how does it work?
Pluck
is an AR query method that selects one or more attributes(or columns) and returns an Array
, without loading the corresponding records. It uses select
internally and triggers a query for the values.
Customer.pluck(:id)
# SELECT customers.id FROM customers
=> [1,2,3]
# or
Customer.pluck(:id, :first_name)
# SELECT customers.id, customers.first_name FROM customers
=> [[1, "David"], [2, "Fran"], [3, "Jose"]]
There are multiple ways to do the above in Rails, pluck
is the most performant out of all in almost all cases.
Basically, you are replacing code that looks like this
Customer.select(:id).map { |c| c.id }
# or
Customer.select(:id, :first_name).map { |c| [c.id, c.first_name] }
to be cleaner and more efficient.
So why is it faster?
pluck
directly converts the data into a Ruby array skipping the construction of AR objects, thus giving it an edge in performance. Here's an article benchmarking pluck vs select vs collect/map.
There are some more points that are well-documented in the guides, so I'll just list them here.
- You are not limited to querying fields from a single table, you can query multiple tables as well.
Order.joins(:customer, :books).pluck("orders.created_at, customers.email, books.title")
-
pluck
triggers an immediate query, and thus cannot be chained with any further scopes, although it can work with scopes already constructed earlier:
Customer.pluck(:first_name).limit(1)
=> NoMethodError: undefined method `limit' for #<Array:0x007ff34d3ad6d8>
Customer.limit(1).pluck(:first_name)
=> ["David"]
-
pluck
will trigger eager loading if the relation object contains include values, even if the eager loading is not necessary for the query. As you can see in the example, we don't need the join for customer ids, butpluck
will still go ahead and do it! So be careful when you have eager-loaded associations!
irb> assoc = Customer.includes(:reviews)
irb> assoc.pluck(:id)
SELECT "customers"."id" FROM "customers" LEFT OUTER JOIN "reviews" ON "reviews"."id" = "customers"."review_id"
# to avoid this you can do
assoc.unscope(:includes).pluck(:id)
Now we have seen what's in the guides, let's see some cases that might trip you up!
Plucking jsonb fields in Rails
If you have a jsonb field in your AR object and you want some nested value you can still use pluck
!
user.metadata = {
"location": {
"city": "New York",
# ... additional data
}
# ... other metadata attributes
}
# Using 'pluck' to retrieve the 'city' values from the 'location' key in 'metadata'
cities = user.pluck("metadata -> 'location' ->> 'city'")
=> ["New York"]
Pluck's Overriding Power
Since pluck
uses select
internally. it will override any other selects you do before this. This seems obvious but when writing a complex query one might miss out on this. I'll try to illustrate with a relatively simple example
# Scenario: You want to fetch distinct cities from a user table
# Step 1: Using 'select' with 'distinct' to get unique cities
distinct_cities_query = User.select(:city).distinct
# Step 2: Attempting to use 'pluck' to retrieve the city names
cities = distinct_cities_query.pluck(:city)
# Cities might contain duplicate city names
Plucking with where
Another common issue can arise when using pluck
with where (remember, pluck fires an immediate query). Hence, it can fire some extra queries where it's not required. Thankfully there’s a PluckInWhere
Rubocop you can use to warn you.
# bad
# will trigger 2 queries
Post.where(user_id: User.active.pluck(:id))
# SELECT id FROM users WHERE /* conditions for active users */;
# SELECT * FROM posts WHERE user_id IN (1, 2, 3, ...); -- Array of IDs from the first query
# good
# triggers 1 query only
Post.where(user_id: User.active.select(:id))
# SELECT * FROM posts WHERE user_id IN (SELECT id FROM users WHERE /* conditions for active users */);
When to think twice before using pluck?
When you have already loaded the AR objects into memory there is no need to use
pluck
as it will trigger another query. Rather just usemap
orcollect
.When dealing with large datasets be careful with
pluck
as it will load everything into your memory.
# Instead of using pluck which can consume a lot of memory
# User.pluck(:email).each do |email|
# # Process email
# end
# Use find_each or find_in_batches for batch processing
User.find_each(batch_size: 1000) do |user|
email = user.email
# Process email
end
In conclusion, pluck is a versatile and powerful method in Rails for efficiently retrieving specific columns from a database, but it's crucial to understand its nuances and limitations.
Happy Plucking :D
Top comments (1)
Nice article !!
Pluck is really a game changer when you want to optimize your application!