What is an inner join and when should it be used?
An inner join lets you combine two tables as long as they have matching values for a common attribute. It is useful when you want to query a table based on matching entries from a related table.
You've probably seen a Venn diagram similar to the cover image of this post and that illustrates this combination, where each circle is a table and their intersection represents the common attributes.
But what does this mean in practice? And how can we take advantage of the joined tables?
Let's use the following domain example to better illustrate this:
class Payment < ApplicationRecord
belongs_to :booking
end
class Booking < ApplicationRecord
has_many :payments
enum status: %w[requested confirmed canceled]
end
Consider also that we have 3 payment entries, as well as 4 booking entries, in our database:
Payment.all
Id | Amount | Booking Id |
---|---|---|
1 | 450 | 1 |
2 | 100 | 2 |
3 | 770 | 3 |
Booking.all
Id | Status | Check In | Check Out |
---|---|---|---|
1 | 'requested' | 20 Jun 2021 | 23 Jun 2021 |
2 | 'requested' | 07 Aug 2021 | 08 Aug 2021 |
3 | 'confirmed' | 22 May 2021 | 28 May 2021 |
4 | 'canceled' | 01 Apr 2021 | 02 Apr 2021 |
Inner Joins in belongs_to
associations
Find all the payments of requested bookings
Let's say that we need to get all payments related to bookings with the status 'requested'.
We could solve this the ruby way, like:
Payment.all.select { |payment| payment.booking.requested? }
This works and reads well, but it is not optimal. Look at the queries triggered:
SELECT "payments".* FROM "payments"
SELECT "bookings".* FROM "bookings" WHERE "bookings"."id" = $1 LIMIT $2 [["id", 1], ["LIMIT", 1]]
SELECT "bookings".* FROM "bookings" WHERE "bookings"."id" = $1 LIMIT $2 [["id", 2], ["LIMIT", 1]]
SELECT "bookings".* FROM "bookings" WHERE "bookings"."id" = $1 LIMIT $2 [["id", 3], ["LIMIT", 1]]
Sounds familiar? This is the famous n + 1
problem. We are querying payments once and then for each payment we query the related booking (passing the payment's booking_id
to the where clause).
There's a more efficient way of getting the same list of payments running only a single query, and this is where the ActiveRecord method joins
comes in.
The ActiveRecord joins
method
The ActiveRecord joins
method defaults to an inner join unless we customize it passing an SQL string to it (more on this later).
So joining payments with bookings,
Payment.joins(:booking)
produces the following SQL:
SELECT "payments".* FROM "payments" INNER JOIN "bookings" ON "bookings"."id" = "payments"."booking_id"
This statement says that we are selecting all the fields of the payments table - "payments".*
- and joining booking entries - INNER JOIN "bookings"
- that have an id
that matches a payment's booking_id
- ON "bookings"."id" = "payments"."booking_id"
.
The joined table
When payments and bookings are inner joined, this is what happens:
payments.id | payments.amount | payments.booking_id | bookings.id | bookings.status | bookings.check_in | bookings.check_out |
---|---|---|---|---|---|---|
1 | 450 | 1 | 1 | 'requested' | 20 Jun 2021 | 23 Jun 2021 |
2 | 100 | 2 | 2 | 'requested' | 07 Aug 2021 | 08 Aug 2021 |
3 | 770 | 3 | 3 | 'confirmed' | 22 May 2021 | 28 May 2021 |
But note that, although the bookings' columns are now available to query, at the moment we are only selecting data from payments (SELECT "payments".*
).
So when we actually run the query, this is what is returned:
Id | Amount | Booking Id |
---|---|---|
1 | 450 | 1 |
2 | 100 | 2 |
3 | 770 | 3 |
It returns all the payments with a matching booking. Since all payments must belong to a booking, we get all Payment entries, a similar outcome to Payment.all
, except that now we have booking data to query against.
So now we can add a where clause to the query, filtering payments by booking status:
Payment.joins(:booking).where(bookings: { status: 'requested' })
Syntax notes: the joins
method uses the name of the association defined on the model. Since a payment belongs_to
a booking, the :booking
symbol is in the singular form. On the other hand, the where
method uses the name of the table that we want to apply the filter to, in this case, that's bookings
. Since we are joining two tables we need to tell ActiveRecord (and SQL) which table to apply the where clause to.
Produced SQL:
SELECT "payments".* FROM "payments" INNER JOIN "bookings" ON "bookings"."id" = "payments"."booking_id" WHERE "bookings"."status" = 0"
This returns our desired output, the two payments related to 'requested bookings':
Payments
Id | Amount | Booking Id |
---|---|---|
1 | 450 | 1 |
2 | 100 | 2 |
Inner Joins in has_many
associations
So far, we have been looking at the simplest kind of association, the belongs_to
.
Let's now add a new model Guest
that has a has_many
relationship with bookings. A guest can have many bookings and a booking belongs to a guest:
class Guest < ApplicationRecord
has_many :bookings
end
class Booking < ApplicationRecord
belongs_to :guest
end
Guest.all
Id | Name |
---|---|
1 | 'John Smith' |
2 | 'Jane Williams' |
3 | 'Dory Timothy' |
4 | 'Yong Bergman' |
5 | 'Willie Connelly' |
Booking.all
Id | Guest Id | Check In | Check Out |
---|---|---|---|
1 | 3 | 20 Jun 2021 | 23 Jun 2021 |
2 | 5 | 07 Aug 2021 | 08 Aug 2021 |
3 | 2 | 22 May 2021 | 28 May 2021 |
4 | 3 | 01 Apr 2021 | 02 Apr 2021 |
Find all guests with bookings
Similarly to the previous belongs_to
exercise, we can inner join guests with bookings:
Guest.joins(:bookings)
Syntax notes: Contrarily to what happened in the belongs_to
exercise, the :bookings
symbol passed to joins
is now plural, since a guest can have many bookings.
Produced SQL:
SELECT "guests".* FROM "guests" INNER JOIN "bookings" ON "bookings"."guest_id" = "guests.id"
The joined table
guests.id | guests.name | bookings.id | bookings.guest_id | bookings.check_in | bookings.check_out |
---|---|---|---|---|---|
2 | 'Jane Williams' | 3 | 2 | 22 May 2021 | 28 May 2021 |
3 | 'Dory Timothy' | 1 | 3 | 20 Jun 2021 | 23 Jun 2021 |
3 | 'Dory Timothy' | 4 | 3 | 01 Apr 2021 | 02 Apr 2021 |
5 | 'Willie Connelly' | 2 | 5 | 07 Aug 2021 | 08 Aug 2021 |
The returned table
Id | Name |
---|---|
2 | 'Jane Williams' |
3 | 'Dory Timothy' |
3 | 'Dory Timothy' |
5 | 'Willie Connelly' |
Note that John Smith
is not included in this list since this guest has no bookings. Remember that inner joins
only returns matching entries. If there is no match on the joined table, then that entry will be omitted.
Note also that 'Dory Timothy' is showing up twice. That's because there are two bookings requested by this guest. This outcome is very common when joining a has_many
association. The return table will return the same guest n
times its bookings. If Jane Williams
had 10 bookings, the returned table would include 10 Jane Williams
entries.
If we'd like to return only unique guests, we can use distinct
:
Guest.joins(:bookings).distinct
Produced SQL:
SELECT DISTINCT "guests".* FROM "guests" INNER JOIN "bookings" ON "bookings"."guest_id" = "guests.id"
The returned table is now
Id | Name |
---|---|
2 | 'Jane Williams' |
3 | 'Dory Timothy' |
5 | 'Willie Connelly' |
Inner Joins in has_many through
associations
Find all the guests with pending payments
Let's bring back the Payment
model to our domain, this time with a pending
or confirmed
status.
class Guest < ApplicationRecord
has_many :bookings
has_many :payments, through: :bookings
end
class Booking < ApplicationRecord
has_many :payments
end
class Payment < ApplicationRecord
belongs_to :booking
emun: %w[pending confirmed]
end
Guest.all
Id | Name |
---|---|
1 | 'John Smith' |
2 | 'Jane Williams' |
3 | 'Dory Timothy' |
4 | 'Yong Bergman' |
5 | 'Willie Connelly' |
Booking.all
Id | Guest Id | Check In | Check Out |
---|---|---|---|
1 | 3 | 20 Jun 2021 | 23 Jun 2021 |
2 | 5 | 07 Aug 2021 | 08 Aug 2021 |
3 | 2 | 22 May 2021 | 28 May 2021 |
4 | 3 | 01 Apr 2021 | 02 Apr 2021 |
Payment.all
Id | Amount | Booking Id | Status |
---|---|---|---|
1 | 450 | 1 | 'paid' |
2 | 100 | 2 | 'pending' |
3 | 770 | 3 | 'pending' |
4 | 810 | 4 | 'paid' |
To find the guests with pending payments, we need to go through the chain of associations: Guest -> Booking -> Payment
. This is called a nested join.
Luckily, the ActiveRecord joins
method allows us to pass nested associations:
Guest.joins(bookings: :payment)
So what we're saying is: 'Return all guests that have bookings with payments'. This produces two inner joins:
SELECT "guests".* FROM "guests" INNER JOIN "bookings" ON "bookings"."guest_id" = "guests"."id" INNER JOIN "payments" ON "payments"."booking_id" = "bookings"."id"
And results in the following joined table:
guests.id | guests.name | bookings.id | bookings.guest_id | bookings.check_in | bookings.check_out | payments.id | payments.amount | payments.booking_id | payments.status |
---|---|---|---|---|---|---|---|---|---|
2 | 'Jane Williams' | 3 | 2 | 22 May 2021 | 28 May 2021 | 3 | 770 | 3 | 'pending' |
3 | 'Dory Timothy' | 1 | 3 | 20 Jun 2021 | 23 Jun 2021 | 1 | 450 | 1 | 'paid' |
3 | 'Dory Timothy' | 4 | 3 | 01 Apr 2021 | 02 Apr 2021 | 4 | 810 | 4 | 'paid' |
5 | 'Willie Connelly' | 2 | 5 | 07 Aug 2021 | 08 Aug 2021 | 2 | 100 | 2 | 'pending' |
The payments data is now available to query. So next, we filter by payment status using the where
method:
Guest.joins(bookings: :payment).where(payments: { status: 'pending' })
And the final returned outcome:
Guests
Id | Name |
---|---|
2 | 'Jane Williams' |
5 | 'Willie Connelly' |
Customizing joins with SQL strings
Instead of passing symbols that tell ActiveRecord which association to join (and that will be later translated to SQL), the joins
method also accepts a SQL string directly as an argument. This has the advantage of providing extra flexibility in building more complex queries.
ActiveRecord joins
in polymorphic associations
Adding a new polymorphic model Accommocation
to our domain:
class Accommodation < ApplicationRecord
has_many :bookings, as: :bookable
end
class Booking < ApplicationRecord
belongs_to :bookable, polymorphic: true
end
In polymorphic associations if you try joins
using symbols, you'll get an error:
Booking.joins(:bookable)
ActiveRecord::EagerLoadPolymorphicError (Cannot eagerly load the polymorphic association :bookable)
That is because we need to tell joins
which bookable type we want to use since there can be many.
One of the ways we could solve this problem would be to pass an SQL string:
Bookings.joins("INNER JOIN accommodations ON accommodations.id = bookings.bookable_id AND bookings.bookable_type = 'Accommodation'")
Until rails 5 was released, if you wanted to run a left outer join, you'd also have to pass a custom SQL to the joins
method. Since then, Rails supports left outer joins through the ActiveRecord methods left_joins
and it's alias left_outer_joins
.
In part II of this series - Understanding ActiveRecord joins - I'll go through similar exercises to explain when and how we can use ActiveRecord's left_joins
.
Top comments (0)