DEV Community

loading...
Cover image for Understanding ActiveRecord inner joins

Understanding ActiveRecord inner joins

Ana Nunes da Silva
I'm a web developer working mainly on Rails backends. I love programming but also books, films and strolling around in my hometown, Lisbon.
Originally published at ananunesdasilva.com Updated on ・7 min read

What is an inner join and when should it be used?

An inner join is used to query a table based on matching entries from a related table.

But what does this mean in practice?

Let's use the following domain example to better illustrate this:

class Payment < ApplicationRecord
  belongs_to :booking
end
Enter fullscreen mode Exit fullscreen mode
class Booking < ApplicationRecord
  has_many :payments

  enum status: %w[requested confirmed canceled checked_in checked_out]
end
Enter fullscreen mode Exit fullscreen mode

Consider also that we have 3 payment entries on our database and 4 booking entries.

Payment.all
Enter fullscreen mode Exit fullscreen mode
Id Amount Booking Id
1 450 1
2 100 2
3 770 3
Booking.all
Enter fullscreen mode Exit fullscreen mode
Id Status Check In Check Out
1 'requested' 20 Jun 2021 23 Jun 2021
2 'requested' 07 Aug 2021 08 Aug 2021
3 'confirmed' 22 May 2021 28 May 2021
4 'canceled' 01 Apr 2021 02 Apr 2021

Inner Joins in belongs_to associations

Find all the payments of requested bookings

Let's say that we need to get all payments related to bookings with the status 'requested'.

We could solve this the ruby way, like:

Payment.all.select { |payment| payment.booking.requested? }
Enter fullscreen mode Exit fullscreen mode

This works and reads well, but it is not optimal. Look at the queries triggered:

SELECT "payments".* FROM "payments"
SELECT "bookings".* FROM "bookings" WHERE "bookings"."id" = $1 LIMIT $2  [["id", 1], ["LIMIT", 1]]
SELECT "bookings".* FROM "bookings" WHERE "bookings"."id" = $1 LIMIT $2  [["id", 2], ["LIMIT", 1]]
SELECT "bookings".* FROM "bookings" WHERE "bookings"."id" = $1 
Enter fullscreen mode Exit fullscreen mode

Sounds familiar? This is the famous n + 1 problem. We are querying payments once and then for each payment we query the related booking (using the payment's booking_id in the where clause).

There's a more efficient way of doing the same on a single query, this is where joins comes in.

The ActiveRecord joins method

The ActiveRecord joins method defaults to an inner join unless we customize it passing an SQL string to it (more on this later).

So joining with payments with bookings,

Payment.joins(:booking)
Enter fullscreen mode Exit fullscreen mode

produces the following SQL:

SELECT "payments".* FROM "payments" INNER JOIN "bookings" ON "bookings"."id" = "payments"."booking_id"
Enter fullscreen mode Exit fullscreen mode

This statement says that we are selecting all the fields of the payments table - "payments".* - and joining the booking entries - INNER JOIN "bookings" - whose id match a payment's booking_id - ON "bookings"."id" = "payments"."booking_id".

The joined table

When payments and bookings are inner joined, this is what happens:

payments.id payments.amount Booking Id bookings.id bookings.status bookings.check_in bookings.check_out
1 450 1 1 'requested' 20 Jun 2021 23 Jun 2021
2 100 2 2 'requested' 07 Aug 2021 08 Aug 2021
3 770 3 3 'confirmed' 22 May 2021 28 May 2021

Note that: The columns of the bookings table are now available in the database to query against, but they will not be sent back to the application and built into ActiveRecord objects.

So when we actually run the query, this is what ActiveRecord returns.

Payments

Id Amount Booking Id
1 450 1
2 100 2
3 770 3

It returns all the payments with a matching booking. Since all payments must belong to a booking, we get all Payment entries, a similar outcome to Payment.all, except that now we have booking data to query against.

So now we can add a where clause to the query, filtering payments by booking status:

Payment.joins(:booking).where(bookings: { status: 'requested' })
Enter fullscreen mode Exit fullscreen mode

Syntax notes: the joins method uses the name of the association defined on the model. Since a payment belongs_to a booking, the :booking symbol is in singular form. On the other hand, the where method uses the name of the table that we want to apply the filter to, in this case, that's bookings. Since we are joining two tables we need to tell ActiveRecord (and SQL) which table to apply the where clause to.

Produced SQL:

SELECT "payments".* FROM "payments" INNER JOIN "bookings" ON "bookings"."id" = "payments"."booking_id" WHERE "bookings"."status" = 0"
Enter fullscreen mode Exit fullscreen mode

This returns our desired output, the two payments related to 'requested bookings':

Payments

Id Amount Booking Id
1 450 1
2 100 2

Inner Joins in has_many associations

So far, we have been looking at the simplest kind of association, the belongs_to.

Let's now add a new model Guest that has a has_many relationship with bookings. A guest can have many bookings and a booking belongs to a guest:

class Guest < ApplicationRecord
  has_many :bookings
end
Enter fullscreen mode Exit fullscreen mode
class Booking < ApplicationRecord
  belongs_to :guest
end
Enter fullscreen mode Exit fullscreen mode
Guest.all
Enter fullscreen mode Exit fullscreen mode
Id Name
1 'John Smith'
2 'Jane Williams'
3 'Dory Timothy'
4 'Yong Bergman'
5 'Willie Connelly'
Booking.all
Enter fullscreen mode Exit fullscreen mode
Id Guest Id Check In Check Out
1 3 20 Jun 2021 23 Jun 2021
2 5 07 Aug 2021 08 Aug 2021
3 2 22 May 2021 28 May 2021
4 3 01 Apr 2021 02 Apr 2021

Find all guests with bookings

Similarly to the previous belongs_to exercise, we can inner join guests with bookings:

Guest.joins(:bookings)
Enter fullscreen mode Exit fullscreen mode

Syntax notes: Contrarily to what happened in the belongs_to exercise, the :bookings symbol passed to joins is now plural, since a guest can have many bookings.

Produced SQL:

SELECT "guests".* FROM "guests" INNER JOIN "bookings" ON "bookings"."guest_id" = "guests.id"
Enter fullscreen mode Exit fullscreen mode

The joined table

guests.id guests.name bookings.id bookings.guest_id bookings.check_in bookings.check_out
2 'Jane Williams' 3 2 22 May 2021 28 May 2021
3 'Dory Timothy' 1 3 20 Jun 2021 23 Jun 2021
3 'Dory Timothy' 4 3 01 Apr 2021 02 Apr 2021
5 'Willie Connelly' 2 5 07 Aug 2021 08 Aug 2021

The returned table

Id Name
2 'Jane Williams'
3 'Dory Timothy'
3 'Dory Timothy'
5 'Willie Connelly'

Note that John Smith is not included in this list since this guest has no bookings. Remember that inner joins only returns matching entries. If there is no match on the joined table, then that entry will be omitted.

Note also that 'Dory Timothy' is showing up twice. That's because there are two bookings requested by this guest. This outcome is very common when joining a has_many association. The return table will return the same guest n times its bookings. If Jane Williams had 10 bookings, the returned table would include 10 Jane Williams entries.

If we'd like to return only unique guests, we can use distinct:

Guest.joins(:bookings).distinct
Enter fullscreen mode Exit fullscreen mode

Produced SQL:

SELECT DISTINCT "guests".* FROM "guests" INNER JOIN "bookings" ON "bookings"."guest_id" = "guests.id"
Enter fullscreen mode Exit fullscreen mode

The returned table is now

Id Name
2 'Jane Williams'
3 'Dory Timothy'
5 'Willie Connelly'

Inner Joins in has_many through associations

Find all the guests with pending payments

Let's bring back the Payment model to our domain, this time with a pending or confirmed status.

class Guest < ApplicationRecord
  has_many :bookings
  has_many :payments, through: :bookings
end
Enter fullscreen mode Exit fullscreen mode
class Booking < ApplicationRecord
  has_many :payments
end
Enter fullscreen mode Exit fullscreen mode
class Payment < ApplicationRecord
  belongs_to :booking

  emun: %w[pending confirmed]
end
Enter fullscreen mode Exit fullscreen mode
Guest.all
Enter fullscreen mode Exit fullscreen mode
Id Name
1 'John Smith'
2 'Jane Williams'
3 'Dory Timothy'
4 'Yong Bergman'
5 'Willie Connelly'
Booking.all
Enter fullscreen mode Exit fullscreen mode
Id Guest Id Check In Check Out
1 3 20 Jun 2021 23 Jun 2021
2 5 07 Aug 2021 08 Aug 2021
3 2 22 May 2021 28 May 2021
4 3 01 Apr 2021 02 Apr 2021
Payment.all
Enter fullscreen mode Exit fullscreen mode
Id Amount Booking Id Status
1 450 1 'paid'
2 100 2 'pending'
3 770 3 'pending'
4 810 4 'paid'

To find the guests with pending payments, we need to go through the chain of associations: Guest -> Booking -> Payment. This is called a nested join.

Luckily, the ActiveRecord joins method allows us to pass nested associations:

Guest.joins(bookings: :payment)
Enter fullscreen mode Exit fullscreen mode

So what we're saying is: 'Return all guests that have bookings with payments'. This produces two inner joins:

SELECT "guests".* FROM "guests" INNER JOIN "bookings" ON "bookings"."guest_id" = "guests"."id" INNER JOIN "payments" ON "payments"."booking_id" = "bookings"."id"
Enter fullscreen mode Exit fullscreen mode

And results in the following joined table:

guests.id guests.name bookings.id bookings.guest_id bookings.check_in bookings.check_out payments.id payments.amount payments.booking_id payments.status
2 'Jane Williams' 3 2 22 May 2021 28 May 2021 3 770 3 'pending'
3 'Dory Timothy' 1 3 20 Jun 2021 23 Jun 2021 1 450 1 'paid'
3 'Dory Timothy' 4 3 01 Apr 2021 02 Apr 2021 4 810 4 'paid'
5 'Willie Connelly' 2 5 07 Aug 2021 08 Aug 2021 2 100 2 'pending'

The payments data is now available to query so next, we filter by payment status using the where method:

Guest.joins(bookings: :payment).where(payments: { status: 'pending' })
Enter fullscreen mode Exit fullscreen mode

And the final returned outcome:

Guests

Id Name
2 'Jane Williams'
5 'Willie Connelly'

Customizing joins with SQL strings

Instead of passing symbols that tell ActiveRecord which association to join (and that will be later translated to SQL), the joins method also accepts a SQL string directly as an argument. This has the advantage of providing extra flexibility in building more complex queries.

ActiveRecord joins in polymorphic associations

Adding a new polymorphic model Accommocation to our domain:

class Accommodation < ApplicationRecord
  has_many :bookings, as: :bookable
end
Enter fullscreen mode Exit fullscreen mode
class Booking < ApplicationRecord
  belongs_to :bookable, polymorphic: true
end
Enter fullscreen mode Exit fullscreen mode

In polymorphic associations if you try joins using symbols, you'll get an error:

Booking.joins(:bookable)

ActiveRecord::EagerLoadPolymorphicError (Cannot eagerly load the polymorphic association :bookable)
Enter fullscreen mode Exit fullscreen mode

That is because we need to tell joins which bookable type we want to use since there can be many.

One of the ways we could solve this problem would be to pass an SQL string:

Bookings.joins("INNER JOIN accommodations ON accommodations.id = bookings.bookable_id AND bookings.bookable_type = 'Accommodation'")
Enter fullscreen mode Exit fullscreen mode

Until rails 5 was released, if you wanted to run a left outer join, you'd also have to pass a custom SQL to the joins method. Since then, rails supports left outer joins through the ActiveRecord methods left_joins and its alias left_outer_joins.

In part II of this series - Understanding ActiveRecord joins - I'll go through similar exercises to explain when and how we can use ActiveRecord's left_joins.

Discussion (0)