Jesal Gadhia

Posted on Aug 19, 2020 • Edited on Nov 3, 2020 • Originally published at jes.al

Querying paginated API endpoints the Rails way

#ruby #rails #api

Recently I was working on querying an API that didn't have a Ruby SDK, so I had the opportunity to write a thin service wrapper for it. The API let callers request a specific page and set the number of records to include per page.

Say, for example, the API returns a list of people. A response would look like this (the code later in this post treats meta.page.total as the total number of pages):

{
  "meta": {
    "page": { "total": 200 }
  },
  "items": [
    {
      "type": "people",
      // ...
    },
    {
      "type": "people",
      // ...
    }
  ]
}

First pass

I made a first attempt at writing a PeopleService PORO that is responsible for querying those records:

class PeopleService 
  def where(params = {})
    default_params = { page: 1, count: 25 }
    params = default_params.merge(params)

    make_request(params)
  end

  private

  def make_request(params)
    # Make external API call using the params
  end
end
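The make_request body is left open above. As a rough sketch only, assuming a hypothetical JSON endpoint at https://api.example.com/people that accepts page and count as query parameters (neither the URL nor the build_uri helper comes from the API described in this post), it might be filled in with Ruby's standard library like this:

```ruby
require "json"
require "net/http"
require "uri"

class PeopleService
  # Hypothetical endpoint, used only for illustration.
  BASE_URL = "https://api.example.com/people"

  def where(params = {})
    default_params = { page: 1, count: 25 }
    make_request(default_params.merge(params))
  end

  # Hypothetical helper, public so the URL can be inspected without a network call.
  def build_uri(params)
    uri = URI(BASE_URL)
    uri.query = URI.encode_www_form(params)
    uri
  end

  private

  def make_request(params)
    response = Net::HTTP.get_response(build_uri(params))
    raise "Request failed with status #{response.code}" unless response.is_a?(Net::HTTPSuccess)

    # symbolize_names lets callers use result[:items] and
    # result.dig(:meta, :page, :total) as in the rest of this post.
    JSON.parse(response.body, symbolize_names: true)
  end
end
```

Parsing with symbolize_names: true is a design choice here: it matches the symbol keys that the iteration code in this post reads from the result.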

That would do the job if the caller writes their own iteration logic. For instance, if we want to retrieve all the People from the API:

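page_count = 0
people = []
loop do
  page_count += 1
  result = PeopleService.new.where(page: page_count)
  # concat keeps people a flat list instead of an array of pages
  people.concat(result[:items])
  # Compare (>=) rather than assign (=), otherwise the loop stops after page 1
  break if page_count >= result.dig(:meta, :page, :total).to_i
end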

That feels a bit messy, though. The API contract is leaking out of the PeopleService abstraction layer that we just created.

Let's make it Rails-y

I want my service object to follow more Rails-like conventions. In other words, I'd like to be able to iterate over the results from the PeopleService with an ActiveRecord-like syntax. For example: PeopleService.new.all.each{ |person| #some operation }

Enumeration

To achieve that, we will have to make use of Ruby's Enumerator class:

class PeopleService 
  def initialize
    # Setup API auth params
  end

  def where(params = {})
    default_params = { page: 1, count: 25 }
    params = default_params.merge(params)

    make_request(params)
  end

  def all(params = {})
    Enumerator.new do |yielder|
      page = 1
      loop do
        result = where(params.merge(page: page))
        result[:items].each { |item| yielder << item }

        # Kernel#loop rescues StopIteration and treats it as a break
        raise StopIteration if page >= result.dig(:meta, :page, :total).to_i

        page += 1
      end
    end.lazy
  end

  private

  def make_request(params)
    # Make external API call using the params
  end
end

That will get us closer to what we are looking for. Wrapping the paging loop in an Enumerator is what gives us the ability to iterate over the results returned from the all method. Enumerator.new returns an object that includes Enumerable.

That will unlock a powerful ability to chain a number of enumerators together and perform block operations on them which will make our service highly composable.
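To see the mechanics in isolation, here is a minimal sketch that has nothing to do with the API: a hypothetical counting_enumerator helper that records every value it generates in a log array. It shows that the Enumerator block does not run until something enumerates it, and that Enumerable methods can be chained onto the result.

```ruby
# Hypothetical helper: yields 1, 2, 3, ... forever and logs each generated value.
def counting_enumerator(log)
  Enumerator.new do |yielder|
    n = 0
    loop do
      n += 1
      log << n        # record that this value was actually generated
      yielder << n
    end
  end
end

log = []
numbers = counting_enumerator(log)

p log                                      # => []  (the block has not run yet)
p numbers.take(3)                          # => [1, 2, 3]
p numbers.lazy.select(&:even?).first(2)    # => [2, 4]
```

Each new enumeration starts the block from the top again, which is why the second call counts from 1 rather than continuing where take left off.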

So for example, if we wanted to group the people by their location we could chain the results with a group_by function:

PeopleService.new.all.group_by{ |person| person.location }

Lastly, you might have noticed we tacked a .lazy onto the end of the enumerator. That turns it into an instance of Enumerator::Lazy, which only fetches the results that we actually enumerate over, even when further operations are chained onto it.

Say this API had 1,000 pages of results. Without the lazy enumerator, chaining a method such as map or select onto PeopleService.new.all would walk through all 1,000 pages to build an intermediate array before returning anything. That would be extremely slow and resource-intensive, and we could easily hit a rate limit set by the API provider. What we want instead is to query only the pages that we actually enumerate over.

So for example, if we are trying to find the person object with a specific email, it will stop querying the API as soon as it finds a page that contains Jon Doe:

PeopleService.new.all.find { |person| person.email == 'jon.doe@example.com' }
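The effect on request counts can be checked with a small in-memory stand-in. This is only a sketch: FakePeopleApi, its fetch method and each_person are hypothetical names, and the data is 8 made-up people spread over 4 pages. Short-circuiting methods such as find stop early on any Enumerator; .lazy is what keeps chained map or select calls from walking every page.

```ruby
# Hypothetical in-memory stand-in for the remote API: 4 pages, 2 people per page.
class FakePeopleApi
  PEOPLE = (1..8).map { |i| { id: i, email: "person#{i}@example.com" } }.freeze
  PER_PAGE = 2

  attr_reader :requested_pages

  def initialize
    @requested_pages = []
  end

  def fetch(page)
    @requested_pages << page # record every simulated request
    items = PEOPLE.each_slice(PER_PAGE).to_a[page - 1] || []
    { meta: { page: { total: PEOPLE.size / PER_PAGE } }, items: items }
  end
end

# Same paging loop as in the article, without the trailing .lazy.
def each_person(api)
  Enumerator.new do |yielder|
    page = 1
    loop do
      result = api.fetch(page)
      result[:items].each { |item| yielder << item }
      break if page >= result.dig(:meta, :page, :total).to_i

      page += 1
    end
  end
end

api = FakePeopleApi.new
each_person(api).lazy.find { |person| person[:email] == "person3@example.com" }
p api.requested_pages # => [1, 2]

api = FakePeopleApi.new
each_person(api).lazy.map { |person| person[:email] }.first(3)
p api.requested_pages # => [1, 2]

api = FakePeopleApi.new
each_person(api).map { |person| person[:email] }.first(3) # eager map
p api.requested_pages # => [1, 2, 3, 4]
```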

Caching

Right now, calling the all method again will re-query pages it has already fetched, even though it returns a lazy enumerator. For example:

ps = PeopleService.new

# This will iterate through the pages until we find Jon Doe
person = ps.all.find { |person| person.email == 'jon.doe@example.com' }

# Calling this again **should not** query the same pages again. The results should already be stored.
person = ps.all.find { |person| person.email == 'jon.doe@example.com' }

Similar to ActiveRecord's query cache, we also want to cache the results from our query for performance. This is where one of the most underrated features of the Hash class comes into play.

If you instantiate a Hash with a block, the hash calls that block whenever a missing key is looked up and uses it to compute the value. In our case, we can tell the hash to call the API to fetch the results of the page we are looking for.

The beauty of this feature is that, as long as the block stores its result in the hash, it only runs once per key. If the key has already been assigned a value, the block is not called again:

h = Hash.new do |h, key|
  h[key] = where(page: key)
end

h[1]
# Fetches results for page 1 from the API
# => (500.0ms) [{...},{...},{...}]

# Next call to the same key is already assigned, the block isn't executed
h[1]
# => (Cached 0.0ms) [{...},{...},{...}]
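The timings above are illustrative. For a version that can be run without an API, here is a sketch in which a hypothetical PageCache class wraps such a hash and records every time the block fires. It also shows that key? checks for a cached page without triggering the block.

```ruby
# Hypothetical wrapper: a string stands in for the API response of each page.
class PageCache
  attr_reader :fetched

  def initialize
    @fetched = []
    @pages = Hash.new do |hash, page|
      @fetched << page                        # the block ran for this page
      hash[page] = "results for page #{page}" # store it so the block is skipped next time
    end
  end

  def [](page)
    @pages[page]
  end

  def cached?(page)
    @pages.key?(page) # key? never invokes the default block
  end
end

cache = PageCache.new
p cache.cached?(1) # => false
p cache[1]         # => "results for page 1"
p cache[1]         # => "results for page 1"
p cache.fetched    # => [1]
p cache.cached?(1) # => true
```

If the block returned a value without assigning hash[page], it would run again on every lookup, so the assignment is what makes this a cache.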

When using the Hash approach in our class, we also want to memoize the Hash itself (using the ||= operator) in an instance variable called @all_pages.

This allows us to call the all method multiple times on the same instance without the cached results being overwritten:

class PeopleService
  # ...
  def all(params = {})
    Enumerator.new do |yielder|
      page = 1
      loop do
        @all_pages ||= Hash.new do |h, key|
          h[key] = where(params.merge(page: key))
        end
        result = @all_pages[page]
        result[:items].each { |item| yielder << item }

        raise StopIteration if page >= result.dig(:meta, :page, :total).to_i

        page += 1
      end
    end.lazy
  end
  # ...
end

Final form

Here's what our finished product looks like after leveraging the key features of the Enumerator and Hash classes. Now our all method's interface is very similar to the one provided by ActiveRecord:

class PeopleService 
  def where(params = {})
    default_params = { page: 1, count: 25 }
    params = default_params.merge(params)

    make_request(params)
  end

  def all(params = {})
    Enumerator.new do |yielder|
      page = 1
      loop do
        @all_pages ||= Hash.new do |h, key|
          h[key] = where(params.merge(page: key))
        end
        result = @all_pages[page]
        result[:items].each { |item| yielder << item }

        raise StopIteration if page >= result.dig(:meta, :page, :total).to_i

        page += 1
      end
    end.lazy
  end

  private

  def make_request(params)
    # Make external API call using the params
  end
end

Usage:

ps = PeopleService.new
ps.all.each do |person|
  # some operation on the person object
end
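To try the finished class without a real API, make_request can be swapped for an in-memory stand-in. In this sketch the Person struct, the PAGES constant and the request_log reader are all hypothetical additions for illustration; where and all are the same as in the final form above.

```ruby
# Hypothetical record type so person.email and person.location work as in the post.
Person = Struct.new(:name, :email, :location)

class PeopleService
  # Hypothetical data standing in for the remote API: 3 pages of 2 people.
  PAGES = [
    [Person.new("Ann Lee", "ann.lee@example.com", "Austin"),
     Person.new("Bo Chen", "bo.chen@example.com", "Boston")],
    [Person.new("Jon Doe", "jon.doe@example.com", "Austin"),
     Person.new("Di Park", "di.park@example.com", "Denver")],
    [Person.new("Eve Roy", "eve.roy@example.com", "Boston"),
     Person.new("Flo Diaz", "flo.diaz@example.com", "Denver")]
  ].freeze

  attr_reader :request_log

  def initialize
    @request_log = []
  end

  def where(params = {})
    default_params = { page: 1, count: 25 }
    make_request(default_params.merge(params))
  end

  def all(params = {})
    Enumerator.new do |yielder|
      page = 1
      loop do
        @all_pages ||= Hash.new do |h, key|
          h[key] = where(params.merge(page: key))
        end
        result = @all_pages[page]
        result[:items].each { |item| yielder << item }

        raise StopIteration if page >= result.dig(:meta, :page, :total).to_i

        page += 1
      end
    end.lazy
  end

  private

  # Stand-in for the external call: logs the page and serves it from PAGES.
  def make_request(params)
    @request_log << params[:page]
    { meta: { page: { total: PAGES.size } }, items: PAGES.fetch(params[:page] - 1) }
  end
end

ps = PeopleService.new
ps.all.find { |person| person.email == "jon.doe@example.com" }
p ps.request_log # => [1, 2]

ps.all.find { |person| person.email == "jon.doe@example.com" }
p ps.request_log # => [1, 2]  (both pages came from the cache)

ps.all.group_by(&:location)
p ps.request_log # => [1, 2, 3]  (only the missing page was requested)
```

One thing worth keeping in mind with this design: the cache is keyed by page number only, and the hash block closes over the params of the first all call on an instance, so later calls with different params on the same instance would reuse those first results.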

Let me know if that was useful. Would love to hear about any other techniques that you've found particularly interesting when querying external APIs.

This post was originally published on my blog. If you liked this post, please share it on social media and follow me on Twitter!
