DEV Community

Cover image for Caching data with DataLoader
TheGuildBot for The Guild

Posted on • Edited on • Originally published at the-guild.dev

Caching data with DataLoader

This article was published on Wednesday, January 26, 2022 by Gilad Tidhar @ The Guild Blog

Caching

Caching is an essential part of designing scalable and performant GraphQL API.

But, what is caching? Caching is the technical design of trying to avoid compute-intensive tasks by
keeping their results temporarily in a cache. A cache is a place in memory for important data, which
can be delivered to the client faster than the regular procedure.

For example, for a GraphQL API, we might want to cache a field value that comes from a slow external
API. The goal of caching is to improve performance where it's needed the most.

How Is Caching Faster?

Using a cache for compute-intensive data is faster for a few reasons:

  • A cache is usually not stored in storage, but in memory, which is a much faster.
  • A cache is smaller than a database, so there are a lot less values to filter through.
  • While some cached data might be queried a lot of times, it will be fetched from the database only one time, so there's less time lost for fetching multiple times.

How Do We Use Caching?

Caching with dataloader

My preferred platform for caching is dataloader, and in this tutorial I will explain how to use
dataloader with Node.js and Typescript, to have caching in your server.


Tip: dataloader outside of GraphQL

The dataloader package was created specifically for GraphQL, but it can work with other Node.js
programs.

Using DataLoader

1. Install DataLoader in Your Project

Install dataloader by running the following:

npm install --save dataloader or yarn add dataloader

2. Import dataloader

To start, import dataloader and LRU (LRU is a cache algorithm that will put only the most X
requested items in the cache).

import DataLoader from 'dataloader'
import { LRUMap } from 'lru_map'
Enter fullscreen mode Exit fullscreen mode

3. Define the Types That Would Be Saved to Our Cache

The first type should be a way to represent the data, like a string with a name, and the other is
the data itself. For example, let's say we're trying to cache books. Our package name will be the
books' id, so we can tell which book is which. Our package data will be the books' length, writer,
price etc...

so in this case, our schema would look like this:

type book {
  id: ID!
  name: String!
  author: author!
  price: Int!
  description: String!
  length: Int!
}
type author {
  id: ID!
  name: String!
}
Enter fullscreen mode Exit fullscreen mode

and the corresponding data type would look like this:

type packageId = string

type PackageData = /* note that this is the type of the data we represent. */ {
  id: string
  name: string
  author: {
    id: string
    name: string
  }
  price: number
  description: string
  length: number
}
Enter fullscreen mode Exit fullscreen mode

The reason we write types is for type safety, let's say we have books and toys which have different
values, we wouldn't want that somehow toys will be inserted into our books cache

4. Create the DataLoader Instance

const dataLoader = new DataLoader<packageId, PackageData>(
  async (/* (1) */ keys: readonly packageId[]) =>
    /* (2) */ Promise.all(
      keys.map(async packageId => {
        await getData(packageId).catch(e => e)
      })
    ),
  {
    cacheMap: new LRUMap(100) /* (3) */
  }
)
Enter fullscreen mode Exit fullscreen mode

As you can see, we give the keys that the user asks for to the dataloader (1).

I recommend setting the keys to be some kind of id for the following reason: Lets say that in your
api, the user asks for data using its id, so the user will fetch with id:"something", you could
just pass the id as the key, instead of changing it.

The dataloader will first check if it has the keys on cache, and if so, it would return the data,
without going through the database, saving you a lot of valuable time.

After that (2), we give the dataloader a function for fetching the data, in case it doesn't have it
in the cache. In this case, I have the function getData, and I'm using the keys to "get data" from
my database.

In the end (3) we give it cacheMap and some value, the value represents how many queries dataloader
will cache, in this case, after a 100 values, it will delete the least used value (the one that
wasn't queried for the longest time) to make space for the 101st value.

From now, to query data, you just run

dataLoader.load(keys)
Enter fullscreen mode Exit fullscreen mode

DataLoader with GraphQL

Dataloader was designed to work with GraphQL, to solve the
n+1 problem

The N+1 problem is a common problem when designing GraphQL API. Looking at the Query below, we can
see that the GraphQL API will call 20 times the Book.author resolver:

query BooksWithAuthors {
  books(first: 20) {
    id
    title
    author {
      id
      name
    }
  }
}
Enter fullscreen mode Exit fullscreen mode

Depending on the resolver implementation, this query might trigger 20 SQL queries or API calls to
resolve authors that might've wrote multiple books. Dataloader helps to solve this problem by
caching, deferring and grouping similar resolver calls.

How to Use the DataLoader Package with GraphQL?

To use the dataloader with GraphQL just pass it in the context!

Now, your code should look somewhat like this:

import DataLoader from "dataloader";
import { LRUMap } from "lru_map";

type packageId = string;
type PackageData = Package;

export type GraphQLContext = {
  dataLoader: DataLoader<packageId, PackageData>;
  // ...
};
const dataLoader = new DataLoader<packageId, PackageData>(
  async (keys: readonly PackageId[]) => {
    return await Promise.all(
      keys.map(async (packageId) => {
        await getData(packageId).catch((e) => e);
      })
    );
  },
  {
    cacheMap: new LRUMap(100),
  }
);
// ...
export async function contextFactory() {
  return { dataLoader, ... };
}
Enter fullscreen mode Exit fullscreen mode

Now, in your resolvers, just call context.dataLoader.load(keys) and that's it! You now have
caching in your server!

An example of a resolver implementation:

export const resolvers = {
  Query: {
    getBook(parent, input, context) {
      return context.dataLoader.load(input.bookId)
    }
  }
}
Enter fullscreen mode Exit fullscreen mode

for further learning, check these out:

The n+1 problem
The dataloader github page
A great video about dataloader

Top comments (0)