Danny Adams

Posted on Mar 22, 2023

How to Speed Up your Applications by Caching at the Edge with HarperDB

#serverless #programming #webdev #javascript

Caching is a commonly used technique to speed up websites and applications. “Caching” simply means to save something (e.g. data or a web page) so that it can be accessed quickly in the future.

For example, WordPress websites heavily use caching to reduce the amount of server computing that has to be done. This reduces costs and allows web pages and blog posts to be served up quickly.

If a blog post wasn’t cached, then every time the page is requested, the post’s data (the title, excerpt, feature image, and content) would have to be fetched from the database and rendered into HTML. But if the page is cached (i.e. the resulting HTML is saved), then the page is already sitting on the server, ready to be served straight away without any database calls, which speeds up the site.
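Here is a minimal sketch of that idea in JavaScript, assuming an in-memory `Map` as the page cache. `loadPostFromDatabase` and `renderPost` are hypothetical stand-ins for the real database query and template rendering, not WordPress code.

```javascript
'use strict';

// Hypothetical stand-in for a database query; counts calls so we can see the effect of caching
let databaseCalls = 0;
function loadPostFromDatabase(slug) {
  databaseCalls += 1;
  return { title: `Post ${slug}`, content: 'Hello, world' };
}

// Render a post's data into an HTML string (no escaping in this sketch)
function renderPost(post) {
  return `<article><h1>${post.title}</h1><p>${post.content}</p></article>`;
}

// Rendered pages, keyed by slug
const pageCache = new Map();

function getPage(slug) {
  // Cached: serve the saved HTML without touching the database
  if (pageCache.has(slug)) {
    return pageCache.get(slug);
  }
  // Not cached: query the database, render, then save the HTML for next time
  const html = renderPost(loadPostFromDatabase(slug));
  pageCache.set(slug, html);
  return html;
}
```

Calling `getPage('hello')` twice hits the (fake) database only once; the second call is served from the cache.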

In this article, we’re going to discuss a modern technique called "caching at the edge". Then we'll build a simple HarperDB project that caches the result of an API call, and compare the speed of an un-cached API route with a cached one.

You'll see that caching the result of an API request speeds up subsequent requests, and also makes it less likely that our app gets rate limited, since repeat requests no longer reach the external API.

What is HarperDB and why is it a good option for caching?

HarperDB is a distributed application & database platform that allows you to lower latency and improve the performance of any dataset and any application.

“Distributed” means that the application and database are spread out around the globe. Traditionally, a web app and its database sit on one server, but this adds a lot of latency (a fancy word for delay) if the user is located a long way from the server, e.g. if the app is in New York but the user is in Delhi.

If the app is distributed, then it runs on multiple servers around the world, so the user in Delhi could be served the app from a computer in Bangladesh instead of New York.

What is Edge computing?

Traditionally, when you deploy a web app, you rent a computer that lives in a physical data centre somewhere in the world, like “us-east-2” or “tt-west-4”. All requests, from wherever they originate, go to that same server. The problem is that your users are scattered around the globe. The speed of light is fast, but not instant, so being physically closer to the server gives you a faster response.
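As a rough worked example of why distance matters: light in optical fibre travels at roughly 200,000 km/s (about two-thirds of its speed in a vacuum), so distance alone puts a floor on round-trip time. The sketch below assumes a straight fibre run of about 11,750 km between New York and Delhi, which is approximately the great-circle distance; real cable routes are longer, and routing and server processing add more delay on top.

```javascript
'use strict';

// Assumed signal speed in optical fibre: roughly 2/3 of the speed of light in a vacuum
const FIBRE_SPEED_KM_PER_S = 200_000;

// Lower bound on round-trip time (ms) for a request that travels distanceKm each way,
// ignoring routing detours, queuing and server processing time
function minRoundTripMs(distanceKm) {
  return ((2 * distanceKm) / FIBRE_SPEED_KM_PER_S) * 1000;
}

// Approximate great-circle distance between New York and Delhi (assumption)
const NEW_YORK_TO_DELHI_KM = 11_750;

console.log(minRoundTripMs(NEW_YORK_TO_DELHI_KM).toFixed(1)); // "117.5"
```

So even a perfect network can't answer a user in Delhi from a server in New York in much under about 120ms, which is the latency that moving the server closer removes.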

Web developers have long mitigated this problem by using CDNs (content delivery networks) to distribute, duplicate, or cache static files, such as HTML, CSS, and JavaScript, on servers all around the world. This works great for static files, but doesn’t work for a dynamic server that needs to execute some server-side code on every request.

Wikipedia’s definition of edge computing:

“Edge computing is a distributed computing paradigm that brings computation and data storage closer to the sources of data. This is expected to improve response times and save bandwidth.”

So, edge computing is like a CDN for a full-blown server.

Vercel (a company famous for its Next.js hosting) provides edge functions that distribute your Next.js API routes around the globe, putting them closer to your users wherever they are.

This sounds great on paper, but what if the database sits in a single location, say New York? If a user in Delhi makes a request to a distributed API route located in Bangladesh, and that API route then needs some data from the database in New York, the round trip becomes:

Delhi -> Bangladesh -> New York -> Bangladesh -> Delhi.

That request takes even longer than if the app and data weren’t distributed and just sat in New York, with a shorter round trip of:

Delhi -> New York -> Delhi.
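The comparison is an instance of the triangle inequality. Writing d(X, Y) for the distance between two places (D for Delhi, B for the edge location in Bangladesh, N for New York), the total distance each request travels is:

```latex
d_{\text{edge}} = 2\,\bigl[d(D,B) + d(B,N)\bigr] \;\ge\; 2\,d(D,N) = d_{\text{direct}}
```

Equality would hold only if Bangladesh lay on the path from Delhi to New York, which it doesn't, so the edge route always covers more ground; assuming latency roughly tracks distance, it is also slower.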

So, if we are distributing our application, then (depending on your specific use case) it’s often going to be a good idea to distribute the data along with the app.

But distributed systems can be very complex to create and maintain. This is where HarperDB comes in. HarperDB lowers the latency of any dataset by distributing your API routes and data around the globe, putting your application (e.g. API routes) right next to your database, and your application and data closer to the end user.

This makes HarperDB a great candidate for edge caching: we can cache data (save it to our HarperDB database) and distribute this cache all over the globe.

With HarperDB, the API server is integrated into the database itself (a feature known as “custom functions”), meaning there is one fewer hop from the API to the database, further reducing latency and serving our users nice and fast.

For more on edge caching, check out Edge Caching Explained & Why You Should Be Using It.

Let’s now build a simple project to cache the result of an api request…

HarperDB caching project

What we’re going to build

We’re going to compare the speed of an un-cached API route with a route that caches the result of an API call.

In the un-cached route, we will simply fetch a post by id from the JSONPlaceholder API, https://jsonplaceholder.typicode.com/posts/${id}, on every request, and return it to the user.

In the cached route, we will cache the result of each request to https://jsonplaceholder.typicode.com/posts/${id}.

The logic for the cached route is simple:

1. Get the post id from the URL, and fetch the post from the HarperDB database.
2. If a post is found (i.e. cached), then return it. (We are done.)
3. If no post is found in the database with that id, then fetch the post from the API.
4. Save (cache) the post in the database.
5. Return the post.
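
Before wiring this into HarperDB, the same five steps can be sketched as a plain JavaScript function. This is a minimal in-memory sketch of the cache-aside pattern: `cache` is a `Map` standing in for the `single_post` table, and `fakeFetchPost` is a hypothetical stand-in for the call to JSONPlaceholder, so the logic can be tried without a database or network.

```javascript
'use strict';

// Cache-aside lookup: check the cache first, fall back to the source, then save the result.
// `cache` is any Map-like store; `fetchPost` is an async function that loads a post by id.
async function getPost(id, cache, fetchPost) {
  // Steps 1-2: return the cached post if we have one
  if (cache.has(id)) {
    return cache.get(id);
  }
  // Step 3: cache miss, so fetch the post from the source
  const post = await fetchPost(id);
  // Step 4: save (cache) it for future requests
  cache.set(id, post);
  // Step 5: return it
  return post;
}

// Hypothetical stand-in for the JSONPlaceholder API that counts how often it is called
let upstreamCalls = 0;
async function fakeFetchPost(id) {
  upstreamCalls += 1;
  return { id, title: `Post ${id}` };
}

const postCache = new Map();
```

Requesting the same id several times reaches the upstream API only once, which is also why caching makes rate limiting less likely.
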
Let’s get started.

Installing HarperDB locally

Install HarperDB locally for Mac, Windows, or Linux.

I’m on Mac, so to install HarperDB I opened a terminal and entered:

$ npm install -g harperdb

This installed a HarperDB instance on my Mac at /Users/danadams/hdb, with these server settings:

Listening port: 9925
Admin username: HDB_ADMIN
Admin password: whatever you set it to during installation

We can now start HarperDB with the command:
$ harperdb

Now we can use HarperDB locally!

Setting up HarperDB studio

First, create an account with HarperDB.

Then we need to connect our locally installed HarperDB instance to the Studio by registering it as a user-installed instance.

Select “Register User-Installed Instance”.

Then connect the local HarperDB instance that you installed in the previous step.

Creating our schema

Let’s create a schema called “caching_project”. A schema is just a fancy way of saying “group of tables” in HarperDB.

Then, in the caching_project schema, create a table called single_post with a hash_attribute of id. A hash_attribute is HarperDB’s equivalent of a primary key: a unique identifier for each row in a table.
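If you’d rather script this step than click through the Studio, HarperDB also accepts operations as JSON over HTTP on the instance’s port (9925 above), including `create_schema` and `create_table`. The sketch below assumes Node.js 18+ (for the global `fetch`), the local instance from the install step, and placeholder credentials you would replace with your own; `harperdbRequest` and `createCachingTable` are hypothetical helper names.

```javascript
'use strict';

// Operation bodies for HarperDB's operations API
function createSchemaOperation(schema) {
  return { operation: 'create_schema', schema };
}

function createTableOperation(schema, table, hashAttribute) {
  return { operation: 'create_table', schema, table, hash_attribute: hashAttribute };
}

// Send one operation to a HarperDB instance using HTTP Basic authentication.
// Assumes Node.js 18+, where fetch is available globally.
async function harperdbRequest(url, username, password, body) {
  const auth = Buffer.from(`${username}:${password}`).toString('base64');
  const response = await fetch(url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', Authorization: `Basic ${auth}` },
    body: JSON.stringify(body),
  });
  if (!response.ok) {
    throw new Error(`HarperDB returned ${response.status}: ${await response.text()}`);
  }
  return response.json();
}

// Create the caching_project schema and its single_post table (hash attribute: id)
async function createCachingTable(url, username, password) {
  await harperdbRequest(url, username, password, createSchemaOperation('caching_project'));
  return harperdbRequest(url, username, password,
    createTableOperation('caching_project', 'single_post', 'id'));
}

// Example (placeholder credentials):
// createCachingTable('http://localhost:9925', 'HDB_ADMIN', 'your_password').then(console.log);
```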

Creating a custom function project

In HarperDB, custom functions are custom routes that we can define to do whatever we want, usually to interact with our HarperDB database in some way. Essentially, custom functions allow us to build our API right next to where our data is stored, reducing latency.

Custom Functions are powered by Fastify (a lightweight Node.js framework that claims to be faster than Express.js), so they’re extremely flexible.

To spin up a new custom functions project, go to the functions tab, click the ‘+’ icon next to ‘projects’ and create a project called caching-project:

Next, create a file in the ‘routes’ folder called ‘post’. This is where we will be writing our route handlers to fetch single blog posts.

HarperDB provides some example routes, but let’s clear everything out and create a simple route to test that we’re set up correctly:

'use strict';

module.exports = async (server, { hdbCore, logger }) => {
  // Test route
  server.route({
    url: '/test',
    method: 'GET',
    handler: () => {
      return 'It works!' // This should be printed out in browser
    }
  })
};

You’ll find the URL of your custom functions in the bottom left corner of the ‘custom functions’ tab.

Visiting the caching-project/test route in the browser prints “It works!”.

Perfection!

Note that you can also write your custom functions in your favourite text editor by opening the project locally. On Mac, the custom functions are located at /Users/your_username/hdb/custom_functions, so to open them with VS Code, open a terminal and run:

$ cd ~/hdb/custom_functions
$ code .

Now, you can edit your code in your text editor, then go to the HarperDB studio functions tab, click reload to see your local changes, then click the green save button to update.

But for this simple project, I’m just gonna wing it and write the code directly into HarperDB studio.

Creating an un-cached route

Let’s create a route that fetches a post by its id from the JSONPlaceholder API, then returns it to the user:

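  // Fetch a post by id from the JSONPlaceholder API on every request.
  // Add this inside the module.exports function, below the test route.
  // Assumes a Node.js version with a global fetch (18+).
  server.route({
    url: '/post/:id',
    method: 'GET',
    // The handler must be async (it awaits fetch) and receive the request
    handler: async (request) => {
      const postId = request.params.id;
      const response = await fetch(
        `https://jsonplaceholder.typicode.com/posts/${postId}`
      );
      const post = await response.json();
      return post;
    }
  })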

Now let’s open up Postman (a popular tool for quickly testing API routes) to hit this endpoint and see how long it takes to receive the data:

After clicking “Send” a few times, the request takes anywhere from 35ms to 200ms, with most requests taking around 45ms. Let’s see if we can improve that by caching the results of the API calls…
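If you’d like more repeatable numbers than clicking “Send” by hand, a small script can time the requests for you. This is a hedged sketch: `measureLatency` and `median` are hypothetical helpers that time any async function with `performance.now()`, and the URL in the commented example assumes HarperDB’s default custom functions port (9926) and Node.js 18+ for `fetch`; use the URL shown in your own Studio.

```javascript
'use strict';

const { performance } = require('node:perf_hooks');

// Time `runs` sequential calls to an async function and return the timings in ms
async function measureLatency(fn, runs) {
  const timings = [];
  for (let i = 0; i < runs; i += 1) {
    const start = performance.now();
    await fn();
    timings.push(performance.now() - start);
  }
  return timings;
}

// The median is less sensitive than the mean to an occasional slow outlier (like a 200ms request)
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid];
}

// Example (URL is an assumption; use the one shown in your Studio):
// const url = 'http://localhost:9926/caching-project/post/1';
// measureLatency(() => fetch(url).then((r) => r.json()), 20)
//   .then((t) => console.log(`median: ${median(t).toFixed(1)} ms`));
```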

Creating a cached route

Let's remind ourselves of the logic for the cached route:

1. Get the post id from the URL, and fetch the post from the HarperDB database.
2. If a post is found, then return it. (We are done.)
3. If no post is found in the database with that id, then fetch the post from the API.
4. Save the post in the database.
5. Return the post.

Putting this logic into code:

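server.route({
    url: "/cached-post/:id",
    method: "GET",
    preParsing: (request, response, payload, done) => {
      // Only accept numeric ids, so raw user input never ends up in the SQL string
      if (!/^\d+$/.test(request.params.id)) {
        const error = new Error("Post id must be a positive integer");
        error.statusCode = 400;
        return done(error);
      }
      request.body = {
        ...request.body,
        operation: "sql",
        sql: `SELECT * FROM caching_project.single_post WHERE id = '${request.params.id}'`,
      };
      done();
    },
    handler: async (request, reply) => {
      const cachedPost = await hdbCore.requestWithoutAuthentication(request);
      if (cachedPost.length === 1) {
        // Post found in db, so return it.
        return cachedPost[0];
      }

      // Post not cached/found in db, so fetch post from api
      const postId = request.params.id;
      const response = await fetch(
        `https://jsonplaceholder.typicode.com/posts/${postId}`
      );
      if (!response.ok) {
        // Don't cache error responses (e.g. a 404 for an id that doesn't exist)
        reply.code(response.status);
        return { error: `JSONPlaceholder returned ${response.status}` };
      }
      const newPost = await response.json();

      // Add the HarperDB operation to insert the post into the db
      request.body = {
        operation: "insert",
        schema: "caching_project",
        table: "single_post",
        records: [newPost],
      };
      // Cache (save) the result in db without making the user wait for the write,
      // but log any failure instead of leaving the rejected promise unhandled
      hdbCore
        .requestWithoutAuthentication(request)
        .catch((error) => logger.error(error));
      // Return the post
      return newPost;
    },
});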

Above, the preParsing callback adds the HarperDB operation we want to perform to the request body before the handler runs. This operation is a SQL query that fetches the post by id from the database.

We then use hdbCore.requestWithoutAuthentication(request) to perform the database operation without authentication (these routes are meant to be publicly available, so no credentials are needed). If the post is found, we return it; otherwise, we fetch it from the API, save it to our HarperDB database, then return it.

Now when we hit our endpoint for the first time, we have to wait 118ms to get the response:

But then when we hit the same route again (with the same post id), it takes just 10ms:

Why is the first request slower? Because when a user requests a post for the first time, that post isn’t cached (stored in our db), so the post has to be fetched from the api, cached into our db, then returned to the user.

But any subsequent requests for the same post will be faster, as the post is now cached in the database and can be fetched and returned straight away.

So, by caching with HarperDB, we’ve made repeat requests roughly 4 to 5 times faster (from ~45ms uncached to ~10ms cached). Only the first request for each post pays the extra cost of the cache miss.

You can check what data has been cached by browsing the single_post table in HarperDB Studio.

How could we improve our caching strategy?

Currently, our cache has no expiry date, meaning that if a post got updated at the JSONPlaceholder API, we’d never see the update, because our cache is permanent. To allow for post updates, we could store a timestamp with each cached row and treat the row as expired after a few hours. Then, if the cache has expired, we can re-fetch the post from JSONPlaceholder and update the cached row in the database, ensuring our cache is never more than a few hours out of date.
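Here is a minimal sketch of that expiry check, using an in-memory `Map` rather than HarperDB itself. Each cached entry stores a hypothetical `cachedAt` timestamp alongside the post, and `TTL_MS` (three hours here) is an arbitrary choice; in HarperDB, the timestamp would be an extra attribute on each `single_post` row.

```javascript
'use strict';

const TTL_MS = 3 * 60 * 60 * 1000; // Keep cached posts for three hours (arbitrary choice)

// An entry is stale once more than TTL_MS has passed since it was cached
function isExpired(entry, now = Date.now()) {
  return now - entry.cachedAt > TTL_MS;
}

// Cache-aside lookup with expiry: refetch and overwrite the entry when it is stale.
// `cache` is a Map of id -> { post, cachedAt }; `fetchPost` is an async loader (hypothetical).
async function getFreshPost(id, cache, fetchPost, now = Date.now()) {
  const entry = cache.get(id);
  if (entry && !isExpired(entry, now)) {
    return entry.post;
  }
  const post = await fetchPost(id);
  // In HarperDB this would be an update rather than an insert, since the row already exists
  cache.set(id, { post, cachedAt: now });
  return post;
}
```

Passing `now` explicitly keeps the function easy to test; in a route handler you would leave it at the `Date.now()` default.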

Summary

HarperDB is a great option for caching data as it brings the application and data closer to the user (also known as edge caching), reducing latency and speeding up your application.

If you enjoyed this article, give me a sub on YouTube or follow me on Twitter.

Thanks for reading!