DEV Community

Alexey Yakovlev
Alexey Yakovlev

Posted on

How I improved a Node web application perfomance multifold

This post is a continuation of Nest.js + Next.js series and it fixes a big issue I left in the code in the second article. While I will be using the project from those articles this article is not specific neither to Nest.js nor to Next.js. The goal of this article is to tell a story how I managed to improve a Node application perfomance multifold and share my experience on how to profile complex perfomance issues and make sure they do not return.

Table of Contents

Introduction

I was assigned to refactor bits of server-side rendering pipelines in a project using Nest.js + Next.js combo. Bridge implementation is very similar to the one I left in the articles mentioned earlier. I was still discovering all the quirks with this technology and it was the time I discovered that undefined values can not be serialized by Next.js. To fix this issue I introduced a sanitizing function. Inspired by JavaScripts functional code style and immutability the function was recursive and did not modify any values, only copied where needed.

This solution worked fine when the project was small. But the application was growing and we started transferring rather large objects with a lot of properties and deep nesting. Later we discovered a huge increase in FCP times in our user monitorings.

As the person with the largest amount of experience with nest-next in the team I went on a quest to investigate and find the source of issues. At this point I obviously forgot about the potential ineffectiveness of a function I wrote earlier. I also did not pay enough attention to the amount of data we started transferring.

I will show how I approached and overcome this challenge and what lessons I learned.

We will start by emulating a deployed environment. Developers strive to make different environments including the local one as similar as possible. Unfortunately often it is not possible. To make sure that I make proper inspections I had to analyze a deployed environment and not a local one. And so we will do in this article. Kind of.

Emulating environment

To emulate a remote server with SSH connection we will develop a Dockerfile with connection instructions. Then we will synthesize load in the application from Nest.js + Next.js series. With the load in place we may start inspecting our application to narrow down what causes the issue. Use this revision to work along or check complete code on my GitHub - https://github.com/yakovlev-alexey/nest-next-example.

Containerizing application

In order to run application in an isolated environment I will be using Docker. We will need to run both Node.js app and SSH server in the background. The base image would be node:12.22-alpine. Alpine Linux images have certain prerequisites in order to properly boot background processes. About those prerequisites read a recent post I made on how to setup SSH server on an Alpine container.

FROM node:12.22-alpine

ARG ssh_pub_key

RUN mkdir -p /root/.ssh \
    && chmod 0700 /root/.ssh \
    && passwd -u root \
    && echo "$ssh_pub_key" > /root/.ssh/authorized_keys \
    && apk add openrc openssh \
    && ssh-keygen -A \
    && sed -i 's/AllowTcpForwarding no/AllowTcpForwarding yes/g' /etc/ssh/sshd_config \
    && echo -e "PasswordAuthentication no" >> /etc/ssh/sshd_config \
    && mkdir -p /run/openrc \
    && touch /run/openrc/softlevel

WORKDIR /app

COPY . .

RUN yarn && yarn build

ENTRYPOINT ["sh", "-c", "rc-status; rc-service sshd start; yarn start"]
Enter fullscreen mode Exit fullscreen mode

Something that was not present in the post I mentioned is the line configuring AllowTcpForwarding. This option allows us to transfer any data by binding ports using -L client option. It is required for Node.js inspector to communicate with a remote server. Otherwise this Dockerfile is the same and I will neither go into details of its workings nor develop a proper caching solution at this time.

Add shortcuts to frequently used commands to scripts section in package.json. We will map port 3000 as is and port 7655 to internal container port 22 - the default one for SSH server. For authorization a local key will be used and injected into the container.

"start:docker": "docker run --rm -p 3000:3000 -p 7655:22 -it $(docker build --build-arg ssh_pub_key=\"$(cat ~/.ssh/id_rsa.pub)\" -q .)"
# unwrapped command
docker run --rm \
    -p 3000:3000 -p 7655:22 \
    -it $(docker build \
        --build-arg ssh_pub_key="$(cat ~/.ssh/id_rsa.pub)" \
        -q . \
     )
Enter fullscreen mode Exit fullscreen mode

To establish a connection we will use the following script.

"start:ssh": "ssh -L 9221:localhost:9229 root@localhost -p 7655"
Enter fullscreen mode Exit fullscreen mode

Unfortunately each new built image will have a new host key since they are generated during the build. Just rm ~/.ssh/known_hosts locally or remove the host key from this file. StrictHostKeyChecking=no option will not help since it still prohibits port binding.

Generating load

Before we start adding stub data to generate load let's examine our current perfomance. Use Chrome Dev Tools embedded Lighthouse for this. Start the app using start:docker (it may take some time to build the image at first) and visit localhost:3000. After conducting both mobile and desktop tests I got the following results for mobile.

Mobile perfomance

And for desktop.

Desktop estimates

Seems very much fine at the moment.

Now let's imagine our application puts some large amounts of data into the client payload in order to display the page. It could be anything from actual data to simple translations. For this example I will be stubbing a large object of translations - as close to a real life scenario as possible.

// ./src/server/translations.middleware.ts
import { NextFunction, Request, Response } from 'express';

const createStubTranslation = () => {
  return {
    en: {
      value: 'Lorem ipsum',
    },
    gb: {
      value: 'Lorem ipsum',
    },
    meta: {
      description: 'Dolore sit ame',
      properties: {
        name: 'string',
        age: 'number',
      },
    },
  };
};

const TRANSLATIONS = Array.from(Array(5000)).reduce((acc, _item, index) => {
  acc[index] = createStubTranslation();
  return acc;
}, {});

export const translationsMiddleware = (
  req: Request,
  _res: Response,
  next: NextFunction,
) => {
  req.translations = TRANSLATIONS;

  next();
};
Enter fullscreen mode Exit fullscreen mode

This basic middleware will inject a translations property with a large amount of nested properties. To make sure it properly works extend the type for Express Request.

// ./@types/express.d.ts
import 'express';

declare module 'express' {
  interface Request {
    translations: any;
  }
}
Enter fullscreen mode Exit fullscreen mode

And add this middleware to AppModule.

import { MiddlewareConsumer, Module } from '@nestjs/common';
import { translationsMiddleware } from './translations.middleware';

@Module({})
export class AppModule {
  public configure(consumer: MiddlewareConsumer) {
    consumer.apply(translationsMiddleware).forRoutes('*');
  }
  // ...
}
Enter fullscreen mode Exit fullscreen mode

Finally update extractAppData to filter translations property.

// ./src/client/ssr/extractAppData.ts
import { Request } from 'express';
import { GetServerSidePropsContext } from 'src/shared/types/next';
import { AppData } from 'src/shared/types/app-data';
import { filterUnserializable } from './filterUnserializable';
import { StaticQuery } from './buildServerSideProps';

const extractAppData = (
  ctx: GetServerSidePropsContext<Partial<StaticQuery>>,
) => {
  const { features, basePath } = ctx.query.config || {};

  const { translations } = ctx.req as Request;

  return filterUnserializable({
    features,
    basePath,
    translations,
  }) as Partial<AppData>;
};

export { extractAppData };
Enter fullscreen mode Exit fullscreen mode

Narrowing down the issue

The load is in place - time to check Lighthouse. Reboot start:docker and generate another report.

Results with load

Seems fine, right? But it does not align well with what you actually see when you refresh the page - loading takes well over a second - and that is on a desktop (assuming you are on one) for an almost empty page.

Well, if you scroll Lighthouse report a little lower you will see a few "opportunities". It turns out it indeed takes more than a second to recieve the first byte from the server. However it is not exactly clear from the Lighthouse score. It can be very deceiving and that is one of the reasons why you should not consider a primary metric for perfomance.

Lighthouse opportunities and recommendations

Another tool we may employ to narrow down a bottleneck is request timings. In the network section of Dev Tools select any request and open "Timing" tab. For the initial request you will see a similar picture. Large TTFB always indicates issues with server-side request handling.

Initial request timings

You may narrow down issue location by checking other requests which do not require server rendering a page. They will have little to no TTFB. Clearly something is wrong with server-side rendering.

When a server handles the request it typically first executes all the middlewares, then interceptors, actual handlers and filters if needed. Server-side rendering may be stalled by blocking requests run synchronously one after another. In large projects it is really challenging to pinpoint what causes issues analytically.

Therefore we should do a perfomance inspection using profiling. But the code is run on a remote server, how can we debug and profile a remote Node.js application?

Profiling a remote Node.js application

I will be largely following the official doc on Node.js debugging.

Attaching Node.js inspector

First thing we should do is establish a remote SSH connection to our server. We already configured SSH in the Docker container and may use start:ssh script to connect.

If you are having issues connecting try troubleshooting recommendations from my recent post on Alpine Docker images SSH I mentioned earlier.

Upon successful connection you should find the PID for the Node.js application. Use ps command for this.

$ yarn start:ssh
# <...>
$ ps
PID   USER     TIME  COMMAND
    1 root      0:01 node /opt/yarn-v1.22.17/bin/yarn.js start
  127 root      0:00 sshd: /usr/sbin/sshd [listener] 0 of 10-100 startups
  154 root      0:18 /usr/local/bin/node dist/server/main.js
  176 root      0:00 sshd: root@pts/1
  178 root      0:00 -ash
  181 root      0:00 ps
Enter fullscreen mode Exit fullscreen mode

You would see a similar result. What we need is main.js process - PID 154. Currently there is not inspector attached to it. To open a Node.js process to inspector attachments send a SIGUSR1 signal to it using kill -s SIGUSR1 154.

Now that Node.js process is ready to be inspected you should open dedicated Dev Tools for Node. Open chrome://inspect in your Chromium based browser. You would see a page like this.

Chrome devices inspection

Click "Open dedicated Dev Tools for Node". Here you should add a new remote connection. Remember SSH connection script? -L 9221:localhost:9229 binds internal Docker container port 9229 (the default one used after sending SIGUSR1 signal) to localhost:9221. Therefore new connection network address would localhost:9221.

Detached dedicated Dev Tools for Node

If you have done everything correctly you should see a new remote target on chrome://inspect.

New remote target

At the same time Dev Tools should now show some messages in the console and server file system in "Sources". There you may actually add breakpoints and debug your code remotely. But what we are interested in at the moment is "Profiling" tab.

Reading profiles

Profiling tab

Click "Start" and refresh localhost:3000 in your browser. After successful refresh and complete page load return to Dev Tools and stop the recording. You should now see "Profile 1" in the list on the left.

Profling results

At the top we would see functions that took the longest to execute. That would be _defineProperty and _objectSpread generated by tsc for new language features to work with ES5. Still we may be using those in a variety of places. Let's sort not by "Self time" (time that took to execute statements in the function itself and not the internal function calls) but by "Total time". And we indeed see a clear winner.

filterUnserializable takes a long time to finish

filterUnserializable takes an unreasonable amount of time to finish. _defineProperty and _objectSpread functions already tell us that it is due to inefficient spreads and property assigning. Clearly we should refactor this function to make it faster.

Benching improvements

In order to be sure that we introduce a quicker solution we should employ a tool like jsbench.me. First we add tested data at "Setup JS" stage. Then add test cases with different function implementations. Check this bench with suggested implementation and run it yourself for results.

My bench results

Clearly the new function is much better. With smaller payloads old implementation will show slightly better results but still worse than the new mutable one.

When refactoring such an important function you surely would want to first add unit tests to make sure that it has the same behaviour. At this time we will not do that but make sure you do so when refactoring production applications. The new implementation also fixes a bug with invalid array serialization.

Let's implement the improvement.

// ./src/client/ssr/filterUnserializable.ts
const filterUnserializable = (
  obj: Record<string, unknown>,
  filteredValues: unknown[] = [undefined],
) => {
  Object.keys(obj).forEach((key) => {
    if (filteredValues.includes(obj[key])) {
      delete obj[key];
      return;
    }

    if (obj[key] === Object(obj[key]) && !Array.isArray(obj[key])) {
      filterUnserializable(obj[key] as Record<string, unknown>, filteredValues);
    }
  });

  return obj;
};

export { filterUnserializable };
Enter fullscreen mode Exit fullscreen mode

Now the application is using a much more effective way of filtering unserializable data and can grow much larger without perfomance hickups. Let's make sure that improvements indeed took place, restart Docker container and record another profile.

When rebooting container and attempting to connect via SSH you will likely see error messages like channel 3: open failed: connect failed: Connection refused. Node.js Dev Tools make attempts to connect though there is no servers listening on internal Docker container port 9229. Run kill -s SIGUSR1 <APP_PID> again to allow inspector attachments. Or temporarily close/delete connection in Dev Tools.

Improved implementation profile

Times of 100ms are much more reasonable for large objects like this. Obviously you could make more improvements like making sure that objects like translations are valid before they reach SSR or caching validated objects. You may also assume that there are no nested undefined properties. For the purpose of this article existing improvements will suffice.

Lessons learned

Now that we profiled and fixed the issue in an emulated environment we can compile a list of things a developer or a team should do to avoid situations when you need to conduct profiling on your application.


First I will list preemptive solutions and then how you may improve perfomance when issues arise.

1) Collect real usage data using Real user monitoring. Some services like Vercel provide RUM out of the box. Collect data like FCP, TTI and other metrics.

2) Setup alerts over spikes in RUM data. This way when a release happens to introduce a significant slowdown you will be alerted immediately. If you are against alerts or have technical difficulties implementing them develop a culture of frequent monitoring for this data. Regularly assess it a few days after a release happens.

3) Perform regular load testing. It will also succeed in showing drops in responsiveness of a service under load.

4) Do not overuse immutability. While it does help you write predictable code it also hurts perfomance when overused. When in doubt create a quick bench on jsbench to make sure you make the right decision.

5) Parallelize requests and stream data where possible. If a page requires multiple pieces of information attempt requesting them simultaneously in parallel.

6) Prevalidate data and use as little of it as possible. Do not pull unnecessary entities. Split data required for individual pages. Use cache where applicable.

7) For critical parts that are inaccessible by RUM setup manual monitoring by tracking time to execute with tools like console.time() or process.hrtime.


And when challenged with improving perfomance do:

1) Before making any improvements create a baseline for perfomance. This may be some metric or a combination of those. It should work toward improving actual user experience.

2) Do not rely on assumptions. Conduct perfomance tests and record profiles. Use real data on an environment as close to the production one as possible. Use this data to choose next steps.

3) When introducing an improvement make sure that it behaves the same way using unit tests. And make sure it does what you want it to do - perform better. Once again employ jsbench.


I hope you found this article helpful in learning remote Node.js profiling and general perfomance care. Perhaps you have some rules of your own you follow to make sure your app stays perfomant? Or a story of a big perfomance hickup? Feel free to discuss in the comments.

Discussion (0)