Darryl Daniel

How to health check multiple PM2 processes with NGINX

Production Node apps require multiple instances for reliability. One possible configuration that achieves this uses PM2 to manage the app instances and their lifecycle, and NGINX for load balancing. When running in this configuration, however, it can become tricky to check the health status of each instance individually.

Example '/health' request flow

As you can see in the above diagram, when one of the PM2 instances has shut down, NGINX's load balancer will see that the first instance is dead and route the request to the instance that is still up and running. You won't be able to check on the status of the dead instance until it is restarted, and NGINX will continue to tell you that everything is fine despite one of the app instances being down.

In this post, I'd like to demonstrate a method of configuring NGINX such that each instance can be checked independently and the above scenario avoided.

Start with a basic node server:

const Koa = require('koa');
const Router = require('@koa/router');
const app = new Koa();
const router = new Router();

// PM2 sets NODE_APP_INSTANCE (0, 1, ...) for each instance it spins up,
// so instance 0 listens on 9001, instance 1 on 9002, and so on
const portOffset = parseInt(process.env.NODE_APP_INSTANCE, 10);

router.get("/health", async ctx => {
    const healthServerNumber = portOffset + 1;
    ctx.body = "health status server " + healthServerNumber + " -> ok";
    ctx.status = 200;
});

router.get("/", async ctx => {
    ctx.body = "default route";
    ctx.status = 200;
});

app.use(router.routes());
app.listen(9001 + portOffset);

The process.env.NODE_APP_INSTANCE variable is set by PM2 for each instance it spawns. Using it as a port offset gives each app instance its own port, so every instance can be reached directly. The PM2 config looks like this:

const app = {
    "name": "health-checks-app",
    "script": "./index.js",
    "max_memory_restart": "100M",
    "watch": true,
    "log_date_format": "YYYY-MM-DD HH:mm:ss.SSS",
    "env": {
        "NODE_ENV": "production"
    },
    "exec_mode": "cluster",
    "instances": "2",
    "kill_timeout": 5000,
};

module.exports = {
    "apps": [app]
};
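
Once the instances are running (for example with pm2 start ecosystem.config.js, assuming the config above is saved as ecosystem.config.js), each one should answer on its own port. Here is a minimal sketch that checks both ports directly with Node's built-in http module; the filename is just illustrative and the port list follows the 9001 + NODE_APP_INSTANCE scheme from the server above:

// check-instances.js (illustrative name): hit each PM2 instance directly on its own port
const http = require('http');

[9001, 9002].forEach(port => {
    http.get(`http://localhost:${port}/health`, res => {
        let body = '';
        res.on('data', chunk => (body += chunk));
        res.on('end', () => console.log(`port ${port}: ${res.statusCode} ${body}`));
    }).on('error', err => console.log(`port ${port}: down (${err.message})`));
});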

Add the following to the NGINX config on the server:

upstream health_check_servers {
    server localhost:9001;
    server localhost:9002;
}

upstream health_check_server_1 {
    server localhost:9001;
}

upstream health_check_server_2 {
    server localhost:9002;
}

server {
    listen 80;
    server_name localhost;

    location ~ ^/health/instance-(.+)$ {
        rewrite ^/health/instance-(.+)$ /health break;
        proxy_pass http://health_check_server_$1;
    }

    location / {
        proxy_pass http://health_check_servers;
    }
}

There are three separate upstream definitions: one for the pool, where NGINX picks a server for each request on a round-robin basis, and one for each individual instance. Each time a new instance is added, these definitions will have to be updated.
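
As a rough sketch, scaling to a third instance would mean bumping instances to 3 in the PM2 config and teaching NGINX about port 9003 (again following the 9001 + NODE_APP_INSTANCE scheme). The location regex already matches /health/instance-3, so only the upstreams change:

upstream health_check_servers {
    server localhost:9001;
    server localhost:9002;
    server localhost:9003;
}

upstream health_check_server_3 {
    server localhost:9003;
}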

So how does this all fit together? When a request is made to /health/instance-1, NGINX routes it to the health_check_server_1 upstream, which forwards it to the instance on port 9001. Likewise, when the URL is /health/instance-2, the second instance receives the request. The rewrite is important here: without it, the URL requested on the app would be http://localhost:9001/health/instance-1, which does not exist. We rewrite it to /health to hit the correct endpoint.
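
With both instances up, http://localhost/health/instance-1 should respond with "health status server 1 -> ok" and http://localhost/health/instance-2 with "health status server 2 -> ok". If the second PM2 process dies, /health/instance-2 will start returning an error from NGINX while /health/instance-1 and the round-robin / route carry on working, which is exactly the per-instance visibility the plain pool can't give you.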

This configuration enables us to check the status of each app instance separately. The benefit, compared to having some kind of monitoring running directly on the server, is that your application is tested end-to-end and you can be confident that it is reliably accessible from the outside world.

Example repo can be found here.
