This article was originally published at
This is the second article in a series focused on building a microservice architecture with Node.js. You can access the rest of the articles in the series below:
- Bunyan JSON Logs with Fluentd and Graylog
- Error Management in Node.js Applications (This article)
- Implementing Event Sourcing and CQRS pattern with MongoDB
- Canary Health Check Endpoints (Coming Soon)
- Writing MongoDB Database Migrations with Node.js (Coming Soon)
“Error Management” is a weird title, but I mean it. Error handling is a common topic that everyone talks and writes about, and you’ll find loads of articles about error handling in Node.js. There’s a reason for that: most developers who are new to asynchronous programming in Node.js get confused about how to handle different types of errors properly. `try…catch` doesn’t always come to the rescue in asynchronous code. I have also listed some best practices for handling synchronous and asynchronous errors in Node.js in one of my previous articles about developing secure Node.js applications. And here comes the ‘but’.
Assume you have handled your application errors correctly and successfully caught an error. The next most important part is what to do with the error you just caught. Should you just log it and swallow it as if nothing ever happened? Should you escalate the error, and if so, where should it end up? If your application caught the error while processing an HTTP request sent by one of your API’s consumers, should you report the error to the consumer, and how? There are thousands of questions. In this article, I’m going to discuss some of the mistakes I have made and seen others make before coming up with a proper way to tackle most of these problems. Hence the name “Error Management”.
For the purpose of this guideline, let’s imagine our application is a Node.js-based microservice that exposes a REST API and talks to one or more 3rd party services over the network. So, what do we actually need to achieve?
We need to handle our application’s errors properly, so that:
- The outcome of every possible error is predictable.
- The application can recover from critical errors without manual intervention.
- Errors that occur while processing an HTTP request are conveyed to the client with the minimum required, but still descriptive, information that helps the client decide what to do next.
- The root cause of the error is easy to trace and debug.
Here I’m listing 7 of the most common mistakes I have made and seen, and how I thought of fixing them. There might be scenarios where you can’t strictly follow these solutions and have to take a different approach, but in most cases the solutions I’ve listed should be applicable. Please feel free to comment if you’d like to add something.
Error handling in asynchronous code is quite different, and even tricky, if you are not familiar with the different ways you can write asynchronous code. At the time of this writing, there are 3 ways to handle asynchronous operations, and you have to use a slightly different approach to handle errors that occur in each of them:
- Using callbacks: use the error-first callback approach. `try-catch` won’t be helpful.
- Using promises and promise callbacks: use `.catch` to handle rejections.
- Using `async-await` to resolve promises (or using ES6 generators with `yield` for async workflow): use `try-catch`.
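As a rough sketch of the three styles, here is the same file read written three ways. The function names (`readWithCallback`, `readWithPromise`, `readWithAwait`) are hypothetical and exist only for illustration; the `fs` calls are the standard Node.js ones.

```javascript
const fs = require('fs');

// 1. Callbacks: the error arrives as the first argument of the callback.
//    A try-catch around fs.readFile would not see it.
function readWithCallback(path, done) {
  fs.readFile(path, 'utf8', (err, data) => {
    if (err) return done(err);
    done(null, data);
  });
}

// 2. Promises: handle the rejection with .catch().
function readWithPromise(path) {
  return fs.promises.readFile(path, 'utf8').catch((err) => {
    throw new Error(`Could not read ${path}: ${err.message}`);
  });
}

// 3. async/await: an awaited rejection can be caught with try-catch.
async function readWithAwait(path) {
  try {
    return await fs.promises.readFile(path, 'utf8');
  } catch (err) {
    throw new Error(`Could not read ${path}: ${err.message}`);
  }
}
```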
However, there’s a slightly confusing scenario when using `await`. See the following two examples. These examples show the body of an async function written in two different ways. The `catch` block in Example 2 is useless because the promise returned by `myAsyncFunction()` is merely returned to the caller instead of being awaited until it is resolved or rejected. Therefore, any promise rejection has to be handled in the caller’s scope.
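A minimal sketch of the two variants, assuming a hypothetical `myAsyncFunction` that always rejects (the wrapper names `example1` and `example2` are also just for illustration):

```javascript
// Hypothetical async operation that always rejects.
async function myAsyncFunction() {
  throw new Error('boom');
}

// Example 1: the promise is awaited inside try, so the rejection is caught here.
async function example1() {
  try {
    return await myAsyncFunction();
  } catch (err) {
    return 'handled in example1';
  }
}

// Example 2: the promise is returned without being awaited, so this catch
// block never runs and the rejection surfaces in the caller instead.
async function example2() {
  try {
    return myAsyncFunction();
  } catch (err) {
    return 'never reached';
  }
}
```

Calling `example1()` resolves with the fallback string, while `example2()` rejects, so its caller needs its own `try-catch` or `.catch`.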
Even if you have handled most of the potential error scenarios, you might still have missed one that leads to an uncaught exception or an unhandled promise rejection. It’s possible to identify such a scenario and handle it gracefully by listening to the two events `uncaughtException` and `unhandledRejection` emitted by the `process` object. Doing this incorrectly, though, could cause undesirable effects.
`uncaughtException` and `unhandledRejection` are two scenarios where the application shouldn’t continue. If you explicitly add listeners to these two events, you need to make sure to:
- Log enough information about the error (possibly sending it to your log management system or APM server) so that it can be investigated later.
- Force the application to exit, so that your process manager or Docker orchestrator can launch a replacement process.
Continuing to run the application without exiting after an `uncaughtException` or `unhandledRejection` could cause the application to either hang or behave unpredictably.
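The two steps above could be wired up roughly as follows. This is only a sketch: `handleFatal` is a hypothetical helper, `console` stands in for a real logger, and the `exit` parameter is there purely so the function can be exercised without terminating the process.

```javascript
// Stand-in logger (assumption); replace with your real logging library.
const logger = console;

// Hypothetical helper: log what happened, then leave the process so the
// process manager or orchestrator can start a fresh one.
function handleFatal(kind, err, exit = (code) => process.exit(code)) {
  logger.error(`${kind}:`, err);
  // With an asynchronous logger, flush pending writes before exiting.
  exit(1);
}

process.on('uncaughtException', (err) => {
  handleFatal('uncaughtException', err);
});

process.on('unhandledRejection', (reason) => {
  handleFatal('unhandledRejection', reason);
});
```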
Another common mistake developers make is masking errors so that callers further up the call stack have no idea that an error has occurred. While this may make sense in certain situations, doing it blindly makes it almost impossible to trace and diagnose errors that could lead to major downtime of your application. Have a look at the snippet below, which swallows the error `err` and returns an empty array instead.
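A minimal sketch of such a snippet, assuming a hypothetical data-access function `queryUsers` that fails (both function names are made up for illustration):

```javascript
// Hypothetical data-access call; here it always fails.
async function queryUsers() {
  throw new Error('connection refused');
}

// Masks the failure: the caller cannot tell "there are no users"
// apart from "the lookup failed".
async function getUsersMasked() {
  try {
    return await queryUsers();
  } catch (err) {
    return [];
  }
}
```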
Only do this if you have already logged the error somewhere else and you are confident that it shouldn’t be escalated to the caller of your current function (e.g., an HTTP server’s routing handler should not escalate the error to the client). Otherwise, identify what type of error has occurred and escalate it in a way that lets the callers know exactly what went wrong. This brings us to the next point.
Converting generic error objects into specific error objects is important if your application needs to take different decisions based on the type of error. An example use case of implementing such specific errors is as follows:
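For instance, an HTTP handler might map the type of a caught error to a status code. The sketch below assumes two hypothetical error classes, `ValidationError` and `NotFoundError`, and a hypothetical helper `statusFor`; none of them come from a library.

```javascript
// Hypothetical specific error types.
class ValidationError extends Error {}
class NotFoundError extends Error {}

// Pick an HTTP status code from the type of error that was caught.
function statusFor(err) {
  if (err instanceof ValidationError) return 400;
  if (err instanceof NotFoundError) return 404;
  return 500; // anything unrecognised is treated as a server-side failure
}
```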
The `Error` object is very generic. To identify the specifics of an error, you need to inspect properties such as `error.message` and `error.stack`. This is not convenient if you plan to scale your application. The Node.js runtime throws several more specific errors, such as `TypeError` and `RangeError`, but they are not very reusable for other purposes.
This is where you need to define your own error types and throw the right error at the right time. This makes your application errors more self-explanatory and easier to handle. Let’s see an example.
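One way this could look, as a sketch rather than a definitive implementation: a small hierarchy of application errors, plus a function that converts a generic low-level error into a specific one. All names (`AppError`, `DuplicateUserError`, `DatabaseUnavailableError`, `insertUser`, `createUser`) and the error code `11000` are assumptions chosen for illustration.

```javascript
// Base class so that every application error carries its own name and,
// optionally, the lower-level error that caused it.
class AppError extends Error {
  constructor(message, cause) {
    super(message);
    this.name = this.constructor.name;
    this.cause = cause;
  }
}

class DuplicateUserError extends AppError {}
class DatabaseUnavailableError extends AppError {}

// Hypothetical low-level call; here it always fails with a driver-style code.
async function insertUser(user) {
  const err = new Error('duplicate key');
  err.code = 11000; // illustrative error code
  throw err;
}

// Convert the generic error into a specific application error and rethrow.
async function createUser(user) {
  try {
    return await insertUser(user);
  } catch (err) {
    if (err.code === 11000) {
      throw new DuplicateUserError(`User ${user.email} already exists`, err);
    }
    throw new DatabaseUnavailableError('Could not save user', err);
  }
}
```

Callers can then branch on `instanceof DuplicateUserError` while the original error stays available on `cause` for logging.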
Despite its verbose look, I found this approach more robust and easier to handle. This way, you don’t need to ask your co-developers to stick to a conventional list of error codes and check `error.code` every time they catch an error to determine the next steps.
If the 3rd party service you consume is out of your control, you should be ready for all the possible scenarios that could go wrong.
See the following hypothetical program:
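A minimal sketch of such a program, assuming a hypothetical client function `fetchUsersFromApi` in place of the real network call (`listUserNames` is likewise made up for illustration):

```javascript
// Hypothetical client for the 3rd party API; a real one would make a network call.
async function fetchUsersFromApi() {
  return { users: null };
}

async function listUserNames(fetchUsers = fetchUsersFromApi) {
  const body = await fetchUsers();
  // Falls back to an empty list whenever `users` is falsy.
  const users = body.users || [];
  return users.map((user) => user.name);
}
```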
In this hypothetical example, we assume that the API we consume to fetch users returns an object in the success response. This object contains a property called `users`, which can be an array if there are users in the result, or `null` if there are no users.
What if the developers of this API change the response object structure such that `users` becomes `undefined`? Your application will still continue to run using the default value, without giving any clue about what’s happening. By the time you identify that something’s wrong, it might be hard to recover.
Always try to be strict about the responses of 3rd parties. It’s always better for your application to fail fast than to continue on an abnormal path. That way, you can identify potential integration issues as early as possible and prevent data corruption or inconsistencies that are hard to recover from.
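A strict check for the response shape described earlier (`users` is either an array or `null`) might look like the sketch below. `UnexpectedResponseError` and `parseUsers` are hypothetical names, not part of any library.

```javascript
// Hypothetical error type for responses that break the expected contract.
class UnexpectedResponseError extends Error {}

function parseUsers(body) {
  if (body === null || typeof body !== 'object' || !('users' in body)) {
    throw new UnexpectedResponseError('Response has no "users" property');
  }
  if (body.users === null) return []; // the documented "no users" case
  if (!Array.isArray(body.users)) {
    throw new UnexpectedResponseError('"users" is neither an array nor null');
  }
  return body.users;
}
```

With this in place, a changed response structure fails immediately at the integration boundary instead of silently turning into an empty result.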
Choosing the best logging library for your application is not enough if you don’t use it properly. One of the most common features of logging libraries is that you can log messages at different log levels and possibly send the logs of each level to a different destination (e.g., a file). To do this properly, you should pick the correct log level for each message based on how important it is. The most common log levels are as follows:
- `log.debug`: All messages which are not crucial, but could be important for debugging something later.
- `log.info`: All informative messages which are crucial for identifying a successful (or non-failure) action.
- `log.warn`: All warnings which are not critical and don’t require immediate action, but are important for investigating later.
- `log.error`: All errors which require immediate attention and could lead to a disastrous scenario if ignored.
- `log.fatal`: All errors which indicate a service outage, or a failure of a critical component that requires immediate action to recover from.
If you follow this convention strictly, you can set up accurate alerts that identify critical problems immediately without triggering false alarms.
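To illustrate how levels and destinations fit together, here is a tiny stand-in logger. It is a sketch, not the API of any real logging library: `createLogger`, `minLevel`, and `write` are hypothetical names, and in-memory arrays stand in for real destinations.

```javascript
// Levels in increasing order of severity, mirroring the list above.
const LEVELS = ['debug', 'info', 'warn', 'error', 'fatal'];

// Hypothetical logger: each destination receives entries at or above its minimum level.
function createLogger(destinations) {
  const logger = {};
  LEVELS.forEach((level, severity) => {
    logger[level] = (message) => {
      for (const { minLevel, write } of destinations) {
        if (severity >= LEVELS.indexOf(minLevel)) write({ level, message });
      }
    };
  });
  return logger;
}

const everything = []; // stands in for a general log file
const alerts = [];     // stands in for a channel that triggers alerts

const log = createLogger([
  { minLevel: 'debug', write: (entry) => everything.push(entry) },
  { minLevel: 'error', write: (entry) => alerts.push(entry) },
]);

log.info('User created');
log.warn('Retrying request to payment service');
log.error('Payment service rejected a valid request');
```

Only the `error` entry reaches `alerts`, which is what keeps alerting free of noise when levels are chosen carefully.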
PS: Check out my post on setting up a log aggregation scheme with Bunyan logger, Fluentd and Graylog:
That’s it. These are just a few important things to keep in mind about “Error Management” in Node.js applications. Your opinions are always welcome, so feel free to leave a comment. Thanks.
Background Image Courtesy: https://images.axios.com/WVWPMo4kVq7ZSwcIr16u8QZ8nAY=/0x280:5100x3149/1920x1080/2018/06/01/1527894970857.jpg