When first getting started writing Node.js servers in ReasonML a developer might ask herself "should I raise exceptions when something goes wrong? Or should I encode my errors as data and use the
While initially developing the server I'm working on right now, I tried the "throwing exceptions" approach when anything goes wrong. I did this because often, in the context of a request execution, everything has to go just right in order to return a response. And I figured that if anything goes wrong, we would just return a 500, and that'd be fine.
But I ran into some challenges with this:
- Errors were hard to track to the source.
- Errors were hard to associate with a particular request.
- When the application logic did want to deal directly with a failed request, it felt difficult and awkward to do.
Based on my conversation with Jared Forsyth in this Reason Town episode:
https://reason.town/13-error-handing-and-let-anything (start listening at about 21:00)
Exceptions are for panics (something is terribly wrong, the whole server needs to crash). Results are for things that could fail. This is Rust's approach. You can throw an exception, but you can't catch an exception.
While thinking about this, I realized that I needed to break the problem space up a bit more. I was trying to solve two problems with one exception; knowing how to fix the app, and knowing what to tell the user. These problems are better handled on their own, however. And they can be expressed in the form of questions:
- How do I know what kind of a fix to make as a human engineer when something in the server goes unexpectedly wrong?
- How does the application know what to do when a task that could fail, does fail? And what should I do about it?
The answer to question 1 is: logs. When something unexpected has happened on the server that prevents the request from completing normally, and there's nothing the application can do currently to fix that thing, the application returns a 500 error to the user. And, importantly, a detailed log is made which leads the programmer to know exactly what failed and in what context.
This means that most logs in a server application should be connected to a request. The developer should be able to look at all the logs for the lifecycle of a request and pinpoint exactly what part of the process failed (more on this later).
Unexpected errors that can't be recovered from will likely be the most common case for error handling in the server (what? that doesn't sound right!). Yes, most operations in a request lifecycle need to succeed in order for the whole request to succeed. So in most cases, if something goes wrong, a detailed log should be left, and the server should return a 500.
Note, this doesn't mean your server should be returning a lot of 500s. Most of the things your server attempts to do should work most of the time.
But it does mean that in most of the places you're attempting something fail-able, your application has no recourse but to return a 500 if it does fail.
Example of when the application should leave a detailed log, and propagate a failure upward:
- User sends a login request.
- The server hashes the user's password, and tries to fetched the stored hashed password from the database for comparison.
- The database is unavailable because of a network partition.
- The application leaves a detailed log that the database could not be reached.
- The server returns a 500.
- The user sees a login error, stating that the server is having problems.
In the case above, there's nothing that the application can do all by itself to fix a network outage. So it leaves a log with enough information (what endpoint was called, what operation was attempted, and that the database could not be reached) and then tells the user that it couldn't complete the operation.
The answer to question 2 is to have a data type that encodes failure as data. In Reason,
Belt.Result.t('data, 'error) is there for this very purpose. If the potentially fail-able operation does fail, it can return some data to the caller about why the failure happened.
But then there's the challenge of knowing what to do with that new failure information.
The programmer should use information about why an operation failed when there's something that the application or the user can do about the failure.
Example of when the application or the user can do something about a failure:
- User sends a request to GET some information.
- The server forwards the user's credentials to the authentication server as part of the authentication step, expecting a user object in response.
- The auth server returns a failure, 403. The user's credentials were incorrect.
- Instead of logging an error and returning a 500, the application returns a 401 to the user, telling them that the email and password combination weren't found.
In the case above, the application knows that the user credentials were incorrect (401). Its logical path switched based on that information. Instead of continuing on to fetch user information, it returns early, informing the user that the credentials were incorrect. But it doesn't log an error message. The application maintainer doesn't need to know that the user had incorrect credentials, because there's nothing the application maintainer can do about that. (It's probably good to log the number of incorrect credential responses, however. If there's a sudden surge in 401 responses, it could by symptomatic of a bug in the application logic, or of data loss).
How does a programmer know if she is at a point in the code where logging would be helpful? Should she add a log when the database fails? or when the function in controller X that calls the database returns a failure? Or in the controller that calls controller X?
She should log as much information as is necessary to inform future-programmer what needs to be fixed if she finds those logs.
For example, finding this log might not be very helpful:
[ERROR] POST /users/login DecodeError: expected "id" but found 'undefined'
Looks like the decoding failed for some call to the database, but it isn't clear which call failed. In this case, she'd probably want to provide more information about the failure:
[ERROR] POST /users/login DecodeError: expected "id" but found 'undefined' Failed while trying to fetch the user's primary journal from the journal store
In this case, the programmer has logged both when the decoding error has happened, since she knows that standalone decoder errors are hard to track down, and she has made a log in the controller that tried to make the call. Now she knows which database call is getting the unexpected data, and that the problem lies in the decoder receiving an object without an 'id' attribute, though it was expected to have one.
This can be a complex topic, and deserves its own post. But in short, the programmer needs a unique ID for each request that can be added to each log resulting from that request.
In a blocking language that runs one request per thread, a request ID can be assigned to the thread's global memory, and read from any of the code in that thread. But Node.js is single-threaded, so we can't use that trick.
There are a few approaches solving this problem in Node.js:
- Continuation local storage This approach attempts to assign data to memory in a function that is added to the call stack right at the beginning of the request, and allow all code triggered from that function access to the same memory. It's a nice idea, but reportedly loses the context relatively easily when external libraries are used.
- Passing an object down to all functions In this approach, the programmer makes a logger instance that prefixes any logs with the request ID when the request starts, and passes down that logger instance to all functions that are called during the lifetime of the request. Any logs that are written during the request use the logger instance that was passed down. This approach is simple and straightforward, but messy and painful to execute.
- Passing logs up from all functions This is like the previous approach, except logs are collected during the execution of the request and passed upward. When the request finishes, a middleware writes out all the logs at once with their respective timestamps, and with the request ID. This approach can also be heavy-handed, but creative use of data types can clean it up a bit.
The way errors are logged or handled in a server application can make or break developer's the maintenance experience. An principled approach to error handling is required to allow developers to quickly find and fix server errors. Ideally this approach:
- Separates logging from failed operations
- Logging is for developers
- Failure data is for applications and users
- Groups all logs for a request together with a unique request ID, allowing the developer to see the logical progression of events in a single request
In Reason, the
Result data type is great for transforming failures into usable data when needed. Grouping logs together for one request isn't a terribly easy thing to do right now. It seems like the developer will have to choose relative unreliability, or more complex code.