Dealing With Unexpected Data in JavaScript

Lucas Santos ・9 min read

One of the biggest problems with dynamically typed languages is that we cannot guarantee that the data flowing through our code will always be correct, since we cannot "force" a parameter or variable, for example, to be non-null. The standard approach in these cases is a simple test:

function foo (mustExist) {
  if (!mustExist) throw new Error('Parameter cannot be null')
  return ...
}

The problem with this is that it pollutes our code: we have to test variables everywhere, and there is no way to guarantee that everyone working on the code will actually perform this test everywhere, especially where a variable or parameter cannot be null. Often we do not even know that a parameter can arrive as undefined or null. This is very common when backend and frontend are built by different teams, which is the vast majority of cases.

In order to improve this scenario a little, I started researching how we can minimize these "unexpected" effects and what the best strategies for this would be. That's when I came across this incredible article by Eric Elliott. The idea here is not to completely contradict his article, but to add some interesting information that I ended up discovering with time and experience in the area of JavaScript development.

Before I start, I wanted to brush up on a few points that are discussed in this article and give my personal opinion as a backend developer, as the focus of the other article is more on the frontend.

The Origin of All

The problem of data processing can have several sources. The main cause is, of course, user input. However, there are other sources of malformed data, in addition to those mentioned in the other article:

  • Database records
  • Functions that return null data implicitly
  • External APIs

We will treat each of these cases differently and go through all of them below, keeping in mind that nothing is a silver bullet. Most of these problems come from human error: the language itself is often prepared to deal with null or undefined data, but the code that transforms this data may not be.

User Inputs

In this case there is not much we can do. If the problem is user input, we have to deal with it through what we call hydration: we take the raw input that the user sends us, for example in the payload of an API call, and turn it into something we can work with without errors.

In the backend, when we are using a webserver like Express, we can perform all the handling of user inputs coming from the frontend through standards such as JSON Schema or tools like Joi.

An example of what we can do using a route with Express and AJV would be the following:

const Ajv = require('ajv')
const Express = require('express')
const bodyParser = require('body-parser')

const app = Express()
// Note: Ajv v8+ needs the ajv-formats plugin for `format: 'email'`
const ajv = new Ajv()

// Parse JSON payloads into req.body
app.use(bodyParser.json())

// POST, since the data arrives in the request body
app.post('/foo', (req, res) => {
  const schema = {
    type: 'object',
    properties: {
      name: { type: 'string' },
      password: { type: 'string' },
      email: { type: 'string', format: 'email' }
    },
    additionalProperties: false,
    required: ['name', 'password', 'email']
  }

  const valid = ajv.validate(schema, req.body)
  if (!valid) return res.status(422).json(ajv.errors)
  // ...
})

app.listen(3000)

Note that we are validating the body of the request. The body is an object that the body-parser package builds from the payload; here we pass it through a JSON Schema, so validation fails if one of these properties is missing, has a different type or, in the case of email, a different format.

Important: Note that we are returning an HTTP 422 code, which means Unprocessable Entity. Many people treat a request error, such as a wrong body or query string, as a 400 Bad Request, which is not entirely wrong. In this case, however, the problem was not with the request itself but with the data the user sent with it. So the best answer we can give is 422, stating that the request is well-formed but cannot be processed because its contents are not in the format we expect.

Another option besides AJV is a library that I created together with Roz, which we called Expresso: a set of libraries that make developing APIs with Express a bit easier. One of these tools is @expresso/validator, which basically does what we showed earlier but can be passed as a middleware.
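To illustrate the middleware idea in general terms, here is a minimal sketch. It is not the @expresso/validator API: validateBody and validateUser are hypothetical names, and the validator is hand-written so the example has no dependencies. In practice it would delegate to AJV or a similar tool.

```javascript
// Hypothetical helper: wraps any validation function into an Express-style middleware.
// `validate` receives the body and returns an array of error messages (empty when valid).
function validateBody (validate) {
  return function (req, res, next) {
    const errors = validate(req.body)
    if (errors.length > 0) return res.status(422).json(errors)
    return next()
  }
}

// Illustrative validator, assumed for this sketch only.
function validateUser (body) {
  if (body == null || typeof body !== 'object') return ['body must be an object']
  const errors = []
  for (const field of ['name', 'password', 'email']) {
    if (typeof body[field] !== 'string') errors.push(`${field} must be a string`)
  }
  return errors
}

// Usage with Express: app.post('/foo', validateBody(validateUser), handler)
```

The handler then only runs when the body has already been validated, which keeps the validation out of the route logic.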

Optional Parameters With Default Values

Even with the validation above, a null value can still get into our application when an optional field is not sent. For example, imagine a paging route that takes two parameters, page and size, as query strings. They are not required and, if not received, must assume a default value.

Ideally, we should have a function in our controller that does something like this:

function searchSomething (filter, page = 1, size = 10) {
  // ...
}

Note: Just like the 422 we returned earlier, it is important to return the correct code for paginated queries. Whenever the data returned is only part of a whole, we return 206 Partial Content. When the user reaches the last page and there is no more data, we can return 200. If the user asks for a page beyond the total range of pages, we return 204 No Content.

This would solve the problem when both values arrive blank, but here we touch a very controversial point of JavaScript in general. Default parameters only take their default value when the argument is undefined or omitted; this does not work for null. So if we do this:

function foo (a = 10) {
  console.log(a)
}

foo(undefined) // 10
foo(20) // 20
foo(null) // null

Therefore, we cannot rely only on default parameters to handle null values. For these cases we have two options:

  1. If statements on the controller
function searchSomething (filter, page = 1, size = 10) {
  if (!page) page = 1
  if (!size) size = 10
  // ...
}

Which is not very pretty, and it's verbose.

  2. Handling it with JSON Schema directly on the route

Again we can use AJV or @expresso/validator to validate this data for us:

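// useDefaults fills in missing properties; coerceTypes turns query strings such as '2' into numbers
const ajv = new Ajv({ useDefaults: true, coerceTypes: true })

app.get('/foo', (req, res) => {
  const schema = {
    type: 'object',
    properties: {
      page: { type: 'number', default: 1 },
      size: { type: 'number', default: 10 }
    },
    additionalProperties: false
  }

  // Query-string parameters live in req.query, not req.params; copy them so AJV can fill in defaults
  const query = { ...req.query }
  const valid = ajv.validate(schema, query)
  if (!valid) return res.status(422).json(ajv.errors)
  // ... use query.page and query.size
})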
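The same defaulting logic can also be sketched in plain JavaScript, without a validator. This is only an illustration: toPositiveInt and parsePaging are hypothetical helper names, and the rule that anything other than a positive integer falls back to the default is an assumption of this sketch.

```javascript
// Hypothetical helper: turns a raw query-string value into a positive integer,
// or returns the fallback for null, undefined, '' and anything that is not a number.
function toPositiveInt (value, fallback) {
  const parsed = Number.parseInt(value, 10)
  return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback
}

function parsePaging (query) {
  // A default parameter would not cover null, so the fallback is applied explicitly.
  const source = query ?? {}
  return {
    page: toPositiveInt(source.page, 1),
    size: toPositiveInt(source.size, 10)
  }
}

console.log(parsePaging({ page: '3' }))            // { page: 3, size: 10 }
console.log(parsePaging({ page: null, size: '' })) // { page: 1, size: 10 }
console.log(parsePaging(null))                     // { page: 1, size: 10 }
```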

Dealing with Null and Undefined

Personally, I am not a big fan of the way JavaScript uses both null and undefined to show that a value is blank, for several reasons. Besides making these concepts harder to abstract, there is the case of default parameters described above. If you still have doubts about the concepts, a great practical explanation would be the following image:

Now that we know what each definition is about, a major addition to JavaScript in 2020 will be a set of two features: the Nullish Coalescing Operator and Optional Chaining. I won't go into details because I already wrote an article about this (it's in Portuguese), but these two additions will make things a lot easier because we will be able to handle exactly these two concepts, null and undefined, with a proper operator, the ??, instead of having to use Boolean negations like !obj, which are prone to several errors.
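As a short illustration of the difference, the sketch below compares || with ?? and shows optional chaining (?.). The settings object and its fields are made up for this example.

```javascript
// Illustrative data: 0 and '' are valid values here, only owner is really missing.
const settings = { retries: 0, label: '', owner: null }

// || falls back on every falsy value, so valid values such as 0 and '' are lost.
const retriesWithOr = settings.retries || 3       // 3
const labelWithOr = settings.label || 'none'      // 'none'

// ?? falls back only on null or undefined.
const retries = settings.retries ?? 3             // 0
const label = settings.label ?? 'none'            // ''
const owner = settings.owner ?? 'nobody'          // 'nobody'

// ?. yields undefined instead of throwing when the left side is null or undefined.
const ownerName = settings.owner?.name            // undefined
const safeOwnerName = settings.owner?.name ?? 'nobody' // 'nobody'

console.log({ retriesWithOr, labelWithOr, retries, label, owner, ownerName, safeOwnerName })
```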

Implicitly null Functions

This is a much more complex problem to solve because it is implicit. Some functions handle data assuming that it will always be filled, but in some cases this may not be true. Let's take a classic example:

function foo (num) {
  return 23*num
}

If num is null, the result of this function will be 0, which may not be what we expect. In these cases, there is not much we can do but test the value. We can do this in two ways. The first is the simple if:

function foo (num) {
  if (num == null) throw new Error('Error') // == null matches only null and undefined, so 0 is still accepted
  return 23*num
}

The second way would be to use a monad called Either, which was explained in the article I mentioned and is a great way to deal with ambiguous data, that is, data which can be null or not. JavaScript already has a native construct that supports two flows of action, the Promise:

function exists (value) {
  return value != null ? Promise.resolve(value) : Promise.reject(`Invalid value: ${value}`)
}

async function foo (num) {
  return exists(num).then(v => 23 * v)
}

In this way we can delegate the catch from exists to whoever calls the foo function:

function init (n) {
  foo(n)
    .then(console.log)
    .catch(console.error)
}

init(12) // 276
init(null) // Invalid value: null
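For comparison, a synchronous version of the same idea could look like the sketch below. It is a minimal Either-like structure written for illustration; Right, Left, fromNullable and multiply are names assumed here and do not come from a specific library.

```javascript
// Minimal Either-like sketch: Right carries a usable value, Left carries the reason it is missing.
const Right = value => ({
  map: fn => Right(fn(value)),
  fold: (onLeft, onRight) => onRight(value)
})

const Left = reason => ({
  map: () => Left(reason), // mapping over a Left does nothing
  fold: (onLeft) => onLeft(reason)
})

// Only null and undefined go to the Left branch, so 0 is still a valid value.
const fromNullable = value =>
  value != null ? Right(value) : Left(`Invalid value: ${value}`)

const multiply = num =>
  fromNullable(num)
    .map(v => 23 * v)
    .fold(reason => ({ ok: false, reason }), result => ({ ok: true, result }))

console.log(multiply(12))   // { ok: true, result: 276 }
console.log(multiply(null)) // { ok: false, reason: 'Invalid value: null' }
```

As with the Promise version, the caller decides what to do with the failure branch, but here nothing becomes asynchronous.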

External APIs and Database Records

This is a very common case, especially when we have systems that were developed on top of previously created and populated databases. For example, a new product that uses the same database as a previous successful product, integrating users between different systems and so on.

The unknown database is not the problem in itself but its cause: since we do not know what was done at the database level, we have no way of being certain whether the data will come as null or undefined. Another case is poor documentation, where the database is not satisfactorily documented and we end up with the same problem.

There is not much to do in this case. I personally prefer to test whether the data is in a shape I will not be able to use. However, it is not a good idea to do this for the whole data set, since many of the returned objects can simply be too big. So it is good practice to check that the data you are about to apply a function to, for example a map or filter, is defined before performing the operation.
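A minimal sketch of such a check before map and filter follows. The asArray helper and the record shape (name, active) are assumptions made only for this example.

```javascript
// Hypothetical helper: always hands back an array, so map/filter never run on null or undefined.
function asArray (value) {
  return Array.isArray(value) ? value : []
}

// `rows` stands for whatever the database or external API returned.
function activeUserNames (rows) {
  return asArray(rows)
    .filter(row => row != null && row.active === true) // skip null entries inside the list too
    .map(row => row.name ?? 'unknown')
}

console.log(activeUserNames(null)) // []
console.log(activeUserNames([{ name: 'Ana', active: true }, null, { active: true }])) // [ 'Ana', 'unknown' ]
```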

Throwing Errors

It is good practice to have what we call assertion functions for databases and external APIs. These functions return the data if it exists, or throw an error when it does not. The most common case is an API that searches for some type of data by ID, the famous findById:

async function findById (id) {
  if (!id) throw new InvalidIDError(id)

  const result = await entityRepository.findById(id)
  if (!result) throw new EntityNotFoundError(id)
  return result
}

Replace Entity with the name of your entity, for example, UserNotFoundError.
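The error classes used above are not built into JavaScript. A minimal sketch of how they could be defined is shown below; the messages and the in-memory entityRepository are assumptions for illustration, not part of any real library.

```javascript
// Hypothetical error classes matching the names used in the snippets of this article.
class InvalidIDError extends Error {
  constructor (id) {
    super(`Invalid ID: ${id}`)
    this.name = 'InvalidIDError'
  }
}

class EntityNotFoundError extends Error {
  constructor (id) {
    super(`Entity with ID ${id} was not found`)
    this.name = 'EntityNotFoundError'
  }
}

// In-memory stand-in for a real repository, only for illustration.
const entityRepository = {
  items: new Map([['1', { id: '1', name: 'example' }]]),
  async findById (id) { return this.items.get(id) ?? null }
}

async function findById (id) {
  if (!id) throw new InvalidIDError(id)
  const result = await entityRepository.findById(id)
  if (!result) throw new EntityNotFoundError(id)
  return result
}

findById('2').catch(err => console.log(err instanceof EntityNotFoundError, err.message))
// true Entity with ID 2 was not found
```

Because each failure has its own class, callers can tell them apart with instanceof instead of comparing message strings.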

This is good because, within the same controller, we can have one function that finds a user by ID and another that uses this user to look up other data, say, this user's profiles in another database collection. When we call the profile search function, it asserts that the user really exists in our database; otherwise the rest of the function is not even executed, and we can handle the error directly on the route:

async function findUser (id) {
  if (!id) throw new InvalidIDError(id)

  const result = await userRepository.findById(id)
  if (!result) throw new UserNotFoundError(id)
  return result
}

async function findUserProfiles (userId) {
  const user = await findUser(userId)

  const profile = await profileRepository.findById(user.profileId)
  if (!profile) throw new ProfileNotFoundError(user.profileId)
  return profile
}

Note that we will not query the profile collection if the user does not exist, because the first function guarantees its existence. Now on the route we can do something like:

// Express route parameters use the :name syntax
app.get('/users/:id/profiles', handler)

// --- //

async function handler (req, res) {
  try {
    const userId = req.params.id
    const profile = await userService.getProfile(userId)
    return res.status(200).json(profile)
  } catch (e) {
    if (e instanceof UserNotFoundError || e instanceof ProfileNotFoundError) return res.status(404).json(e.message)
    if (e instanceof InvalidIDError) return res.status(400).json(e.message)
    // Any other error is unexpected; respond so the request does not hang
    return res.status(500).json('Internal server error')
  }
}

We can tell which kind of error to return just by checking which error class the caught error is an instance of.

Conclusion

There are several ways to process our data so that we have a continuous and predictable flow of information. Do you know any other tips?! Leave them here in the comments :D

Enjoy this content!? Want to give a tip, opinion or just say hi? These are my main social networks:

Posted on Jan 30 by:

Lucas Santos

@khaosdoctor

Developer since 2011, working with high availability web and cloud native applications since 2013 and homebrewer in the free time. Microsoft MVP Reconnect and Google GDE. Loves communities <3

Discussion

Great article. I use packages such as class-validator, and class-transform to perform this kind of validation in the server. However, it's always nice to have JS regular alternatives just in case.

BTW, I would love to read an article about your recommendations of the HTTP codes usages. I've read something before, but I like your opinions about 422 and 206 (But I would always return 206 because it never will be all the content).

Awesome! This would be an awesome article to write! Thanks for the suggestion