In a recent project, I was tasked with creating an endpoint that would parse and process files generated by the firmware of a device from a trained neural network.
The device would send a JSON file to a REST API on a Node.js backend as a multipart upload. Up until that point, the task seemed straightforward.
But as I began working on the logic, I realized it would be tricky to assume that the function responsible for processing the input would always receive the expected input.
That's when I started exploring the potential of TypeScript type guards, which would help me ensure the input's expected shape and avoid runtime errors.
Limitations of Static Type Checking
TypeScript performs static type checking at compile time. This works very well as long as we are dealing with functions that accept inputs whose values result from evaluating expressions within the application's scope.
Now, what happens when we receive values that come from beyond the border of our system? Think of the results of database queries, requests to external APIs, or Bluetooth communication with other devices, to name a few. In these scenarios, we should never assume anything about what the incoming data will look like, because any such assumption is a bias that could lead to unexpected behavior within the system.
So what do we do in these situations? Should we leave it to chance and watch it fail if the input is not of the type we expected? Thankfully, TypeScript offers powerful tools for these scenarios: it can narrow a value's static type based on checks we perform at runtime.
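To make the boundary problem concrete, here is a minimal sketch. The Reading interface and the payload string are hypothetical, invented purely for illustration. It shows that a type assertion on parsed JSON satisfies the compiler while checking nothing at runtime.

```typescript
// Hypothetical shape we expect from an external source (an assumption for illustration).
interface Reading {
  val: number;
}

// Pretend this string arrived from a device. Note that val is a string here.
const payload = '{"val": "63"}';

// JSON.parse returns any, so the assertion compiles without complaint.
const reading = JSON.parse(payload) as Reading;

// The static type says number, but the runtime value is a string.
const runtimeType = typeof reading.val;

// The compiler types this as number, yet at runtime it is string concatenation.
const sum = reading.val + 1;
```

Nothing in this snippet fails to compile, which is exactly the problem: the mismatch only surfaces later, far from where the data entered the system.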
Any type behavior and potential issues
Taking into account what I stated above, let's suppose I don't want to assume anything from the parsing function's input. How should I type it then? One could easily be inclined to want to use an any type.
Is there a problem with it? Pay attention to what happens when we want to access a property of an any type. First, let's take a look at this:
class Dog {
name: string;
constructor(name: string) {
this.name = name;
}
bark() {
console.log("Woof woof");
}
}
class Fish {
name: string;
constructor(name: string) {
this.name = name;
}
swim() {
console.log("Splash splash");
}
}
const scooby = new Dog("Scooby Doo");
const nemo = new Fish("Nemo");
scooby.bark();
nemo.swim();
nemo.bark(); // ❌ Property 'bark' does not exist on type 'Fish'.
If we try to call bark on a Fish instance, the compiler reports an error, since the Fish class has no bark method.
Now let's see what happens when we slightly change the variable declaration.
const nemo: any = new Fish("Nemo");
nemo.swim();
nemo.bark(); // ✅ No compile errors.
// On execution:
// ❌ [ERR]: "Executed JavaScript Failed:"
// ❌ [ERR]: nemo.bark is not a function
We knew from the start that this would fail. So why didn't the compiler catch it? If we could infer the problem, TypeScript's compiler should have been able to as well.
Let's go to the definition of the any type:
TypeScript also has a special type, any, that you can use whenever you don’t want a particular value to cause type-checking errors.
When a value is of type any, you can access any properties of it (which will in turn be of type any), call it like a function, assign it to (or from) a value of any type, or pretty much anything else that’s syntactically legal.
As we can notice, the any type is intended to be used when you desire to avoid static type checking on variables. With this in mind, we should avoid using this type in almost all scenarios.
So if the any type doesn't work for typing something we have no information about, is there another type that allows us to do this?
The unknown type
The unknown type was introduced in TypeScript 3.0. You can think of it as the type-safe counterpart of the any type. How? Although we use both of them to type values we know nothing about, the main difference is that unknown is much more restrictive: accessing any property on it, or using it with almost any operator, results in a compiler error.
Taking the previous case as a reference:
const nemo: unknown = new Fish("Nemo");
nemo.swim(); // ❌ 'nemo' is of type 'unknown'.
nemo.bark(); // ❌ 'nemo' is of type 'unknown'.
The obvious thing would be to ask ourselves: if something of the unknown type is not allowed to use operators of any kind or access its properties, what’s the point of using it?
This gives us the ground to introduce type guards.
Introduction to type guards
People often confuse a dynamically typed language with the absence of types.
Are there types in JavaScript? Of course there are. If you have any doubts, dispel them right now by opening your browser's console and executing a simple statement:
var x = 5;
typeof x; // 'number'
Since JavaScript provides operators to inspect types at runtime, it would be useful if TypeScript could use those checks to make assumptions about the type of a variable within certain code paths. Type guards, together with the unknown type, do exactly that.
Let's see it in action:
// Without using type guards
function isOdd(x: unknown) {
  return x % 2 !== 0; // ❌ 'x' is of type 'unknown'
}
// Using type guards
function isOdd(x: unknown) {
  // After this check, x is narrowed from unknown to number
  if (typeof x !== "number") return false;
  return x % 2 !== 0; // ✅ No compile errors.
}
We just found a way of restricting the type of a given variable. In TypeScript we can identify five different ways to use type guards:
- The instanceof keyword
- The typeof keyword
- The in keyword
- Equality narrowing type guard
- Custom type guard with predicate
I won't be analyzing the first four in depth, but feel free to consult the TypeScript documentation, where they are explained in detail. Let's focus on the custom ones with a predicate.
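For quick reference, the four built-in forms can be sketched as follows. The Dog and Fish classes mirror the earlier example (their methods return strings here instead of logging), and describeValue is a hypothetical helper written only to exercise each form.

```typescript
class Dog {
  constructor(public name: string) {}
  bark(): string {
    return "Woof woof";
  }
}

class Fish {
  constructor(public name: string) {}
  swim(): string {
    return "Splash splash";
  }
}

// Hypothetical helper that exercises each built-in narrowing form.
function describeValue(x: unknown): string {
  // Equality narrowing: comparing against a literal value.
  if (x === null) return "null";

  // typeof narrowing: x is a number inside this branch.
  if (typeof x === "number") return `number ${x.toFixed(1)}`;

  // instanceof narrowing: x is a Dog (or a Fish) inside each branch.
  if (x instanceof Dog) return x.bark();
  if (x instanceof Fish) return x.swim();

  // in narrowing: the right-hand side must be an object,
  // so non-objects and null are ruled out first.
  if (typeof x === "object" && x !== null && "name" in x) {
    return "has a name property";
  }

  return "something else";
}
```

Each branch only compiles because the preceding check has narrowed x; removing any check brings back the 'x' is of type 'unknown' error.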
Custom type guards
There will be scenarios in which the compiler cannot narrow a type on its own, yet a few additional checks allow us to determine unequivocally whether something is of a particular type.
A clear example is enum values. Suppose we have an enumeration that defines the sizes of a popcorn package, and we want to check whether a given value, for example one coming from an external API call, is indeed one of the enum's values.
How do we do it? TypeScript lets us turn any predicate function into a custom type guard by declaring its return type as a type predicate with the is keyword.
export enum PopCornSize {
Small = "Small",
Medium = "Medium",
Large = "Large",
}
const isPopCornSize = (size: unknown): size is PopCornSize =>
Object.values(PopCornSize).includes(size as PopCornSize);
To prove that something is a member of an enumeration, you just have to check whether its value matches any of the enum's values.
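A brief usage sketch shows the narrowing in action. The priceFor function and its prices are hypothetical, made up for illustration; the enum and the guard are repeated so the block is self-contained.

```typescript
enum PopCornSize {
  Small = "Small",
  Medium = "Medium",
  Large = "Large",
}

const isPopCornSize = (size: unknown): size is PopCornSize =>
  Object.values(PopCornSize).includes(size as PopCornSize);

// Hypothetical prices, for illustration only.
const PRICES: Record<PopCornSize, number> = {
  [PopCornSize.Small]: 3,
  [PopCornSize.Medium]: 5,
  [PopCornSize.Large]: 7,
};

// Accepts anything, e.g. a field taken from an external API response.
function priceFor(size: unknown): number | undefined {
  if (!isPopCornSize(size)) {
    // size is still unknown here, so we refuse to guess.
    return undefined;
  }
  // size is narrowed to PopCornSize, so indexing PRICES type-checks.
  return PRICES[size];
}
```

Without the guard, PRICES[size] would not compile, because an unknown value cannot be used as an index.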
Wrapping it all together through an example
Let's summarize everything we've covered up until now through a simplified, real-life case:
We have a machine for medical use that monitors a patient's heart rate and how much they are moving.
The heart rate is measured in BPM, while the movement is measured through various types of sensors, generating a discrete value that can be 0, 1, or 2, which correspond to "LOW", "MEDIUM", and "HIGH" respectively.
This machine records these metrics periodically and generates a JSON report of the traces on demand. Each trace is recorded with its value and an epoch timestamp of the time at which it was taken.
The format is as described below:
{
"log_uuid": 141,
"version": 0,
"user_metrics": [
{ "ts": 1673550600, "evt": 1, "val": 63 },
{ "ts": 1673550600, "evt": 2, "val": 1 },
{ "ts": 1673550630, "evt": 1, "val": 71 },
{ "ts": 1673550630, "evt": 2, "val": 0 },
{ "ts": 1673550660, "evt": 1, "val": 69 },
{ "ts": 1673550660, "evt": 2, "val": 2 },
{ "ts": 1673550690, "evt": 1, "val": 66 }
]
}
While the expected data format may be documented, there is still a risk of receiving unexpected data or experiencing malfunctions if the firmware team makes changes without proper notification.
How can we confirm that the file is indeed a JSON file? To ensure its integrity, let's begin by performing a validation check.
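export const parseSleepSession = (file: Express.Multer.File): SleepSession => {
  // Buffer#toString always yields a string, so the real check is the parse itself
  const fileStr = file.buffer.toString("utf8");
  let rawJson: unknown;
  try {
    rawJson = JSON.parse(fileStr);
  } catch {
    // JSON.parse throws a SyntaxError on malformed input
    throw new ForbiddenError("INVALID_JSON_FILE", { file: file.buffer });
  }
  // The report must be a JSON object: not null, not a primitive, not an array
  if (
    rawJson === null ||
    typeof rawJson !== "object" ||
    Array.isArray(rawJson)
  ) {
    throw new ForbiddenError("INVALID_JSON_FILE", { rawJson });
  }
  // The field validations in the following snippets continue here
};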
It's important to note that if anything in the JSON file doesn't match our expected format, the program will immediately throw an exception with an error code and log it along with the file. While the logging process itself may not be noticeable, it's crucial for debugging purposes. Personally, I like to configure a middleware that captures exceptions and, depending on certain circumstances, reports them to a platform such as Sentry.
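The snippets in this section throw a ForbiddenError that takes an error code and a context object. The post does not show that class, so what follows is only a guess at its shape: a minimal sketch, assuming it extends Error and keeps the code and context around for logging. The report function is likewise hypothetical and merely stands in for an error-capturing middleware.

```typescript
// Hypothetical error class; the real one in the project may differ.
class ForbiddenError extends Error {
  readonly code: string;
  readonly context: Record<string, unknown>;

  constructor(code: string, context: Record<string, unknown> = {}) {
    super(code);
    this.name = "ForbiddenError";
    this.code = code;
    this.context = context;
  }
}

// Hypothetical reporter, standing in for an error-capturing middleware.
function report(err: unknown): string {
  // A caught value can be anything, so we narrow it before reading fields.
  if (err instanceof ForbiddenError) {
    return `${err.code}: ${JSON.stringify(err.context)}`;
  }
  return "UNEXPECTED_ERROR";
}
```

Note that the reporter itself relies on a type guard: instanceof narrows the caught value before its code and context are read.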
Now that we know that the file is JSON, let's continue by validating the first two fields. Given the concepts we covered earlier this should be trivial by now.
const id = (rawJson as RawSleepMetrics).log_uuid as unknown;
const version = (rawJson as RawSleepMetrics).version as unknown;
if (typeof id !== "number") {
throw new ForbiddenError("INVALID_ID", { id, rawJson });
}
if (typeof version !== "number") {
throw new ForbiddenError("INVALID_VERSION", { rawJson, version });
}
The typeof operator is a type guard, so it is enough to validate that they are numeric fields. Keep in mind that we will limit ourselves to validating the correctness of the document format. Whether the identifier already exists for a previously parsed file, or any other validation that belongs to the business logic, is outside the scope of these checks.
Let's continue with the user_metrics array. Let's validate that the field exists and that it really is an array.
const userMetricsRaw = (rawJson as RawSleepMetrics).user_metrics as unknown;
if (isNil(userMetricsRaw)) {
throw new ForbiddenError("USERMETRICS_MISSING", { rawJson });
}
if (!Array.isArray(userMetricsRaw)) {
throw new ForbiddenError("USERMETRICS_NOT_AN_ARRAY", {
rawJson,
userMetricsRaw,
});
}
The statements in the block above translate to the following:
- A property exists with that key.
- Its value is not null.
- The value is an array.
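One detail behind these checks is worth a sketch. Array.isArray narrows an unknown value to any[], so the elements come back as any, which is presumably why the loop further down annotates each element as unknown by hand. The isUnknownArray guard and the countEntries function below are hypothetical names, and the null/undefined check is written out explicitly on the assumption that isNil is lodash's helper of that name.

```typescript
// Hypothetical guard: narrows to unknown[] rather than the any[] that
// Array.isArray produces on its own.
const isUnknownArray = (v: unknown): v is unknown[] => Array.isArray(v);

// Hypothetical function mirroring the two checks from the text.
function countEntries(userMetricsRaw: unknown): number {
  // Equivalent to a nil check: rejects both null and undefined.
  if (userMetricsRaw === null || userMetricsRaw === undefined) {
    throw new Error("USERMETRICS_MISSING");
  }
  if (!isUnknownArray(userMetricsRaw)) {
    throw new Error("USERMETRICS_NOT_AN_ARRAY");
  }
  // Elements are unknown here, so each one must still be validated before use.
  return userMetricsRaw.length;
}
```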
Simple, right? Let's continue with the content of user_metrics.
const activityScore: ActivityScoreRegistry[] = [];
const heartRateMonitor: HeartRateMonitorRegistry[] = [];
userMetricsRaw.forEach((currMetric: unknown) => {
if (
currMetric === null ||
typeof currMetric !== "object" ||
Array.isArray(currMetric)
) {
throw new ForbiddenError("USER_METRIC_INVALID", { currMetric, rawJson });
}
const timestampEpoch = (currMetric as RawMetricRecord).ts as unknown;
const metric = (currMetric as RawMetricRecord).evt as unknown;
const value = (currMetric as RawMetricRecord).val as unknown;
});
Let's start by validating that the timestamp is indeed a date. This can be tricky for a number of reasons. First, by epoch we usually mean the number of seconds elapsed since January 1, 1970, although some languages, JavaScript among them, count milliseconds instead.
On the other hand, the JS Date class lets you pass an epoch to the constructor to initialize an instance. The catch, which comes from JavaScript having been designed so that nothing on the web should break, is that if we pass it something that is not a valid epoch, it does not throw an exception. Instead, it returns an invalid Date whose numeric value is NaN. As a result, the validation looks a bit peculiar, but we end up with something like this:
if (typeof timestampEpoch !== "number") {
throw new ForbiddenError("INVALID_TIMESTAMP", { rawJson, timestampEpoch });
}
if (!isEpochInSeconds(timestampEpoch)) {
throw new ForbiddenError("INVALID_TIMESTAMP_NOT_IN_EPOCH_SECONDS", {
rawJson,
timestampEpoch,
});
}
const timestamp = new Date(timestampEpoch * 1000);
if (isNaN(timestamp.getTime()))
throw new ForbiddenError("INVALID_TIMESTAMP", { rawJson, timestamp });
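The isEpochInSeconds helper is not shown in the post. One plausible way to write it, offered here only as a sketch under assumed bounds, is to accept whole numbers inside a window of reasonable dates; a millisecond timestamp for the same period is roughly a thousand times larger and falls outside it. The window constants are assumptions, not values from the original project.

```typescript
// Assumed bounds: 2000-01-01 and 2100-01-01 (UTC), expressed in epoch seconds.
const MIN_EPOCH_SECONDS = 946684800;
const MAX_EPOCH_SECONDS = 4102444800;

// A guess at what isEpochInSeconds might check: a whole number of seconds
// inside the plausible window defined above.
const isEpochInSeconds = (ts: number): boolean =>
  Number.isInteger(ts) && ts >= MIN_EPOCH_SECONDS && ts <= MAX_EPOCH_SECONDS;

// The Date constructor does not throw on bad numeric input. It yields an
// invalid Date whose getTime() is NaN, hence the isNaN check in the text.
const invalidTime = new Date(NaN).getTime();

// A timestamp from the sample report, converted from seconds to milliseconds.
const sampleIso = new Date(1673550600 * 1000).toISOString();
```

The window is a trade-off: it catches unit mistakes cheaply, at the cost of rejecting legitimate timestamps outside the assumed range.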
Moving on, in the JSON we can see that a field is sent indicating what type of metric it is. If you ask me, it isn't the best choice for the JSON structure. Personally, I would have sent two separate lists with the values of ‘movement’ and ‘heart rate’, but hey, it is what it is.
There are two types of metrics, so we can interpret this field as an enumeration. Now, how do we validate an enumeration? We will have to resort to creating a custom type guard.
enum Metric {
ActivityScore = 2,
HeartRateMonitor = 1,
}
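// Numeric enums also expose their member names through Object.values,
// so we check for a number first.
const isMetric = (m: unknown): m is Metric =>
typeof m === "number" && Object.values(Metric).includes(m as Metric);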
if (!isMetric(metric)) {
throw new ForbiddenError("INVALID_METRIC_ENUM", { metric, rawJson });
}
if (typeof value !== "number") {
throw new ForbiddenError("INVALID_METRIC_VALUE", { rawJson, value });
}
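One caveat with numeric enums deserves a short sketch. Unlike string enums such as PopCornSize, a numeric enum compiles to an object with reverse mappings, so Object.values returns the member names as strings alongside the numbers. The names looseIsMetric and strictIsMetric are hypothetical and exist only to contrast the two behaviors.

```typescript
enum Metric {
  ActivityScore = 2,
  HeartRateMonitor = 1,
}

// Contains both the numeric values and the member names as strings.
const allValues = Object.values(Metric);

// A guard based on includes alone accepts the name strings too.
const looseIsMetric = (m: unknown): m is Metric =>
  Object.values(Metric).includes(m as Metric);

// Checking for a number first rules the names out.
const strictIsMetric = (m: unknown): m is Metric =>
  typeof m === "number" && Object.values(Metric).includes(m as Metric);
```

With the loose version, a payload carrying "evt": "ActivityScore" would pass the guard and then match neither branch of the numeric comparisons that follow.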
Now that we have parsed the enumeration, the rest should be as easy as checking what type of metric it is and performing validations, which should be simple at this point.
if (metric === Metric.ActivityScore) {
if (value < ACTIVITY_SCORE_LOWER_LEVEL) {
throw new ForbiddenError("INVALID_ACTIVITY_SCORE_BELOW_RANGE", {
rawJson,
value,
});
}
if (ACTIVITY_SCORE_UPPER_LEVEL < value) {
throw new ForbiddenError("INVALID_ACTIVITY_SCORE_ABOVE_RANGE", {
rawJson,
value,
});
}
activityScore.push({ timestamp, value }); // assumes ActivityScoreRegistry is { timestamp: Date; value: number }
}
if (metric === Metric.HeartRateMonitor) {
if (value < HEART_RATE_LOWER_LEVEL) {
throw new ForbiddenError("INVALID_HEART_RATE_BELOW_RANGE", {
rawJson,
value,
});
}
if (HEART_RATE_UPPER_LEVEL < value) {
throw new ForbiddenError("INVALID_HEART_RATE_ABOVE_RANGE", {
rawJson,
value,
});
}
heartRateMonitor.push({ timestamp, value }); // assumes HeartRateMonitorRegistry is { timestamp: Date; value: number }
}
With this, our parser is covered. Now we can confirm that all necessary validations are being carried out to guarantee that the JSON is coming in the expected format. Note that we should perform this type of validation whenever we interact with systems that are beyond our reach (be it a database, API, etc.).
It is worth appreciating how type guards let us validate types at runtime while the compiler keeps track of what we have already proven, and how little ceremony the language requires to use them.
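As an alternative to checking each field inline, the structural checks for a single record could be folded into one custom type guard. This is a sketch of a different trade-off, not the approach used in the walkthrough: the RawMetricRecord interface is an assumption inferred from the sample JSON, and isPlainObject and keepValidRecords are hypothetical helpers.

```typescript
// Assumed shape of one entry in user_metrics, inferred from the sample JSON.
interface RawMetricRecord {
  ts: number;
  evt: number;
  val: number;
}

// Narrows unknown to a non-null, non-array object with string keys.
const isPlainObject = (v: unknown): v is Record<string, unknown> =>
  typeof v === "object" && v !== null && !Array.isArray(v);

// Custom type guard covering the structural checks for one record.
const isRawMetricRecord = (v: unknown): v is RawMetricRecord =>
  isPlainObject(v) &&
  typeof v["ts"] === "number" &&
  typeof v["evt"] === "number" &&
  typeof v["val"] === "number";

// Keeps only well-formed records; filter picks up the type predicate,
// so the result is typed as RawMetricRecord[].
function keepValidRecords(input: unknown[]): RawMetricRecord[] {
  return input.filter(isRawMetricRecord);
}
```

A single guard is more compact, but it gives up the specific error codes that the field-by-field version attaches to each failure, which matter when a rejected file has to be debugged.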
Conclusion
Sometimes, in our eagerness for predictability, we make the mistake of assigning static types to things like API responses or database query results without validating them.
The mistake lies in not understanding that data coming from outside our application's scope must be treated very differently from data produced within it.
When we define what structure external information will have, trust is not enough, because we never have control over those domains.
As we covered in this post, the most idiomatic approach is to avoid making assumptions and instead perform the necessary validations at the boundary. This enables us to take necessary actions in catastrophic scenarios where the received data does not comply with the expected format. By doing so, we can trigger alerts at an early stage and avoid potential errors that may be difficult to detect.