Evan Derby

Posted on Aug 26, 2019

Data Validation in Typescript Using the Either Pattern

#typescript #haskell #webdev #javascript

This summer, I worked on an internship project that involved building a CRUD (Create, Read, Update, Destroy) application for managing Hackathons. During this project, my team and I discovered that we had no comprehensive solution or pattern for validating the data that came into the application through the Create and Update actions.

In the end, our API methods would always consist of checking for the presence of a field, then checking some value based on that field, and so on. Instead of using the strong type checking abilities of Typescript, we resorted to frequent use of any and optional fields on our models. It was a mess (as an internship project might be).

interface Hackathon {
  name: string;
  endDate?: number; // UNIX epoch timestamp
  startDate?: number;
  // ...other fields
}

validateHackathon(hackathon: any) : void {
    if (hackathon['endDate'] && hackathon['startDate']) {
        if (hackathon['endDate'] < 0) {
            throw new Error("End date cannot be negative!");
        }
        if (hackathon['startDate'] < 0) {
            throw new Error("Start date cannot be negative!");
        }
        if (hackathon['startDate'] > hackathon['endDate']) {
            throw new Error("Start date must be before end date!");
        }
    }
    // ... various property checks and data validation steps ...
}

async updateHackathon(hackathon: any) : Promise<void> {
    validateHackathon(hackathon);
    // If the program gets to this step, then the object must have correct data and the correct type
    await this.repository.updateItem(hackathon as Hackathon);
}

While I was working on this project, I was also learning Haskell, a powerful, purely functional programming language. Since this post isn't meant to convince you to learn Haskell, I'll just introduce one powerful pattern which can be found in the language's base library: Either. Or, more specifically, Either a b. We'll discuss how this pattern can be introduced into Typescript, and how, with some setup and background, it can make data validation a lot simpler.

What is Either?

Essentially, Either is a type which can represent one of two other types. In Haskell, this idea is written as Either a b, where a and b represent the two other types. But only one type can be represented at a time. So, as its name suggests, at runtime, Either a b can only be a or b, but not both. Either Int String will either be an Int or a String.

To tell which form an Either is taking at any given time, its value is wrapped in one of two constructors. In Haskell, these are called Left and Right, so an Either Int String can be a Left Int or a Right String. In general, this pattern is known as a Tagged or Discriminated Union (Wikipedia): two separate types are combined into one type through a "tag" that indicates which type is in use.

In Haskell, Either is defined as an algebraic data type:

data Either a b = Left a | Right b

Here, the vertical bar | refers to a logical OR, where, again, Either a b can be Left a OR Right b. We'll reuse this syntax when we write Either in Typescript.

The power of Either comes from its use in error handling. By convention, the Left type is the "error" type, and the Right type is the "value" type. As an Either value is passed through a program, operations are performed on the Right value. If an error occurs, the error's information can be "stored" in the Left type. Each later step then checks whether an error is present; if one is, it passes the error's information along unchanged and performs no other computation.

Therefore, a sequence of operations, such as data validation, can be written so that each validation step can produce its own error, and the first error found will be propagated through the rest of the sequence, rather than branching out from the normal logic of the program.
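To make that propagation concrete, here is a minimal sketch. It borrows the Left/Right definitions built later in this post, and adds a hypothetical helper, chain (often called flatMap or bind elsewhere), that runs the next step only on a Right. The step functions parseNumber, nonNegative, and double are made up for illustration.

```typescript
// Minimal Either, following the definitions developed later in this post.
interface Left<A> { value: A; tag: 'left' }
interface Right<B> { value: B; tag: 'right' }
type Either<A, B> = Left<A> | Right<B>;

function Left<A>(val: A): Left<A> { return { value: val, tag: 'left' }; }
function Right<B>(val: B): Right<B> { return { value: val, tag: 'right' }; }

// Hypothetical helper: apply `f` only to a Right value;
// a Left is passed along untouched and `f` never runs.
function chain<A, B, C>(e: Either<A, B>, f: (b: B) => Either<A, C>): Either<A, C> {
  return e.tag === 'left' ? e : f(e.value);
}

// Records which steps actually ran, to make the short-circuit visible.
const stepsRun: string[] = [];

function parseNumber(input: string): Either<string, number> {
  stepsRun.push('parseNumber');
  const n = Number(input);
  return isNaN(n) ? Left('not a number') : Right(n);
}

function nonNegative(n: number): Either<string, number> {
  stepsRun.push('nonNegative');
  return n < 0 ? Left('negative') : Right(n);
}

function double(n: number): Either<string, number> {
  stepsRun.push('double');
  return Right(n * 2);
}

// Each step runs only if every previous step produced a Right.
function pipeline(input: string): Either<string, number> {
  return chain(chain(parseNumber(input), nonNegative), double);
}
```

With "21" all three steps run and the result is a Right holding 42; with "-1" the error from nonNegative is carried to the end and double is never called.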

Either in Typescript

We can see that the Either pattern is really powerful just from its theoretical definitions. But can we write it in Typescript? Yes! Luckily, Typescript supports discriminated unions: when we check a shared literal-typed property, the compiler narrows the union to the matching member. We'll also write a few small helper functions to make those checks explicit. So let's write Either in Typescript.

First, we want to define interfaces which have the shared (tagged) property (also known as the "discriminant"). We'll need to leverage Generics, as well, so that any type can be held within our union objects. Since we are working with Left and Right, we'll make those our interface names, and we'll use two properties in each interface to create the structure of the union: value will hold the actual typed value of the object, and tag will purely refer to which type of container is in use.

interface Left<A> {
  value: A;
  tag: 'left'
}

interface Right<B> {
  value: B;
  tag: 'right'
}

(Both interfaces could have used A to refer to the generic type, but it can be confusing to see the same letter.)

Now that we have our separate interfaces, we need to declare a type alias which will refer to either Left or Right:

type Either<A,B> = Left<A> | Right<B>;

If we had written just Either<A>, we wouldn't have gotten the behavior we wanted: Both sides of the Either would have had to hold the same type, not two different types.
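As a small illustration of why the two type parameters matter, here is a sketch of a hypothetical parsePort function whose failures carry a string message (Left) and whose successes carry a number (Right). With a single type parameter, both sides would have to share one type, and we'd be forced to, say, turn the port into a string.

```typescript
interface Left<A> { value: A; tag: 'left' }
interface Right<B> { value: B; tag: 'right' }
type Either<A, B> = Left<A> | Right<B>;

// Hypothetical example: the error side is a string, the success side a number.
function parsePort(input: string): Either<string, number> {
  const port = parseInt(input, 10);
  if (isNaN(port)) {
    return { value: `"${input}" is not a number`, tag: 'left' };
  }
  if (port < 1 || port > 65535) {
    return { value: `${port} is out of range`, tag: 'left' };
  }
  return { value: port, tag: 'right' };
}
```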

Next, we can write helper functions (type guards) that check the tag and tell the compiler which member of the union a value belongs to.

function isLeft<A>(val: any): val is Left<A> {
  if ((val as Left<A>).tag === 'left') return true;
  return false;
}

function isRight<B>(val: any): val is Right<B> {
  if ((val as Right<B>).tag === 'right') return true;
  return false;
}

These functions, simply put, cast their incoming value as a Left or Right and then check the value of the tag field. The unusual return type, val is Left<A>, is a type predicate: it tells the compiler that wherever the function returns true, val can be treated as a Left<A>.
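Here is a sketch of what that narrowing buys us. It uses a variant of isLeft that takes an Either instead of any (a change, not the post's exact signature), plus a hypothetical formatResult function. After the guard, the compiler knows result.value is a string in one branch and a number in the other. The second function shows that checking the tag field directly narrows the union the same way, without any helper.

```typescript
interface Left<A> { value: A; tag: 'left' }
interface Right<B> { value: B; tag: 'right' }
type Either<A, B> = Left<A> | Right<B>;

// Same idea as the post's guard, but taking an Either instead of any.
function isLeft<A, B>(val: Either<A, B>): val is Left<A> {
  return val.tag === 'left';
}

// Hypothetical helper that turns a result into a message.
function formatResult(result: Either<string, number>): string {
  if (isLeft(result)) {
    // Narrowed to Left<string>: string methods are available.
    return `error: ${result.value.toUpperCase()}`;
  }
  // Narrowed to Right<number>: number methods are available.
  return `ok: ${result.value.toFixed(1)}`;
}

// Checking the discriminant directly narrows the same way, with no guard.
function formatResultInline(result: Either<string, number>): string {
  return result.tag === 'left'
    ? `error: ${result.value.toUpperCase()}`
    : `ok: ${result.value.toFixed(1)}`;
}
```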

Now we're going to write some constructors for the Left and Right types. Whereas the interface definitions above tell us what a Left or Right value looks like, we can write a function that acts like a constructor to make creating these objects explicit:

function Left<A>(val: A) : Left<A> {
  return { value: val, tag: 'left' };
}

function Right<B>(val: B) : Right<B> {
  return { value: val, tag: 'right' };
}

When we wrote the interfaces above, we defined types named Left and Right. Here, we are writing functions with the same names, and Typescript accepts this because types and values live in separate namespaces: Left<A> in a type annotation refers to the interface, while Left(...) in an expression calls the function.
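A tiny sketch of the two namespaces side by side (the failure constant is only for illustration):

```typescript
// A type named Left...
interface Left<A> { value: A; tag: 'left' }

// ...and a function named Left. Types and values live in separate
// namespaces, so TypeScript accepts both declarations.
function Left<A>(val: A): Left<A> {
  return { value: val, tag: 'left' };
}

// After the colon, Left<string> is the interface;
// on the right-hand side, Left(...) calls the function.
const failure: Left<string> = Left("Something went wrong");
```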

What does this have to do with Hackathons?

Let's actually put this together to do some data validation! Say that the only information we need about an error that occurs during validation is a string. Let's make a quick type alias to make that clear in our method signatures:

type MyError = string;

Super simple. Now, we can write the validateHackathon method from above, but using Either:

validateHackathon(h: Hackathon) : Either<MyError, Hackathon> {
  if (h.endDate < 0) {
    return Left<MyError>("End date cannot be negative!");
  }
  if (h.startDate < 0) {
    return Left<MyError>("Start date cannot be negative!");
  }
  if (h.startDate > h.endDate) {
    return Left<MyError>("Start date must be before end date!");
  }
  // etc
  return Right<Hackathon>(h);
}

You might be asking yourself, how can we return Left at one point and Right at another? This comes from the logical OR aspect of our definition of Either. Either can be a Left or a Right type, so as long as the return value is a Left OR Right, the type signature holds.

Also, notice here that we are requiring the incoming value to be of type Hackathon, whereas in the function above it was typed as any and we cast it to Hackathon at the end. Part of cleaning up the validation is separating the structure of the incoming data from any limits that we might have on its values. Validating the structure of the data can be done with a JSON Schema validator. Validating the limits on the values of the incoming data is what our Either methods will handle.
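To show where that boundary sits, here is a sketch of a hand-written structural check standing in for a JSON Schema validator. It assumes a simplified Hackathon with only name, startDate, and endDate, and the function name hasHackathonShape is hypothetical. It answers "is this shaped like a Hackathon?" and nothing else; whether the dates make sense is left to the Either-based validation.

```typescript
// Simplified model for this sketch; the real Hackathon has more fields.
interface Hackathon {
  name: string;
  startDate: number; // UNIX epoch timestamp
  endDate: number;
}

// Hypothetical stand-in for a JSON Schema validator: checks only the shape
// of untrusted input, not whether its values make sense.
function hasHackathonShape(data: unknown): data is Hackathon {
  if (typeof data !== 'object' || data === null) return false;
  const d = data as { [key: string]: unknown };
  return typeof d['name'] === 'string'
    && typeof d['startDate'] === 'number'
    && typeof d['endDate'] === 'number';
}
```

A request body would pass through hasHackathonShape first, and only then, typed as Hackathon, through validateHackathon. A body like {"name":"Bad","startDate":-10,"endDate":-10} passes the shape check and is rejected later by the value checks.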

So, this method is interesting, but it isn't really that different from what we had before. Now we just have a funky method signature, and we use these Left and Right constructors instead of just throwing an error or returning a value. What's so special?

Creating Predicate functions

If we squint hard enough at our existing validation function, we can see that it has a repetitive structure: Using an if statement, we check some property of the incoming value. If the condition doesn't hold, we throw the corresponding error. We do this over and over again for different properties and their errors.

Any function which takes a value and returns true or false is called a predicate. Using Either, we can write a function that evaluates some object against the predicate, and if the predicate doesn't pass, the resulting Either takes the Left error form. We can call this method predicateEither. We'll also create a type alias for a predicate function, so I don't have to re-write these predicate signatures in each helper method signature:

type Predicate<N> = (val: N) => boolean;

function predicateEither<A, B>(value: B, error: A, predicate: Predicate<B>) : Either<A, B> {
    if (!predicate(value)) return Left(error);
    return Right(value);
}

So now, for example, we can validate on negative dates with a predicate:

const StartDateMustBePositive = (h: Hackathon) => h.startDate > 0;

let badHackathon : Hackathon = {
  name: "Bad",
  startDate: -10,
  endDate: -10
};

let result = predicateEither(badHackathon, "Start Date must be positive!", StartDateMustBePositive);

// Result = Left "Start Date must be positive!"

let goodHackathon : Hackathon = {
  name: "Good",
  startDate: 10,
  endDate: -10
};

result = predicateEither(goodHackathon, "Start Date must be positive!", StartDateMustBePositive);

// Result = Right (goodHackathon)

Notice that we don't need to write generic type arguments anywhere, because Typescript infers them for us!
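Here is a self-contained sketch of that inference with a simple number check (the values and message are made up). The first call lets TypeScript infer the type arguments from the error and value passed in (string and number here); the second spells them out and behaves identically.

```typescript
interface Left<A> { value: A; tag: 'left' }
interface Right<B> { value: B; tag: 'right' }
type Either<A, B> = Left<A> | Right<B>;
type Predicate<N> = (val: N) => boolean;

function Left<A>(val: A): Left<A> { return { value: val, tag: 'left' }; }
function Right<B>(val: B): Right<B> { return { value: val, tag: 'right' }; }

function predicateEither<A, B>(value: B, error: A, predicate: Predicate<B>): Either<A, B> {
  if (!predicate(value)) return Left(error);
  return Right(value);
}

// No type arguments written: they are inferred from the arguments,
// and `n` in the callback is typed from the value.
const tooSmall = predicateEither(3, "must be at least 10", n => n >= 10);

// The explicit form, for comparison.
const bigEnough = predicateEither<string, number>(42, "must be at least 10", n => n >= 10);
```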

Combining Predicates

But wait, you might be saying. "Good Hackathon" isn't actually good, it still has a negative end date!

You're right, and so we should write another predicate function for that. But how do we combine that with the first predicate? We don't want to check the result value each time we use predicateEither, since then we might as well be doing manual error handling, and we'll create a lot of branches in our program:

const EndDateMustBePositive = (h: Hackathon) => h.endDate > 0;

function validateHackathon(h: Hackathon) : Either<MyError, Hackathon> {
  let result = predicateEither(h, "Start Date must be positive!", StartDateMustBePositive);
  if (isLeft(result)) return result; // Branch!
  result = predicateEither(h, "End Date must be positive!", EndDateMustBePositive);
  if (isLeft(result)) return result; // Repetitive!
  return result;
}

One of my favorite programming principles is DRY (Don't Repeat Yourself), and we are certainly violating that here. So let's write one final helper function which will make this whole endeavor worth it.

This method is called firstLeft. It takes an initial value, a list of predicates, and a list of errors. The value is tested against each predicate until one fails, in which case the corresponding error is returned. If no predicates fail, the value will be returned.

function firstLeft<A, B>(val: B, predicates: Predicate<B>[], errors: A[]) : Either<A, B> {
    for (let i = 0; i < predicates.length; i++) {
        let p = predicates[i];
        if (!p(val)) return Left(errors[i]);
    }
    return Right(val);
}

With this structure, we can create a list of predicates and their errors, and trust that the first error found will be the one that we are alerted to:

let predicates = [ StartDateMustBePositive, EndDateMustBePositive ];
let messages = [ "Start Date must be positive!", "End Date must be positive!" ];

function validateHackathon(h: Hackathon) : Either<MyError, Hackathon> {
    return firstLeft(h, predicates, messages);
}

async updateHackathon(h: Hackathon) : Promise<void> {
    let result = validateHackathon(h);
    if (isLeft(result)) {
        console.error(result.value);
        return;
    }
    await this.repository.updateItem(h);
}

Dope! We've just transformed our repetitive, branching mess into a single line, and we've ensured that, at the first sign of a validation error, the original logic won't continue.
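To check the "first error wins" behavior, here is a runnable sketch of the array-based firstLeft. It assumes a simplified Hackathon with required dates, adds a hypothetical third check based on the "start before end" rule from the first version of validateHackathon, and counts predicate calls so we can see exactly where validation stops.

```typescript
interface Left<A> { value: A; tag: 'left' }
interface Right<B> { value: B; tag: 'right' }
type Either<A, B> = Left<A> | Right<B>;
type Predicate<N> = (val: N) => boolean;

function Left<A>(val: A): Left<A> { return { value: val, tag: 'left' }; }
function Right<B>(val: B): Right<B> { return { value: val, tag: 'right' }; }

// Simplified model for this sketch.
interface Hackathon { name: string; startDate: number; endDate: number; }

function firstLeft<A, B>(val: B, predicates: Predicate<B>[], errors: A[]): Either<A, B> {
  for (let i = 0; i < predicates.length; i++) {
    if (!predicates[i](val)) return Left(errors[i]);
  }
  return Right(val);
}

// Counts predicate calls so we can see where checking stops.
let checksRun = 0;
const StartDateMustBePositive = (h: Hackathon) => { checksRun++; return h.startDate > 0; };
const EndDateMustBePositive = (h: Hackathon) => { checksRun++; return h.endDate > 0; };
// Hypothetical third check.
const StartMustBeBeforeEnd = (h: Hackathon) => { checksRun++; return h.startDate < h.endDate; };

const predicates = [StartDateMustBePositive, EndDateMustBePositive, StartMustBeBeforeEnd];
const messages = [
  "Start Date must be positive!",
  "End Date must be positive!",
  "Start date must be before end date!",
];

// Fails the second check, so the third check never runs.
const rejected = firstLeft({ name: "Good?", startDate: 10, endDate: -10 }, predicates, messages);
```

Here rejected is a Left holding "End Date must be positive!", and only two predicates were called.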

A "Spec" for Validation

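I could stop here, but I want to change our firstLeft method just a bit. Having the predicates and messages as two separate arrays feels wrong: what if someone added a predicate but forgot to add a corresponding error message? Correct inputs would still pass, but when the unmatched predicate failed, the result would be a Left holding undefined instead of a useful message, because reading past the end of a JavaScript array returns undefined rather than throwing an out-of-bounds error.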
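To see what actually happens when the two arrays drift out of sync, here is a sketch using the array-based firstLeft with two port checks but only one message (portChecks and portMessages are hypothetical).

```typescript
interface Left<A> { value: A; tag: 'left' }
interface Right<B> { value: B; tag: 'right' }
type Either<A, B> = Left<A> | Right<B>;
type Predicate<N> = (val: N) => boolean;

function Left<A>(val: A): Left<A> { return { value: val, tag: 'left' }; }
function Right<B>(val: B): Right<B> { return { value: val, tag: 'right' }; }

function firstLeft<A, B>(val: B, predicates: Predicate<B>[], errors: A[]): Either<A, B> {
  for (let i = 0; i < predicates.length; i++) {
    if (!predicates[i](val)) return Left(errors[i]);
  }
  return Right(val);
}

// Two predicates, but only one message: easy to do by accident.
const portChecks: Predicate<number>[] = [n => n > 0, n => n <= 65535];
const portMessages: string[] = ["Port must be positive!"];

// Passes both checks, so the missing message is never needed.
const validPort = firstLeft(8080, portChecks, portMessages);

// Fails the second check: errors[1] is past the end of the array, which
// yields undefined in JavaScript, so we get a Left with no message.
const invalidPort = firstLeft(70000, portChecks, portMessages);
```

Nothing throws, and the compiler still types invalidPort.value as a string, which is exactly why this bug is easy to miss.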

In this case I want to take advantage of tuples, or rather, a small tuple-style object. With it, we can effectively create a big list of predicates and their corresponding error messages. This big list can act as a "spec" for the object: any property that the object must satisfy can be found in the list.

Let's make a little "Pair" type and use it to create such a spec:

interface Pair<A,B> {
    first: A;
    second: B;
}

function firstLeft<A, B>(val: B, predicatePairs: Pair<Predicate<B>, A>[]): Either<A, B> {
    for (let i = 0; i < predicatePairs.length; i++) {
        let p = predicatePairs[i].first;
        let e = predicatePairs[i].second;
        if (!p(val)) return Left(e);
    }
    return Right(val);
}

const HackathonSpec : Pair<Predicate<Hackathon>, MyError>[] = [
 { first: StartDateMustBePositive, second: "Start Date must be positive!" },
 { first: EndDateMustBePositive,   second: "End Date must be positive!" }
];

function validateHackathon(h: Hackathon) : Either<MyError, Hackathon> {
    return firstLeft(h, HackathonSpec);
}
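As a commenter points out below, Typescript also has built-in tuple types such as [A, B]. Here is a sketch of the same spec using a tuple type instead of Pair (the Spec alias is a hypothetical name, and Hackathon is simplified to required dates). Destructuring each entry keeps the loop short, and the array still can't hold a predicate without its message.

```typescript
interface Left<A> { value: A; tag: 'left' }
interface Right<B> { value: B; tag: 'right' }
type Either<A, B> = Left<A> | Right<B>;
type Predicate<N> = (val: N) => boolean;

function Left<A>(val: A): Left<A> { return { value: val, tag: 'left' }; }
function Right<B>(val: B): Right<B> { return { value: val, tag: 'right' }; }

// Simplified model for this sketch.
interface Hackathon { name: string; startDate: number; endDate: number; }

// Each entry pairs a predicate with its error, as a [predicate, error] tuple.
type Spec<A, B> = [Predicate<B>, A][];

// Same behavior as the Pair-based firstLeft, using tuple destructuring.
function firstLeft<A, B>(val: B, spec: Spec<A, B>): Either<A, B> {
  for (const [predicate, error] of spec) {
    if (!predicate(val)) return Left(error);
  }
  return Right(val);
}

const HackathonSpec: Spec<string, Hackathon> = [
  [h => h.startDate > 0, "Start Date must be positive!"],
  [h => h.endDate > 0, "End Date must be positive!"],
];

function validateHackathon(h: Hackathon): Either<string, Hackathon> {
  return firstLeft(h, HackathonSpec);
}
```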

More complicated predicates

This pattern is really cool when you're using simple predicates, but business logic is hardly ever simple. How can we adapt this pattern for more complicated predicates, which require more than one input?

The answer is that we can write any kind of complex logic in our predicates, as long as we find a way to ensure they take one input and return a boolean. For example, in our internship project, we had to ensure that the dates for an incoming Hackathon didn't overlap with any existing Hackathon dates.

To test this predicate, we have to examine the incoming Hackathon against every other Hackathon. You might imagine that this would mean our predicate must have two inputs: (incomingHackathon: Hackathon, existingHackathons: Hackathon[]). But we can instead use closures to introduce the existing Hackathons inside of the predicate function:

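abstract class HackathonController {
    // Assumed to return every Hackathon already stored
    abstract getAllHackathons(): Hackathon[];

    // A closure over `this`: the predicate can reach the existing Hackathons
    // while still taking one Hackathon and returning a boolean
    DatesMustNotOverlap = (h: Hackathon): boolean => {
        // Two date ranges are disjoint when one ends before the other starts;
        // every() also returns true when there are no existing Hackathons
        return this.getAllHackathons()
                   .every(v => v.endDate < h.startDate
                            || v.startDate > h.endDate);
    };
    // etc
}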
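The same closure trick works without a class. Here is a sketch with a hypothetical factory, datesMustNotOverlap, that captures a list of existing Hackathons and returns an ordinary one-argument predicate, so it slots into a Pair-based spec next to the simple checks. The sample dates and the overlap message are made up.

```typescript
interface Left<A> { value: A; tag: 'left' }
interface Right<B> { value: B; tag: 'right' }
type Either<A, B> = Left<A> | Right<B>;
type Predicate<N> = (val: N) => boolean;
interface Pair<A, B> { first: A; second: B; }

function Left<A>(val: A): Left<A> { return { value: val, tag: 'left' }; }
function Right<B>(val: B): Right<B> { return { value: val, tag: 'right' }; }

// Simplified model for this sketch.
interface Hackathon { name: string; startDate: number; endDate: number; }

function firstLeft<A, B>(val: B, predicatePairs: Pair<Predicate<B>, A>[]): Either<A, B> {
  for (const pair of predicatePairs) {
    if (!pair.first(val)) return Left(pair.second);
  }
  return Right(val);
}

// Hypothetical factory: `existing` is captured in a closure, and the
// returned predicate still takes one Hackathon and returns a boolean.
function datesMustNotOverlap(existing: Hackathon[]): Predicate<Hackathon> {
  // Disjoint when one range ends before the other starts.
  // Ranges that merely touch (end === start) count as overlapping here.
  return h => existing.every(v => v.endDate < h.startDate || v.startDate > h.endDate);
}

const existingHackathons: Hackathon[] = [{ name: "Spring", startDate: 100, endDate: 200 }];

const HackathonSpec: Pair<Predicate<Hackathon>, string>[] = [
  { first: h => h.startDate > 0, second: "Start Date must be positive!" },
  { first: datesMustNotOverlap(existingHackathons), second: "Dates overlap an existing Hackathon!" },
];
```

One design note: when validating an update, the Hackathon being updated is probably already in the stored list, so you would likely filter it out of existing before building the predicate; otherwise it would always overlap with itself.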

In Conclusion

Overall, using Either in this way creates a powerful pattern that allows for data validation steps to become much clearer and for their error messages to be more helpful. There are a lot of other things that can be done with Either, Pairs, and discriminated unions, which I hope to explore and discuss more in the future!

Footnote for those of you who know what you are talking about

I should say: I'm still very much new to Haskell and its powerful ideas, like Monads, Functors, Applicative, and Transformers. I'm still working on learning and fully understanding these ideas. Either is an interesting concept that I have found I can much more fully understand through implementation in Typescript (after all, Javascript was the first language I learned).

Because Typescript lacks a few powerful aspects of functional programming that truly elevate Either and other Monadic patterns to a new level (most notably automatic currying, which makes partial function application effortless), this implementation isn't nearly as powerful as Haskell's! But that's okay.
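In Haskell every function is curried, so supplying fewer arguments than a function expects returns a new function. TypeScript can express a similar idea with more ceremony, either by writing curried functions by hand or with Function.prototype.bind. Here is a sketch in the context of building predicates; the names atLeast, atMost, and between are hypothetical.

```typescript
type Predicate<N> = (val: N) => boolean;

// Curried by hand: supplying only the bound returns a one-argument predicate.
const atLeast = (min: number): Predicate<number> => n => n >= min;
const atMost = (max: number): Predicate<number> => n => n <= max;

// Partial application with bind: fix the first two arguments
// of a three-argument function.
function between(min: number, max: number, n: number): boolean {
  return n >= min && n <= max;
}
const isPercentage: (n: number) => boolean = between.bind(null, 0, 100);

// Partially applied predicates compose like any others.
const portChecks: Predicate<number>[] = [atLeast(1), atMost(65535)];
const isValidPort = (n: number) => portChecks.every(p => p(n));
```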

Top comments (8)

JLarky • Aug 27 '19 • Edited

It seems that you know Haskell but don't have too much experience with typescript :) like why would isLeft be defined on any instead of on Either<A, any>? Or why do you need Pair<A, B> defined as object when there's perfectly fine tuple type already in typescript [A, B]? :)

Evan Derby • Aug 27 '19

Learning & using Typescript has been like learning Sass: A cool extension to a language I already know, and whose features I discover a bit more of each time I use it. So I actually didn't know about the tuple type, thanks for pointing that out!
As for isLeft, I want to say that I saw that signature / pattern while I was researching discriminated unions and somehow thought it was required. Again, thanks for the tip.

JLarky • Aug 27 '19

you can see that person is just starting with typescript by amount of any in their code :) I was curious with idea though (because I come from language with pattern matching and I miss it) so I played with your code a bit and tried few ways of implementing it gist.github.com/JLarky/914006843b4...

Scott Simontis • Aug 27 '19 • Edited

Here is what I use in C#, I call it Option instead of Either but they're effectively the same thing. I don't think Typescript generics are complex enough to enable some of this code, but I hope it inspires you and you learn something new! Feel free to ask questions.

I couldn't get the LiquidTag working to share my Gist, so here is the link to it.

Evan Derby • Aug 27 '19

It's really neat to see how you implemented these ideas in a super object-oriented language like C#! Thanks for sharing!

Scott Simontis • Aug 27 '19

Also, if you like Haskell, I would encourage you to check out ReasonML and Elm as web programming languages. Elm was written by a language designer prodigy and the code itself is basically commented as if it were a textbook. ReasonML is OCaml that gets turned into JavaScript. They are both really fun, just not quite there for production use.

I have not played around with Haskell, F# and Ruby metaprogramming have been enough for me :) I would love to try it someday.

Also, great article! I enjoyed reading it and share in your frustration to express beautifully complex type systems in every language. The best way we can do is ensure we design objects that cannot be created already in an illegal state.