Matt Thornton

Posted on Apr 16, 2021

Grokking Applicative Validation

#fsharp #functional #programming #grokking

Previously, in Grokking Applicatives, we discovered applicatives and, more specifically, invented the apply function. We did this by working through the example of validating the fields of a credit card. The apply function let us combine the results of validating the number, expiry and CVV individually into a Result<CreditCard, string> that represented the validation status of the entire CreditCard. You might also remember that we glossed over the error handling when multiple fields were invalid: we took the easy road and returned only the first error that occurred.

An unhappy customer 😡

In the spirit of agile we decided to ship our previous implementation because, well, it was better than nothing. A short while later, customers started complaining. All the complaints were along these lines.

"I entered my credit card details on your site and it took three attempts before it was finally accepted. I submitted the form and each time it gave me a new error. Why couldn't you tell me about all the errors at once?"

To see this more clearly consider a customer that enters the following data, in JSON form.

{
    "number": "a bad number",
    "expiry": "invalid expiry",
    "cvv": "not a CVV"
}

The first time they submit the form they get an error like "'a bad number' is not a valid credit card number". So they fix that and resubmit. Then they get a message like "'invalid expiry' is not a valid expiry date". So they fix that and submit a third time and still receive an error along the lines of "'not a CVV' is not a valid CVV". Pretty annoying!

We should be able to do better and return all of the errors at once. We even pointed out previously that all of the field level validation functions were independent of each other. So there was no good reason not to run all of the functions and aggregate any errors; we were just being lazy!

A better validation Applicative 💪

Let's start by updating the signature of validateCreditCard to signify our new desire to return all the validation errors that we find.

let validateCreditCard (card: CreditCard): Result<CreditCard, string list>

The only change here is that we’re now returning a list of error messages rather than a single one. How should we update our implementation to satisfy this new signature?

Let’s return to the apply function that we defined before and see if we can just fix it there. It would be very nice if all we had to do was modify apply and leave validateCreditCard otherwise unchanged.

For reference here’s the apply function that we wrote last time, the one that returns the first error it encounters.

let apply a f =
    match f, a with
    | Ok g, Ok x -> g x |> Ok
    | Error e, Ok _ -> e |> Error
    | Ok _, Error e -> e |> Error
    | Error e1, Error _ -> e1 |> Error

We can see from this that it’s only the final case where we have multiple errors to deal with and so it’s only there that we need to fix things. The simplest fix is to just concatenate both errors. This has the effect of building up a list of errors each time we call apply with invalid data. Let’s see what that looks like then.

let apply a f =
    match f, a with
    | Ok g, Ok x -> g x |> Ok
    | Error e, Ok _ -> e |> Error
    | Ok _, Error e -> e |> Error
    | Error e1, Error e2 -> (e1 @ e2) |> Error

That was easy: we just used @ to concatenate the two lists in the case where both sides were Error. Everything else remained the same.

Let’s walk through validating the credit card step-by-step with the example of the bad data that the customer was supplying earlier. First we call Ok (createCreditCard) |> apply (validateNumber card.Number). This hits the third case of the pattern match in apply because f is Ok, but the argument a is Error. That gives us something like Error [ "Invalid number" ], whose type is still Result<string -> string -> CreditCard, string list>.

We then pipe this like |> apply (validateExpiry card.Expiry). This hits the final case in the pattern match because now both f and a are Error. This means the @ operator is used to concat the errors together to create something like Error [ "Invalid number"; "Invalid expiry" ]: the errors already accumulated in f come first and the new error from a is appended. The type is now Result<string -> CreditCard, string list> because we just need to supply a CVV to finish creating the CreditCard.

So in the final step we do exactly that and pipe this result like |> apply (validateCvv card.Cvv). Just like the last step we hit the case where both f and a are Error and so we concat them. Now we’ve got something with the type Result<CreditCard, string list>, as we wanted, with a value like Error [ "Invalid number"; "Invalid expiry"; "Invalid CVV" ].
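The three steps can be traced in a small script. This is only a sketch under assumptions: the CreditCard record and createCreditCard are stand-ins for the definitions from the previous post, and literal Error values take the place of the field validators so that only apply is exercised.

```fsharp
// Stand-in for the record from the previous post (assumed shape).
type CreditCard = { Number: string; Expiry: string; Cvv: string }

let createCreditCard number expiry cvv =
    { Number = number; Expiry = expiry; Cvv = cvv }

// The error-accumulating apply defined above.
let apply a f =
    match f, a with
    | Ok g, Ok x -> g x |> Ok
    | Error e, Ok _ -> e |> Error
    | Ok _, Error e -> e |> Error
    | Error e1, Error e2 -> (e1 @ e2) |> Error

// Step 1: f is Ok and a is Error, so the third case applies.
let step1 = Ok createCreditCard |> apply (Error [ "Invalid number" ])

// Step 2: both sides are Error, so the two lists are concatenated.
let step2 = step1 |> apply (Error [ "Invalid expiry" ])

// Step 3: the same case again; the type is now Result<CreditCard, string list>.
let step3 = step2 |> apply (Error [ "Invalid CVV" ])
// step3 = Error [ "Invalid number"; "Invalid expiry"; "Invalid CVV" ]
```

Because the accumulated errors sit on the left of @, the messages come out in the same order as the fields were validated.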

A small compile time error

You might have spotted that we’ve actually changed the type of the apply function now. By using the @ operator F# has inferred that the errors must be a list. So now the signature of apply is Result<T, E list> -> Result<T -> V, E list> -> Result<V, E list>.

We now have an apply that works for Result<T, E list>. That is, it works for any results where the errors are contained in a list, rather than being single values like a string. There are a couple of interesting points to make about this:

1. The errors in the list can be any type, providing they’re all of the same type.
2. All of our validated results must now have a list of errors if we want to use them with apply.

Point 1 is useful because it allows us to model our errors in more meaningful ways than just using strings. Although for the rest of this post we’ll keep using string in order to keep it simple. Modelling errors deserves a blog post of its own.

Point 2 however causes us a little problem we have to solve here. Our original field level validation functions are still returning Result<string, string> so they no longer work with our new version of apply.

We have two choices when it comes to fixing this issue. The first is to keep the functions as they are and transform their outputs by wrapping the error of the result, if there is one, in a list. That might look something like this.

let validateCreditCard (card: CreditCard): Result<CreditCard, string list> =
    let liftError result =
        match result with
        | Ok x -> Ok x
        | Error e -> Error [ e ]
    Ok (createCreditCard)
    |> apply (card.Number |> validateNumber |> liftError)
    |> apply (card.Expiry |> validateExpiry |> liftError)
    |> apply (card.Cvv |> validateCvv |> liftError)
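As an aside, the hand-written liftError helper can also be expressed with two functions that ship in FSharp.Core: Result.mapError transforms only the Error case, and List.singleton wraps a value in a one-element list. A minimal sketch; the validateCvv below is a hypothetical single-error validator invented for illustration, not the one from the previous post.

```fsharp
open System

// Hypothetical single-error validator, used only to have something to lift.
let validateCvv (cvv: string) : Result<string, string> =
    if cvv.Length = 3 && String.forall Char.IsDigit cvv then Ok cvv
    else Error "Invalid CVV"

// Behaves like the pattern-matching liftError above.
let liftError (result: Result<'a, 'e>) : Result<'a, 'e list> =
    result |> Result.mapError List.singleton

let lifted = validateCvv "12a" |> liftError
// lifted = Error [ "Invalid CVV" ]

let untouched = validateCvv "123" |> liftError
// untouched = Ok "123"
```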

The other choice is to update those field validation functions so that they return Result<string, string list> as required. It might be tempting to take the first choice and if we had no control over those functions we’d have to do that. However, by letting those field level functions return a list we give them the flexibility to do more complex validation and potentially indicate multiple errors.

For instance the validateNumber function could indicate both a problem with the length and the presence of invalid characters like this.

open System

let validateNumber num: Result<string, string list> =
    // Start with the length check...
    let errors =
        if String.length num > 16 then
            [ "Too long" ]
        else
            []
    // ...then add a second error if any character is not a digit.
    let errors =
        if num |> Seq.forall Char.IsDigit then
            errors
        else
            "Invalid characters" :: errors

    if errors |> Seq.isEmpty then
        Ok num
    else
        Error errors
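To see both checks fire at once, here is the same function in condensed form together with two calls. This is a sketch: the sample inputs are made up for illustration, and open System is included so that Char.IsDigit resolves.

```fsharp
open System

let validateNumber num : Result<string, string list> =
    let errors =
        if String.length num > 16 then [ "Too long" ] else []
    let errors =
        if num |> Seq.forall Char.IsDigit then errors
        else "Invalid characters" :: errors
    if List.isEmpty errors then Ok num else Error errors

// 24 characters including dashes: fails both checks.
let bothProblems = validateNumber "1234-5678-9012-3456-7890"
// bothProblems = Error [ "Invalid characters"; "Too long" ]

// 16 digits: passes both checks.
let valid = validateNumber "4111111111111111"
// valid = Ok "4111111111111111"
```

Note that the second check prepends with ::, so "Invalid characters" appears before "Too long" in the resulting list.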

Using Result<T, E list> throughout gives us a more composable and flexible API that allows us to refactor the errors returned from those functions in the future without affecting the rest of the program.

Since these are functions in our own domain, we’ll take that approach. Let’s give that a try and see what it looks like all together when using this new version of apply.

let validateNumber num: Result<string, string list> =
    if String.length num > 16 then
        Error [ "Too long" ]
    else
        Ok num

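let validateExpiry expiry: Result<string, string list> =
    // validate expiry and return all errors we find
    Ok expiry // placeholder so the snippet compiles; the real checks are elided

let validateCvv cvv: Result<string, string list> =
    // validate cvv and return all cvv errors we find
    Ok cvv // placeholder so the snippet compiles; the real checks are elided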

let validateCreditCard (card: CreditCard): Result<CreditCard, string list> =
    Ok (createCreditCard)
    |> apply (validateNumber card.Number)
    |> apply (validateExpiry card.Expiry)
    |> apply (validateCvv card.Cvv)

Lovely job! Apart from a couple of small changes to lift the errors up into lists within validateNumber and friends, the rest has stayed the same. In particular, the body of validateCreditCard is completely unchanged.
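Putting it all together with the customer's data from earlier gives something like the script below. It is a sketch under assumptions: the CreditCard record and createCreditCard stand in for the previous post's definitions, and the three validators use deliberately simple, hypothetical rules (digits only, an MM/YY shape, three digits) rather than real card validation.

```fsharp
open System

type CreditCard = { Number: string; Expiry: string; Cvv: string }

let createCreditCard number expiry cvv =
    { Number = number; Expiry = expiry; Cvv = cvv }

let apply a f =
    match f, a with
    | Ok g, Ok x -> Ok (g x)
    | Error e, Ok _ -> Error e
    | Ok _, Error e -> Error e
    | Error e1, Error e2 -> Error (e1 @ e2)

// Hypothetical helper: non-empty and made only of digits.
let allDigits (s: string) = s <> "" && String.forall Char.IsDigit s

// Hypothetical rules, chosen only to make the example runnable.
let validateNumber (num: string) : Result<string, string list> =
    if allDigits num && num.Length <= 16 then Ok num
    else Error [ sprintf "'%s' is not a valid credit card number" num ]

let validateExpiry (expiry: string) : Result<string, string list> =
    let parts = expiry.Split([| '/' |])
    let isTwoDigits (p: string) = p.Length = 2 && allDigits p
    if parts.Length = 2 && Array.forall isTwoDigits parts then Ok expiry
    else Error [ sprintf "'%s' is not a valid expiry date" expiry ]

let validateCvv (cvv: string) : Result<string, string list> =
    if cvv.Length = 3 && allDigits cvv then Ok cvv
    else Error [ sprintf "'%s' is not a valid CVV" cvv ]

let validateCreditCard (card: CreditCard) : Result<CreditCard, string list> =
    Ok createCreditCard
    |> apply (validateNumber card.Number)
    |> apply (validateExpiry card.Expiry)
    |> apply (validateCvv card.Cvv)

let badCard =
    { Number = "a bad number"; Expiry = "invalid expiry"; Cvv = "not a CVV" }

// All three messages come back from a single submission.
let badResult = validateCreditCard badCard

let goodCard = { Number = "4111111111111111"; Expiry = "12/30"; Cvv = "123" }
let goodResult = validateCreditCard goodCard
```

Here badResult is an Error carrying all three messages in field order, and goodResult is Ok goodCard.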

Do I have to use list for the errors?

The only requirement we’ve placed on the error type is that we can use the @ operator to concat the errors together. So as long as the errors can be concatenated, we can use a different type here. The fancy category theory name for this is a semigroup: a type with an associative operation for combining two of its values, such as concat. A common type to use here is a NonEmptyList, because we know that if the result is an Error then there’ll be at least one item in the list.
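One way to make that requirement explicit is to pass the combining operation in as a parameter. The sketch below does that; applyWith is a hypothetical name introduced here for illustration, and the sample error values are made up. It is shown with list concatenation and with Set.union, both of which are associative.

```fsharp
// apply, generalised over how two errors are combined.
let applyWith (combine: 'e -> 'e -> 'e) (a: Result<'a, 'e>) (f: Result<'a -> 'b, 'e>) : Result<'b, 'e> =
    match f, a with
    | Ok g, Ok x -> Ok (g x)
    | Error e, Ok _ -> Error e
    | Ok _, Error e -> Error e
    | Error e1, Error e2 -> Error (combine e1 e2)

// With lists, combine is @ and we get the apply from this post.
let listF : Result<int -> int, string list> = Error [ "first" ]
let listA : Result<int, string list> = Error [ "second" ]
let fromLists = listF |> applyWith (@) listA
// fromLists = Error [ "first"; "second" ]

// With sets, combine is Set.union, so duplicate errors collapse.
let setF : Result<int -> int, Set<string>> = Error (set [ "duplicate" ])
let setA : Result<int, Set<string>> = Error (set [ "duplicate"; "other" ])
let fromSets = setF |> applyWith Set.union setA
// fromSets = Error (set [ "duplicate"; "other" ])
```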

A tale of two applicatives 📗

We’ve seen two implementations of apply for Result now. Can we have both? Unfortunately not really, at least not both defined in the Result module of F#. In order to do this F# would have to be able to decide which one to use based on whether the error type supported concat, which might not even be obvious without an explicit type annotation. Even then we might get undesired results because strings support concat, but it’s unlikely we want to concat the individual error messages into one long string.

How should we decide which one is correct then? Well, we don’t have to. We can define another type called Validation which has a Success case and a Failure case, similar to Ok and Error for Result. The difference is that for Validation we can define apply using the version we’ve created in this post which accumulates errors and for Result use the apply function that short circuits and returns the first error, that we saw in the last post. Luckily for us the excellent FSharpPlus library has already done exactly that.
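To make the contrast concrete, here is a home-grown sketch of the idea. The Validation type, applyResult and applyValidation below are simplified stand-ins written for this example; they are not the FSharpPlus definitions, which differ in their details.

```fsharp
// Simplified stand-in, NOT the FSharpPlus type.
type Validation<'a, 'e> =
    | Success of 'a
    | Failure of 'e list

// Result: short-circuits and keeps only the first error.
let applyResult a f =
    match f, a with
    | Ok g, Ok x -> Ok (g x)
    | Error e, _ -> Error e
    | Ok _, Error e -> Error e

// Validation: accumulates every error it sees.
let applyValidation a f =
    match f, a with
    | Success g, Success x -> Success (g x)
    | Failure e, Success _ -> Failure e
    | Success _, Failure e -> Failure e
    | Failure e1, Failure e2 -> Failure (e1 @ e2)

let add x y = x + y

let viaResult : Result<int, string> =
    Ok add
    |> applyResult (Error "first")
    |> applyResult (Error "second")
// viaResult = Error "first"

let viaValidation : Validation<int, string> =
    Success add
    |> applyValidation (Failure [ "first" ])
    |> applyValidation (Failure [ "second" ])
// viaValidation = Failure [ "first"; "second" ]
```

Giving each behaviour its own type means the choice between short-circuiting and accumulating is visible in the signature rather than hidden in which apply happens to be in scope.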

What did we learn? 🧑‍🏫

We’ve seen that applicatives are a great tool to have at our disposal when writing validation code. They allow us to write validation functions for each field and then easily compose these to create functions for validating larger structures made up of those fields.

We’ve also seen that whilst applicative computations are usually independent of each other there’s nothing to guarantee that a particular implementation of apply will make full use of this. Specifically, when working with validation we want to make sure that apply accumulates all errors and so we should make sure to use a type like Validation from FSharpPlus to get this behaviour.

Oldest comments (6)

Deyan Petrov • Apr 20 '21 • Edited

I am not sure it is really a good idea to change the individual validation functions to return a list of errors when they return a single error. If one day one of the functions needs to return 2 errors, then (and only then) I will change its signature, but not earlier. Why are you (and others) so relaxed about (wrong) function signatures in this case?

Matt Thornton • Apr 20 '21

Hi Deyanp, I don’t think there is a universally “right” signature for representing the errors returned from a validation function. Like I point out above you can either leave it as returning a single error and then do the conversion inside the parent validation function to map it into a list of one error so the types line up with apply, or you can decide to refactor the field level function to return a list. You should do whatever makes most sense for your situation. However, I’ve personally found that always returning a list of errors makes the code more consistent, easier to compose and less fragile in the face of future changes. On a more philosophical note, a single error is just a special case of the more general situation whereby there might be several disjoint things wrong with the data. Therefore I think a list of errors is actually the more natural representation. I’m struggling to think of a case whereby I really need to enforce some semantics that only one error could ever be returned at a time. I guess you can make the YAGNI argument against prematurely converting to a list but the change seems fairly localised here.

Deyan Petrov • Apr 21 '21

Hi Matt,

I still cannot get my head around why a list of errors would be the more natural representation ....

List can be regarded as an "Effect" as per Scott Wlaschin's "Effect World". Why would you change the signature of a function to return a List instead of single string, when it really validates a single thing?

In my functions I am trying to have the most correct and minimal/most primitive types in the signature. I do not have my functions return an Option if they don't need to. I also do not return Result if not needed. Actually, when I look at a function and see that it has a Result return type but only returns Ok, then I go and change the signature and remove the Result.

This is obviously philosophical and by no means challenging your excellent (series of) article(s), I am just trying to wrap my head around the question why you and so many other people seem to be relaxed about function signatures when it comes to applicative validation ... I would Result.mapError List.singleton in the orchestration function ...

Best regards,
Deyan

Matt Thornton • Apr 21 '21 • Edited

Don't worry about the critique, I like the discussion, it's part of the reason I wanted to start posting on here.

I think there are cases when having a function return a degenerate value is fine. For example, if I was writing C# and I was implementing an interface that required me to return Task<string>, but in my particular implementation I didn't need to do any async work then I would just implement with Task.FromResult("not async"). So viewed from that angle you could think of Result<'a, 'e list> as the most permissive interface that a validation function could have. Therefore using it consistently allows looser coupling between the validation functions working at different levels of the data.

You said:

List can be regarded as an "Effect" as per Scott Wlaschin's "Effect World". Why would you change the signature of a function to return a List instead of single string, when it really validates a single thing?

It's worth noting that whilst we're validating a single piece of data, it's not the Ok value that we're turning into a list here, it's the Error case, so what the signature is really saying is that there might be multiple validation errors for this single piece of data.

As for lists being effects, that is certainly one way to interpret them. If I'm not mistaken it represents the effect of running several computations. A Result is also an effect, one that represents a computation that might fail. So from that viewpoint then a Result<'a, 'e list> could be interpreted as "this function will run several validation computations on this single piece of data and if they're all fine it will just return the single piece of data otherwise it will return the outputs from all of the failed validation computations". That to me sounds like quite a general statement about how validation works and therefore makes for a type that can encapsulate many different validation computations.

Deyan Petrov • Apr 21 '21

Thanks for your patience, Matt! I think it will be over after you read the below ;)

I understand that you want to extend the output set of possible values of the function in order to generalize its interface to theoretically handle more than 1 error.

The fact of the matter is that the current version of the function(s) is returning only a single error though, and a Result<_, string list> may not be really justified by the function when looking at it in isolation.

The Result<_, string list> is solely required so that this function can be used in a parent/"orchestration" function utilizing applicative validation. So the "innocent" function knows this parent/orchestration context now ...

Let me ask you a similar question - imagine you have an orchestration/workflow function returning Result<_, OrchestrFunctionErrorDU>, and invoking 2 simple/reusable in different context "worker" functions. Which variant of the 2 below would you choose?

OPTION 1

let worker1a (someParam:bool) : Result<string, string> = 
    if someParam then Ok "all good1" else Error "sth wrong1a"

let worker1b (someParam:bool) : Result<string, string> = 
    if someParam then Ok "all good2" else Error "sth wrong1b"

type OrchestrFunctionErrorDU =
    | HighLevelError1 of string
    | HighLevelError2 of string

let orchestrator1 () : Result<string, OrchestrFunctionErrorDU> = 
    result {
       let! x = worker1a true |> Result.mapError OrchestrFunctionErrorDU.HighLevelError1
       let! y = worker1b false |> Result.mapError OrchestrFunctionErrorDU.HighLevelError2
       return "all good"
    }

OPTION 2

type OrchestrFunctionErrorDU =
    | HighLevelError1 of string
    | HighLevelError2 of string

let worker2a (someParam:bool) : Result<string, OrchestrFunctionErrorDU> = 
    if someParam then Ok "all good1" else "sth wrong2a" |> OrchestrFunctionErrorDU.HighLevelError1 |> Error

let worker2b (someParam:bool) : Result<string, OrchestrFunctionErrorDU> = 
    if someParam then Ok "all good2" else "sth wrong2b" |> OrchestrFunctionErrorDU.HighLevelError2 |> Error

let orchestrator2 () : Result<string, OrchestrFunctionErrorDU> = 
    result {
       let! x = worker2a true
       let! y = worker2b false
       return "all good"
    }

The difference is that in Option 1 the worker functions do not know about the higher level context and its error DU, and return some primitive error type (in this case string). In Option 2 they do know about the higher-level error DU, and they use it by pretending (imho) to be able to return both error cases, but of course returning only one of them.

I think you would choose option 2 ... I would choose option 1 but still trying to understand why people would choose option 2 ...

P.S. if you need the result CE (from Scott Wlaschin) to make the above work in fsx here it is:

//==============================================
// Computation Expression for Result
//==============================================

[<AutoOpen>]
module ResultComputationExpression =

    type ResultBuilder() =
        member __.Return(x) = Ok x
        member __.Bind(x, f) = Result.bind f x

        member __.ReturnFrom(x) = x
        member this.Zero() = this.Return ()

        member __.Delay(f) = f
        member __.Run(f) = f()

        member this.While(guard, body) =
            if not (guard()) 
            then this.Zero() 
            else this.Bind( body(), fun () -> 
                this.While(guard, body))  

        member this.TryWith(body, handler) =
            try this.ReturnFrom(body())
            with e -> handler e

        member this.TryFinally(body, compensation) =
            try this.ReturnFrom(body())
            finally compensation() 

        member this.Using(disposable:#System.IDisposable, body) =
            let body' = fun () -> body disposable
            this.TryFinally(body', fun () -> 
                match disposable with 
                    | null -> () 
                    | disp -> disp.Dispose())

        member this.For(sequence:seq<_>, body) =
            this.Using(sequence.GetEnumerator(),fun enum -> 
                this.While(enum.MoveNext, 
                    this.Delay(fun () -> body enum.Current)))

        member this.Combine (a,b) = 
            this.Bind(a, fun () -> b())

    let result = new ResultBuilder()

Matt Thornton • Apr 21 '21 • Edited

Yes, in this case I would take Option 1, because OrchestrFunctionErrorDU has nothing to do with the lower level functions. However, I don't think this is the same case, although it might appear to be.

My primary motivation for choosing to use Result<'a, 'e list> is not because the parent function needs an error list in order for apply to work, but instead because it's a good representation of a validation computation in its own right. The argument being that when validating any data it's not unreasonable to expect to encounter several errors.

It just also happens that in this case it does make the parent composition easier too, but that's a secondary reason for doing it that way.

If I were actually modelling the errors then I would likely use a DU to describe the error cases for each field and then unify those with a DU at the parent level (similar to your example). In that case I would only lift those errors into the parent DU within the parent function, because as you rightly say the child validation function should have no knowledge of that parent level DU type.

So I normally expect to do some error mapping in the parent function, but in the case of whether or not to return a list of errors I choose to use a list in the child function because above all else I believe it is a better API for that function, which allows that function to evolve more independently in the future.

I guess another way to think about this is the locality of any future changes. If validateChild returned a single error and I later changed it to return several errors, I would be forced to also go and fix validateParent, which calls it. However, in the scenario where I'm unifying errors in a parent-level ParentErrorDU type, that type is defined at the same abstraction level as validateParent (probably in the same F# module), so if I change it I'm going to have to fix the error mappings in validateParent, but I'm OK with that because that is a change at the same level of abstraction. I don't however have to go and change any of the child validation functions just because ParentErrorDU changed.

So by returning a list of errors from the child and doing any parent level DU mapping in the parent we've achieved high cohesion and low coupling even though they look like they're contradictory design choices.
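A rough sketch of how those two choices might sit together is below. Every type and function name in it (NumberError, CvvError, CardError, Card, validateCard and so on) is hypothetical and invented for this illustration, and the validation rules are deliberately trivial.

```fsharp
open System

let apply a f =
    match f, a with
    | Ok g, Ok x -> Ok (g x)
    | Error e, Ok _ -> Error e
    | Ok _, Error e -> Error e
    | Error e1, Error e2 -> Error (e1 @ e2)

// Child level: each field has its own error DU and returns a list of errors.
type NumberError = TooLong | InvalidCharacters
type CvvError = WrongLength

let validateNumber (num: string) : Result<string, NumberError list> =
    let errors =
        [ if num.Length > 16 then yield TooLong
          if not (String.forall Char.IsDigit num) then yield InvalidCharacters ]
    if List.isEmpty errors then Ok num else Error errors

let validateCvv (cvv: string) : Result<string, CvvError list> =
    if cvv.Length = 3 then Ok cvv else Error [ WrongLength ]

// Parent level: the unifying DU lives next to the function that uses it.
type CardError =
    | NumberInvalid of NumberError
    | CvvInvalid of CvvError

type Card = { Number: string; Cvv: string }

let createCard number cvv = { Number = number; Cvv = cvv }

// Only the parent knows about CardError; it lifts each child's errors here.
let validateCard number cvv : Result<Card, CardError list> =
    Ok createCard
    |> apply (validateNumber number |> Result.mapError (List.map NumberInvalid))
    |> apply (validateCvv cvv |> Result.mapError (List.map CvvInvalid))

let example = validateCard "12ab" "12"
// example = Error [ NumberInvalid InvalidCharacters; CvvInvalid WrongLength ]
```

If validateNumber later gains another check, only its own error DU and body change; validateCard keeps the same mapping.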