Vsevolod

Posted on Sep 27, 2023

Expanding Horizons

#typetheory #java

As stated at the end of the previous chapter, today we're going to abstract out the notion of an "absent value" even more. I'll use a bit of math with a mix of programming languages, mostly JVM-based as usual.

As a first step toward generalizing the null value, let's try to answer a simple question: is null really different from NullPointerException?

Firstly, let's introduce two singleton sets: $Null$ (the set containing the $null$ value) and $NullPointerException$ (the set containing the $nullPointerException$ object instance):

$$\begin{aligned} & Null = \lbrace null \rbrace \\ & NullPointerException = \lbrace nullPointerException \rbrace \end{aligned}$$

Next, let's create two functions:

$$\begin{aligned} & nullable: \lbrace * \rbrace \rightarrow U \cup Null \\ & npeable: \lbrace * \rbrace \rightarrow U \cup NullPointerException \end{aligned}$$

Rewritten in Java:

<T> T nullable()
<T> T npeable() throws NullPointerException

While, as far as Java is concerned, these functions are not equal, from a mathematical standpoint there is a bijection between their result sets, $U \cup Null$ and $U \cup NullPointerException$: every value of $U$ maps to itself, and $null$ maps to $nullPointerException$. This means the two result types are equivalent. In other words, each can be expressed in terms of the other without losing any information.

$npeable \rightarrow nullable$:

<T> T nullable() {
    try {
        return npeable();
    } catch (NullPointerException e) {
        return null;
    }
}

$nullable \rightarrow npeable$:

<T> T npeable() throws NullPointerException {
    T result = nullable();
    if (result == null) {
        throw new NullPointerException();
    }
    return result;
}
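To make the bijection concrete, here is a small sketch that wraps both conversions as higher-order functions and checks the round trip. It is an illustration under assumptions: `java.util.function.Supplier` stands in for the zero-argument functions `nullable()` and `npeable()`, and the class name `NullNpeBijection` is hypothetical.

```java
import java.util.function.Supplier;

// Converters between the two encodings of an "absent value".
// Suppliers stand in for the zero-argument functions nullable() and npeable().
class NullNpeBijection {
    // nullable -> npeable: null becomes a thrown NullPointerException, other values pass through
    static <T> Supplier<T> toNpeable(Supplier<T> nullable) {
        return () -> {
            T result = nullable.get();
            if (result == null) {
                throw new NullPointerException("value is absent");
            }
            return result;
        };
    }

    // npeable -> nullable: a thrown NullPointerException becomes null, other values pass through
    static <T> Supplier<T> toNullable(Supplier<T> npeable) {
        return () -> {
            try {
                return npeable.get();
            } catch (NullPointerException e) {
                return null;
            }
        };
    }
}
```

Composing `toNullable(toNpeable(s))` gives back a supplier with exactly the same results as `s`, which is the "no information lost" part of the argument.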

Now, since we have shown that:

$$Null \equiv NullPointerException$$

And by definition:

$$NullPointerException \subset Exception$$

It follows that the value in the $Null$ set corresponds to some value in the $Exception$ set:

$$\exists e \in Exception : null \equiv e$$

This means that we can treat a null pointer as an exceptional return value. And this is completely rational! It is basically an error that says "Value not found". So the difference between null and an NPE is just a matter of programming style, application performance, and other non-mathematical concerns. From a purely logical point of view, the two are equivalent.

At this point, we have generalized null. Next, we can generalize Maybe. We will need one more utility function, though:

$$npe : U \cup Null \rightarrow U \cup NullPointerException$$

This is definitely possible, since we can map $U$ to itself, and we already know how to map $Null$ to $NullPointerException$. Now we can state that "for each value in a nullable set, there is a corresponding value in an exceptional set":

$$\forall a \in (U \cup Null),\ npe(a) \in (U \cup Exception)$$

Is there a monad that represents an exceptional context? Yes, there is. Different languages name it differently, but I prefer Result<T,E>, which either (pun intended) holds a value T or an error E. So today we'll take a mental leap from Maybe to Result, from the null pointer to any error.
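As a rough illustration of that leap, here is a minimal Result-like type in plain Java (17+). Everything in it (`Result`, `Ok`, `Err`, `fromNullable`) is a hypothetical sketch, not a JDK or library API; `fromNullable` plays the role of the generalized $npe$ function, mapping a nullable value into the exceptional context with a caller-chosen error.

```java
import java.util.function.Function;
import java.util.function.Supplier;

// A minimal Result: either a value of type T or an error of type E
sealed interface Result<T, E> {
    record Ok<T, E>(T value) implements Result<T, E> {}
    record Err<T, E>(E error) implements Result<T, E> {}

    // Generalized npe: null becomes an error produced by the given supplier
    static <T, E> Result<T, E> fromNullable(T value, Supplier<? extends E> ifNull) {
        return value == null ? new Err<>(ifNull.get()) : new Ok<>(value);
    }

    // Transform the value, leave an error untouched
    default <B> Result<B, E> map(Function<? super T, ? extends B> f) {
        if (this instanceof Ok<T, E> ok) {
            return new Ok<>(f.apply(ok.value()));
        }
        return new Err<>(((Err<T, E>) this).error());
    }

    // Chain another computation that may itself fail
    default <B> Result<B, E> flatMap(Function<? super T, Result<B, E>> f) {
        if (this instanceof Ok<T, E> ok) {
            return f.apply(ok.value());
        }
        return new Err<>(((Err<T, E>) this).error());
    }
}
```

For example, `Result.fromNullable(System.getenv("HOME"), () -> new NoSuchElementException("HOME")).map(String::length)` yields either the length or an error explaining what was missing, instead of a bare null.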

The Bottom Part

Before getting into our target, the compiler-controlled safe territory, let's discuss the unsafe one. As a program executes, numerous unexpected errors can happen. In order not to pollute our codebase, we generally do not try to cover all of them. Instead, it's assumed that every return type implicitly includes those errors. This implicit value is called bottom and is denoted $\bot$. It may sound as if bottom is a non-functional, evil thing, but it has its value.

To understand why, we can group errors by their importance in a specific scope. Simply put, some errors matter locally (i.e. at the call site) and some globally (i.e. at the application/library level). For example, FileNotFoundException could matter locally: we know which file we tried to open, and we might know what to do if it doesn't exist. On the other hand, if we always expect the file to be present, the same exception becomes a global one, and the best we can do is rethrow it up the call stack. Without the $\bot$ approach, we would have to add this meaningless exception to function signatures throughout our application, all the way up to main: code pollution without any value.

Another example is OutOfMemoryError. Can you come up with a scenario in which it matters locally? It would be an unusual use case, to say the least. Almost always, we assume that the caller can't handle it, so why add it to the signature at all? Moreover, errors of this kind can, in theory, happen in any function. Does that mean that, to be strict, we should annotate every function with them? Obviously not. Unchecked exceptions are useful for cleanly separating the happy path.
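A small sketch of the two scopes, assuming a hypothetical `ConfigLoader` that reads the first line of a file. Locally, a missing file has a sensible fallback; globally, the failure is wrapped in the JDK's unchecked `UncheckedIOException` and left to travel up the stack as part of $\bot$, without touching any signature on the way.

```java
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical loader; both methods return the first line (or null for an empty file)
class ConfigLoader {
    // Local scope: a missing file is expected and has a sensible fallback
    static String readOrDefault(String path, String fallback) {
        try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
            return reader.readLine();
        } catch (FileNotFoundException e) {
            return fallback;
        } catch (IOException e) {
            // Anything else is not ours to handle: hand it to the "bottom"
            throw new UncheckedIOException(e);
        }
    }

    // Global scope: the file must exist, so every I/O failure travels up unchecked
    // and no caller has to add "throws IOException" to its signature
    static String readRequired(String path) {
        try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The same `FileNotFoundException` is a value to react to in `readOrDefault` and just another bottom-dwelling failure in `readRequired`; only the surrounding scope decides which.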

As evidence that this holds, consider languages keen on checked errors, like Rust. Instead of wrapping checked errors in unchecked ones, Rustaceans tend to declare all possible errors and explicitly propagate them up to the entry point. However, even there, tools that make this propagation as unobtrusive as possible are in high demand: the language itself has the ? operator specifically for propagating errors upward. Moreover, in applications those global errors are generally coerced into an umbrella Error type (an analog of Java's Exception), and libraries that help to do this cleanly (for example, anyhow) are extremely popular. As a result, we get a codebase with, essentially, a cleaner version of throws Exception all over the place.

Returning to our muttons: NullPointerException makes no sense at the global level. The best an application can do globally is show a generic error message to the user. Locally, on the other hand, we know exactly what we received and where. Thus, we always know which pointer was missing and how to handle it meaningfully. That is the root reason why Optional has proven so useful.

To be fair, I should mention one specific error that falls outside the handling approaches above but is nevertheless part of the bottom type: infinite execution. It differs drastically from the others because, most of the time, neither the compiler nor the runtime signals it. Developers can add timeouts to handle it gracefully, but that is a completely different approach to error handling, which is why it is out of our scope today.

Java's type system is not particularly sophisticated, so it doesn't give us a bottom type. However, two popular JVM languages do have one. Guess which ones? Instead of answering, I'll just leave two examples below.

import scala.annotation.tailrec

@tailrec
def runsForever(): Nothing = runsForever()

fun alwaysThrows(): Nothing = throw RuntimeException()
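Java has no such type, but a common workaround is a generic method that never returns normally: since `T` is inferred at the call site, the call fits wherever a value of any type is expected, much like `alwaysThrows()` above. This is only a sketch and not a real bottom type; the names `Bottom`, `fail`, `describe`, and `half` are hypothetical.

```java
class Bottom {
    // Never returns normally, so its "result" can pose as any type the caller needs
    static <T> T fail(String message) {
        throw new IllegalStateException(message);
    }

    // Here fail(...) type-checks as a String
    static String describe(int n) {
        return n >= 0 ? "non-negative" : fail("negative input: " + n);
    }

    // Here it type-checks as an Integer (explicit type argument keeps inference simple)
    static int half(int n) {
        return n % 2 == 0 ? n / 2 : Bottom.<Integer>fail("odd input: " + n);
    }
}
```

Unlike a real bottom type, the compiler does not know that `fail` never returns, so a non-void method that ends with a bare `fail(...);` statement still gets a "missing return statement" error.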

Results in Jeneral

As already mentioned, foo() -> Result<String, Error> is isomorphic (i.e. convertible back and forth without losing information) to String foo() throws Exception. Moreover, both can model library APIs equally well, enumerating all possible domain-specific errors. Why, then, are Results beloved (in some circles, at least) and checked exceptions hated (in most circles, at least)?
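Both directions of that isomorphism can be sketched in a few lines of Java. `Attempt`, `ThrowingSupplier`, `of`, and `getOrThrow` are hypothetical names chosen for this example; `Attempt` fixes the error side to `Exception`, which makes it Try-like rather than a fully general Result.

```java
// A computation that may throw a checked exception (hypothetical helper interface)
@FunctionalInterface
interface ThrowingSupplier<T> {
    T get() throws Exception;
}

sealed interface Attempt<T> {
    record Success<T>(T value) implements Attempt<T> {}
    record Failure<T>(Exception error) implements Attempt<T> {}

    // "T foo() throws Exception" -> Attempt<T>: catch the exception and keep it as a value
    static <T> Attempt<T> of(ThrowingSupplier<T> body) {
        try {
            return new Success<>(body.get());
        } catch (Exception e) {
            return new Failure<>(e);
        }
    }

    // Attempt<T> -> "T foo() throws Exception": rethrow the stored exception
    default T getOrThrow() throws Exception {
        if (this instanceof Success<T> success) {
            return success.value();
        }
        throw ((Failure<T>) this).error();
    }
}
```

Going `Attempt.of(attempt::getOrThrow)` returns an equivalent `Attempt`, and calling `getOrThrow()` on `Attempt.of(body)` behaves like `body.get()`, so no information is lost in either direction.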

The more popular and obvious part of the answer lies on the surface. As with a lot of things in programming, it's about sugaring our code (and our eyes). Being a monad, Result in various languages enables clean declarative pipelines, while checked exceptions in Java require imperative handling with catch blocks.

The bigger problem lies in the space of generics: checked exceptions just aren't part of the generic type system. Your T is different from T throws CharacterCodingException, and both are different from T throws AccessDeniedException. Their common parent is T throws IOException, which still can't be used where a generic T is expected. As a rather radical answer to that, the standard functional interfaces introduced with Java 8 lambdas (Function, Supplier, and friends) don't declare any checked exceptions, so a lambda passed to them can't throw one. This, in turn, contributed to checked exceptions being widely considered bad practice in the Java world nowadays. Compare that to Result, which is a lawful citizen of the type system and can therefore be used as a generic return type wherever needed.
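A sketch of the generics problem, under assumed names: `PortParseException` is a hypothetical checked exception (not `java.text.ParseException`), and `parsePort` a hypothetical method declaring it. A method reference to `parsePort` cannot be passed to `Stream.map`, because `Function.apply` declares no checked exceptions; once the error moves into the return type (`PortResult`, also hypothetical), the same logic slots straight into the generic pipeline.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical checked exception for this example only
class PortParseException extends Exception {
    PortParseException(String message) {
        super(message);
    }
}

// The error becomes part of the return type instead of the throws clause
sealed interface PortResult {
    record Ok(int port) implements PortResult {}
    record Err(PortParseException error) implements PortResult {}
}

class Ports {
    static int parsePort(String s) throws PortParseException {
        int port;
        try {
            port = Integer.parseInt(s);
        } catch (NumberFormatException e) {
            throw new PortParseException("not a number: " + s);
        }
        if (port < 1 || port > 65535) {
            throw new PortParseException("out of range: " + s);
        }
        return port;
    }

    // Does not compile: Function.apply declares no checked exceptions
    // List<Integer> ports = inputs.stream().map(Ports::parsePort).collect(Collectors.toList());

    static PortResult tryParsePort(String s) {
        try {
            return new PortResult.Ok(parsePort(s));
        } catch (PortParseException e) {
            return new PortResult.Err(e);
        }
    }

    // Compiles: Function<String, PortResult> is an ordinary generic function
    static List<PortResult> parseAll(List<String> inputs) {
        return inputs.stream().map(Ports::tryParsePort).collect(Collectors.toList());
    }
}
```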

In Java, there is no built-in Result, but there is the mighty Try in the mighty Vavr library. It covers both problems discussed above and is good enough for most application scenarios (i.e. scenarios similar to Rust's anyhow::Result, where you just pass the error up the stack). If, on the other hand, you are developing a library, you can use Vavr's Either, which is good for modelling custom checked exceptions. The obvious problem with the Either approach in a public library is that your users will be forced to use Vavr, while they might prefer something else (those silly users, you know better what is good for them, right?).

In Scala, both Either and Try are built-in. Ten points to Gryffindor, as usual!

The Controversy

To be completely fair, I have to touch on one debatable point between throws Exception and Result, a point with no clear winner: uncaught exceptions terminate the program, while unhandled results just lazily wait for someone to pick them up. This sounds like an easy win for the exceptions' party, since happy ignorance is never a good option. We should always be notified about all unexpected errors somehow, even if it means failing the entire execution. However, let's consider the same property when we're processing a sequence of data and need to collect all the outputs, be they values or errors. A lazy result is the only way to achieve this. Fail-fast exceptions never wait for us to finish; they blow up the stack. Therefore, even imperative languages are forced to collect some kind of union of values and processing errors. For example, you might have seen something like this:

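// Input, Output, loadInputs() and process() stand in for your own types and logic
List<Input> inputs = loadInputs();
List<Output> outputs = new ArrayList<>();
List<Exception> errors = new ArrayList<>();
for (var input : inputs) {
  try {
    outputs.add(process(input));
  } catch (Exception e) {
    errors.add(e);
  }
}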

Here process might throw an exception, but we don't want to fail fast: we want to process every item in the list before deciding what to do next. Even though such code doesn't use any explicit result structure, semantically it builds the same union, split into two separate lists (all successes go to outputs and all failures to errors). With this example, I want to underline that the fail-fast nature is not a 100% win. Hence the controversy.
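The same idea with the union made explicit: a sketch in which every input is turned into a hypothetical `Outcome` value and the list is split into successes and failures only at the end. `process` here is an assumed stand-in that rejects blank strings; it is not from the original code.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Explicit union of a processed value and a processing error
sealed interface Outcome {
    record Success(String value) implements Outcome {}
    record Failure(Exception error) implements Outcome {}
}

class BatchProcessor {
    // Stand-in for process(): fails on blank input
    static String process(String input) {
        if (input.isBlank()) {
            throw new IllegalArgumentException("blank input");
        }
        return input.toUpperCase();
    }

    // Turn a possibly-throwing call into a value; nothing fails fast
    static Outcome attempt(String input) {
        try {
            return new Outcome.Success(process(input));
        } catch (Exception e) {
            return new Outcome.Failure(e);
        }
    }

    static List<Outcome> processAll(List<String> inputs) {
        return inputs.stream().map(BatchProcessor::attempt).collect(Collectors.toList());
    }

    // The outputs/errors split from the loop: true -> successes, false -> failures
    static Map<Boolean, List<Outcome>> partition(List<Outcome> outcomes) {
        return outcomes.stream().collect(Collectors.partitioningBy(o -> o instanceof Outcome.Success));
    }
}
```

Keeping a single `List<Outcome>` also preserves the original order of inputs, which the two-list version loses.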

The Union Way

To stay consistent across chapters, I'll try to model Result as a proper mathematical union. Firstly, this will help us with the backward compatibility described in chapter one. Secondly, I'm simply curious to see how it works out.

Here is a Try-like implementation:

import scala.reflect.Typeable

extension[A: Typeable] (m: A | Throwable)
  def map[B](f: A => B): B | Throwable = m match
    case e: Throwable => e
    case a: A => f(a)

  def flatMap[B](f: (=> A) => B | Throwable): B | Throwable = m match
    case e: Throwable => e
    case a: A => f(a)

  def withFilter(f: A => Boolean): A | Throwable = m match
    case e: Throwable => e
    case a: A => if f(a) then a else Exception("Value filtered")

We can update our example functions from chapter two to return exceptions instead of nulls:

def fetchEntityFromDb(id: String): Entity | EntityNotFoundException
def populateMeta(entity: Entity): EntityWithMeta | AccessDeniedException

After that, our old for-comprehension will work. The first problem arises when we add our Maybe union to the equation:

|An extension method was tried, but could not be fully constructed:
|
|    flatMap(fetchEntityFromDb(id))    failed with
|
|        Ambiguous overload. The overloaded alternatives of method flatMap with types
|         [A](m: A | Throwable)[B](f: (=> A) => B | Throwable)(implicit evidence$2: scala.reflect.Typeable[A]): B | Throwable
|         [A](m: A | Null)[B](f: (=> A) => B | Null): B | Null
|        both match arguments (Entity | Null)(<?> => <?>)

Obviously, we need a single extension for flatMap and its family:

extension[A: Typeable] (m: A | Throwable | Null)
  def map[B](f: A => B): B | Throwable | Null = m match
    case null => null
    case e: Throwable => e
    case a: A => f(a)

  def flatMap[B](f: A => B | Throwable | Null): B | Throwable | Null = m match
    case null => null
    case e: Throwable => e
    case a: A => f(a)

  def withFilter(f: A => Boolean): A | Throwable | Null = m match
    case null => null
    case e: Throwable => e
    case a: A => if f(a) then a else null

Now the for-comprehension works for both Maybe and Try scenarios, and even for a mix of them. For example, the functions could be:

def fetchEntityFromDb(id: String): Entity | Null
def populateMeta(entity: Entity): EntityWithMeta | AccessDeniedException

The harder problem for me was simulating Either instead of Try, i.e. the case where we want an enumeration of possible known errors instead of a generic Throwable. Intuitively, I tried something like this:

extension[A: Typeable, E <: Throwable : Typeable] (m: A | E)
  def map[B](f: A => B): B | E = m match
    case e: E => e
    case a: A => f(a)

  def flatMap[B](f: A => (B | E)): B | E = m match
    case e: E => e
    case a: A => f(a)

Both sides of the union are generic, with the right side bounded by Throwable. The idea is that the right side can take any specific error (or a union of several).

The first thing to note is that by making the error type generic, we automatically lose the withFilter method. And that's expected: there can be no mempty/mzero for Either, since we, as library developers, can't know beforehand which error the user expects (i.e. we can't supply a default value). For a more thorough explanation, look up the relationship between Either and MonadPlus.
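The missing default is easy to see in a small Java sketch with a hypothetical `Either`-like type: filtering is still possible, but only if the caller supplies the error to use when the predicate fails, which is roughly what Scala's `Either.filterOrElse` does. All names below are assumptions for illustration.

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

// Left holds the error, Right holds the value
sealed interface Either<E, A> {
    record Left<E, A>(E error) implements Either<E, A> {}
    record Right<E, A>(A value) implements Either<E, A> {}

    // No withFilter: there is no universal "empty" E to return when p fails,
    // so the caller has to supply one
    default Either<E, A> filterOrElse(Predicate<? super A> p, Supplier<? extends E> zero) {
        if (this instanceof Right<E, A> right && !p.test(right.value())) {
            return new Left<>(zero.get());
        }
        return this;
    }
}
```

With `E` fixed to `Throwable`, the `Exception("Value filtered")` default in the Try-like version plays the role of `zero`; with a generic `E`, only the user can provide it.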

The other problem was less expected for me. Let's say we have a custom exception hierarchy:

sealed class CustomException(message: String) extends Exception(message)

case class EntityNotFoundException(id: String) extends CustomException(id)

case class AccessDeniedException(resource: String) extends CustomException(resource)

And, again, the same functions which return those exceptions:

def fetchEntityFromDb(id: String): Entity | EntityNotFoundException =
  Entity(id)

def populateMeta(entity: Entity): EntityWithMeta | AccessDeniedException =
  EntityWithMeta(entity.id)

def toDto(entity: EntityWithMeta): Dto = Dto(entity.id)

Then our for-comprehension (without an if guard this time):

def fetchDto(id: String) = for {
  entity <- fetchEntityFromDb(id)
  entityWithMeta <- populateMeta(entity)
  dto <- toDto(entityWithMeta)
} yield dto

Yields the following compiler error:

|  entityWithMeta <- populateMeta(entity)
|                                 ^^^^^^
|                                 Found:    (entity : Entity | EntityNotFoundException)
|                                 Required: Entity

It seems that the compiler infers the whole union as our A (the left side of the union) and passes it further down the chain, whereas we wanted it to bind Entity to A and EntityNotFoundException to E. I spent some time trying to fix it, to no avail, although I admit that my Scala knowledge is quite superficial. So I would like to hear suggestions or solutions to this error in the comments below.

All in all, my attempt at unionizing types felt quite unsatisfactory.
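For contrast, a wrapped (non-union) Either keeps the error as an explicit type parameter, so there is nothing for inference to untangle. Below is a Java sketch of the same `fetchDto` pipeline; the `Either` type, the `DtoService` class, and the records standing in for `Entity`, `EntityWithMeta`, and `Dto` are all hypothetical.

```java
import java.util.function.Function;

sealed abstract class CustomException extends Exception {
    CustomException(String message) { super(message); }
}
final class EntityNotFoundException extends CustomException {
    EntityNotFoundException(String id) { super(id); }
}
final class AccessDeniedException extends CustomException {
    AccessDeniedException(String resource) { super(resource); }
}

record Entity(String id) {}
record EntityWithMeta(String id) {}
record Dto(String id) {}

sealed interface Either<E, A> {
    record Left<E, A>(E error) implements Either<E, A> {}
    record Right<E, A>(A value) implements Either<E, A> {}

    default <B> Either<E, B> flatMap(Function<? super A, Either<E, B>> f) {
        if (this instanceof Right<E, A> right) {
            return f.apply(right.value());
        }
        return new Left<>(((Left<E, A>) this).error());
    }

    default <B> Either<E, B> map(Function<? super A, ? extends B> f) {
        return flatMap(a -> new Right<>(f.apply(a)));
    }
}

class DtoService {
    // The error side is widened to CustomException explicitly, in the signature
    static Either<CustomException, Entity> fetchEntityFromDb(String id) {
        return new Either.Right<>(new Entity(id));
    }

    static Either<CustomException, EntityWithMeta> populateMeta(Entity entity) {
        return new Either.Right<>(new EntityWithMeta(entity.id()));
    }

    static Dto toDto(EntityWithMeta entity) {
        return new Dto(entity.id());
    }

    static Either<CustomException, Dto> fetchDto(String id) {
        return fetchEntityFromDb(id)
                .flatMap(DtoService::populateMeta)
                .map(DtoService::toDto);
    }
}
```

The price is exactly what the union version tried to avoid: the widening from `EntityNotFoundException` and `AccessDeniedException` to `CustomException` has to be written by hand, and every value is wrapped.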

Refining Further

Classical non-union Results have their costs. First and foremost, there is the mental price of writing monadic code. Haskellers would argue here, but it is nevertheless a cost for typical imperative Java developers. And even for the functional folks out there, the application runtime pays the price of wrapping values in both the happy and the faulty scenarios. Can we do better?

Unchecked exceptions are the efficient answer in many languages. For use cases where a function rarely fails, exceptions optimize the happy path, which sounds reasonable. Also, as discussed previously, exceptions fail fast and force us to fix the root cause. Let's take a simple head function as an example:

head :: [a] -> a
head [] = error "Empty list"
head (x:_) = x

In Scala, the same implementation, with slightly better tooling support, can be achieved through PartialFunction:

def head[A]: PartialFunction[Seq[A], A] = {
  case xs if xs.nonEmpty => xs(0)
}

I'm saying "slightly better" because on the happy path it can be called like a usual function, head(List("John")), while for error handling it offers several utility methods, like head.applyOrElse(List(), _ => "default"), which is arguably nicer than try/catch. Also, PartialFunction could be considered a specific variant of Result: it even provides a lift method (turning it into a total function that returns Option) to prove their affinity.
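To make the `applyOrElse` and `lift` relationship concrete, here is a rough Java analog (not the conventional Java style). `Partial` is a hypothetical interface modeled loosely on Scala's `PartialFunction`, not a JDK type; `lift` turns it into a total function returning `Optional`, Java's Maybe.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

interface Partial<A, B> {
    boolean isDefinedAt(A a);

    B apply(A a); // may throw when !isDefinedAt(a)

    // Error handling without try/catch: use the fallback outside the domain
    default B applyOrElse(A a, Function<? super A, ? extends B> fallback) {
        return isDefinedAt(a) ? apply(a) : fallback.apply(a);
    }

    // Partial function -> total function returning Maybe
    default Function<A, Optional<B>> lift() {
        return a -> isDefinedAt(a) ? Optional.ofNullable(apply(a)) : Optional.empty();
    }
}

class Heads {
    static <A> Partial<List<A>, A> head() {
        return new Partial<>() {
            public boolean isDefinedAt(List<A> xs) { return !xs.isEmpty(); }
            public A apply(List<A> xs) { return xs.get(0); }
        };
    }
}
```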

In Java, the above example would conventionally look like this:

import java.util.List;
import jakarta.validation.constraints.NotEmpty;

<T> T head(@NotEmpty List<T> xs) {
  return xs.get(0);
}

Note the @NotEmpty annotation, which makes the contract (and thus the possible error) explicit. We'll return to it shortly.

All these examples attach a contract to a function. The thing is, instead of relying on developers to remember to abide by this contract, it can be checked by a compiler. Several tools can do that, but I'll give you a taste of one specific example, the GHC plugin called LiquidHaskell:

{-@ head :: { v:[a] | len v > 0 } -> a @-}
head (x:_) = x

The predicate len v > 0 is checked at compile time. Thus, we're forced to either prove that our argument satisfies the contract or propagate the contract up the call hierarchy. The predicate refines our type, hence the name of the technique: refinement types. Please check out more examples out in the wild.

This propagation is an extremely powerful property. Imagine a common use case: an HTTP endpoint accepting a JSON value with an array that, by our rules, must be non-empty:

import jakarta.validation.constraints.NotEmpty

data class Input(
    @field:NotEmpty val names: List<String>,
)

With this reflection-based approach, we lose the knowledge about the real type of names right after successful deserialization. However, this contract is usually needed somewhere deeper in the layers of a service, for example, in the database access layer. Let's imagine that the Kotlin compiler adds refinement types someday:

data class Input(
    val names: List<String> { it.isNotEmpty() },
)

The fact that we received a non-empty list would then be carried by the compiler as far down the call stack as required. The necessary precondition for this to work is that the JSON marshaller respects the type system. But if it's a standard language feature, the chances of tool support are much higher. Yes-yes, I'm looking at you right now, refined!
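Until something like that exists, a common approximation on the JVM is a wrapper type whose only way in is a factory that checks the predicate once, at the boundary, so the knowledge travels with the type rather than with an annotation. A minimal Java sketch; `NonEmptyList`, its `of` factory, and `NamesRepository` are hypothetical, and unlike a real refinement type the check happens at runtime, not at compile time.

```java
import java.util.List;
import java.util.Optional;

// Can only be created from a non-empty list, so holders never need to re-check
final class NonEmptyList<T> {
    private final List<T> items;

    private NonEmptyList(List<T> items) {
        this.items = items;
    }

    // The single checkpoint, e.g. right after JSON deserialization.
    // List.copyOf makes a defensive copy (and rejects null elements).
    static <T> Optional<NonEmptyList<T>> of(List<T> items) {
        return items.isEmpty() ? Optional.empty() : Optional.of(new NonEmptyList<>(List.copyOf(items)));
    }

    // Total by construction: no exception path for an empty list
    T head() {
        return items.get(0);
    }

    List<T> toList() {
        return items;
    }
}

class NamesRepository {
    // Deep inside the service, the signature still carries the contract
    static String firstName(NonEmptyList<String> names) {
        return names.head();
    }
}
```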

I have touched on refinement types because they are actually part of our "absent values" puzzle. Results are about post-processing, while type refinement is about pre-processing, which can lead to cleaner and more strongly constrained code. Despite all those benefits, type refinement has its limitations. One is lower adoption, which is fixable in the long run. Another is the conceptual limitation of purity: obviously, we can't execute IO during compilation. E.g. can you imagine fetching a database record from a production environment during compilation on your local machine? We certainly need our beloved Maybe for that one.

This was planned as the last part of the series. The chapter turned out bigger than the previous ones, but that is kind of expected from one named "expanding horizons", right? All in all, we've covered broad ground in error handling approaches, from the bottom up the compilation ladder. And if you have read this far, the reading was probably somewhat interesting and hopefully somewhat useful for you (or you're just my friend or relative trying not to offend me). Anyway, thanks for reading, and I hope we'll meet again!