Christian Neumanns

Posted on Oct 2, 2019 • Edited on Oct 18, 2019 • Originally published at practical-programming.org

Null-Safety vs Maybe/Option - A Thorough Comparison (Part 1/2)

#null #maybe #option

Introduction

There are two effective approaches to eliminate the daunting null pointer error:

The Maybe/Option pattern - mostly used in functional programming languages.
Compile-time null-safety - used in some modern programming languages.

This article aims to answer the following questions:

How does it work? How do these two approaches eliminate the null pointer error?
How are they used in practice?
How do they differ?

Notes:

Readers not familiar with the concept of null might want to read A quick and thorough guide to 'null' first.
For an introduction to Maybe / Option I recommend: F#: The Option type. You can also search the net for "haskell maybe" or "f# option".

Why Should We Care?

"I call it my billion-dollar mistake. It was the invention of the null reference in 1965. ... This has led to innumerable errors, vulnerabilities, and system crashes, which have probably caused a billion dollars of pain and damage in the last forty years. ..."

-- Tony Hoare

In the context of Java, Professor John Sargeant from the School of Computer Science at the University of Manchester puts it like this:

"Of the things which can go wrong at runtime in Java programs, null pointer exceptions are by far the most common."

-- John Sargeant

We can easily deduce:

"By eliminating the infamous null pointer error, we eliminate one of the most frequent reasons for software failures."

That's a big deal!

We should care about it.

Three Approaches

Besides showing the reason for the null pointer error, this article also aims to demonstrate how the null pointer error can be eliminated.

We will therefore compare three different approaches:

The language uses null, but doesn't provide null-safety.

In these languages null pointer errors occur frequently.

Most popular languages fall into this category, for example: C, C++, Java, JavaScript, PHP, Python, Ruby, and Visual Basic.

The language doesn't support null, but uses Maybe (also called Option or Optional) to represent the 'absence of a value'.

As null is not supported, there are no null pointer errors.

This approach is mostly used in functional programming languages, but it can also be used in non-functional languages.

At the time of writing, the most prominent languages using this approach are probably Haskell, F#, and Swift.

The language uses null and provides compile-time null-safety.

Null pointer errors cannot occur.

Some modern languages support this approach.

Source Code Examples

In this chapter we'll look at some source code examples of common use cases involving 'the absence of a value'. We will compare code written in the following three languages, which represent the three approaches mentioned in the previous chapter:

Java (supports null, but not null-safe)

Java is one of the industry's leading languages, and one of the most successful ones in the history of programming languages. But it isn't null-safe. Hence, it is well suited to demonstrate the problem of the null pointer error.
Haskell (Maybe type)

Haskell is the best-known pure functional language. It doesn't support null. Instead, it uses the Maybe monad to represent the 'absence of a value'.

Note: I am by no means a Haskell expert. If you see any mistake or need for improvement in the following examples, then please leave a comment so that the article can be updated.
PPL (supports null and is null-safe)

The Practical Programming Language (PPL) supports null and has been designed with full support for compile-time null-safety from the ground up. However, be warned! PPL is a work in progress and not yet ready for writing mission-critical enterprise applications. I use it in this article because (full disclosure!) I am the creator of PPL, and I want to spark some interest in it. I hope you won't mind after reading this article.

All source code examples are available on GitHub. The GitHub source files also contain alternative solutions for some examples that are not shown in this article.

Null-Safety

How does null-safety work in practice? Let's see.

Null Not Allowed

We start with an example of code where null is not allowed.

Say we want to write a very simple function that takes a positive integer and returns a string. Neither the input nor the output can be null. If the input value is 1, we return "one". If it is not 1, we return "not one". What does the code look like in the three languages? And, more importantly, how safe is it?

Java

This is the function written in Java:

static String intToString ( Integer i ) {
    if ( i == 1 ) {
        return "one";
    } else {
        return "not one";
    }
}

We can use the ternary operator and shorten the code a bit:

static String intToString ( Integer i ) {
    return i == 1 ? "one" : "not one";
}

Note: I am using type Integer, which is a reference type, rather than int, which is a primitive type. The reason is that null works only with reference types: a variable of type int can never be null.

To test the code, we can write a simple Java application like this:

public class NullNotAllowedTest {

    static String intToString ( Integer i ) {
        return i == 1 ? "one" : "not one";
    }

    public static void main ( String[] args ) {
        System.out.println ( intToString ( 1 ) );
        System.out.println ( intToString ( 2 ) );
    }
}

If you want to try out this code, you can use an online Java executor: just copy/paste the above code into the Source File tab and click Execute.

If you have Java installed on your system, you can also proceed like this:

Save the above code in file NullNotAllowedTest.java.
Compile and run it by typing the following two commands in a terminal:
```
javac NullNotAllowedTest.java
java NullNotAllowedTest
```

The output written to standard output is:

one
not one

So far so good.

Haskell

In Haskell, there are a few ways to write the function. For example:

intToString :: Integer -> String
intToString i = case i of
    1 -> "one"
    _ -> "not one"

Note: The first line in the above code could be omitted, because Haskell can infer the function's type. However, it's considered good style to include the type signature, because it makes the code more readable. Hence, we will always include the type signature in the upcoming Haskell examples.

The above code uses pattern matching, which is the idiomatic way to write code in Haskell.

We can write a simple Haskell application to test the code:

intToString :: Integer -> String
intToString i = case i of
    1 -> "one"
    _ -> "not one"

main :: IO ()
main = do
    putStrLn $ intToString 1
    putStrLn $ intToString 2

As with Java, you can use an online Haskell executor to try out the code.

Alternatively, if Haskell is installed on your system, you can save the above code in file NothingNotAllowedTest.hs. Then you can compile and run it with these two commands (on Linux or macOS, run ./NothingNotAllowedTest instead of NothingNotAllowedTest.exe):

```
ghc -o NothingNotAllowedTest NothingNotAllowedTest.hs
NothingNotAllowedTest.exe
```

The output is the same as in the Java version:

one
not one

PPL

In PPL the function can be written like this:

function int_to_string ( i pos_32 ) -> string
    if i =v 1 then
        return "one"
    else
        return "not one"
    .
.

Note: The comparison operator =v in the above code is suffixed with a v to make it clear we are comparing values. If we wanted to compare references, we would use operator =r.
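For comparison, Java draws the same distinction but doesn't mark it in the operator: for reference types, == compares references (identity), while equals compares values. A minimal sketch; the class and method names are made up for illustration:

```java
public class ValueVsReference {

    // Value comparison: same characters? (comparable to PPL's =v)
    static boolean sameValue(String x, String y) {
        return x.equals(y);
    }

    // Reference comparison: same object? (comparable to PPL's =r)
    static boolean sameReference(String x, String y) {
        return x == y;
    }

    public static void main(String[] args) {
        String a = "one";
        // new String(...) always creates a new, distinct object
        String b = new String("one");

        System.out.println(sameValue(a, b));     // true
        System.out.println(sameReference(a, b)); // false
    }
}
```

In Java it is easy to write == where equals was meant; PPL's explicit =v and =r suffixes make the intent visible at the call site.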

We can shorten the code by using an if-then-else expression (instead of an if-then-else statement):

function int_to_string ( i pos_32 ) -> string = \
    if i =v 1 then "one" else "not one"

A simple PPL application to test the code looks like this:

function int_to_string ( i pos_32 ) -> string = \
    if i =v 1 then "one" else "not one"

function start
    write_line ( int_to_string ( 1 ) )
    write_line ( int_to_string ( 2 ) )
.

At the time of writing there is no online PPL executor available. To try out code you have to install PPL and then proceed like this:

Save the above code in file null_not_allowed_test.ppl
Compile and run the code in a terminal by typing:
```
ppl null_not_allowed_test.ppl
```

Again, the output is:

one
not one

Discussion

As we have seen (and expected), all three languages allow us to write 'code that works correctly'. Here are the three versions again, for easy comparison:

Java

static String intToString ( Integer i ) {
    return i == 1 ? "one" : "not one";
}

Haskell

intToString :: Integer -> String
intToString i = case i of
    1 -> "one"
    _ -> "not one"

PPL

function int_to_string ( i pos_32 ) -> string = \
    if i =v 1 then "one" else "not one"

A pivotal question remains unanswered:

"What happens in case of a bug in the source code?"

-- The Crucial Question

In the context of this article we want to know: What happens if the function is called with null as input? And what if the function returns null?

This question is easy to answer in the Haskell world: null doesn't exist in Haskell. Haskell uses the Maybe monad to represent the 'absence of a value' (we will soon see how this works). Hence, in Haskell it is not possible to call intToString with null as input, and we can't write code that returns null.

PPL supports null, unlike Haskell. However, all types are non-null by default. This is a fundamental rule in all effective null-safe languages. A PPL function with the type signature pos_32 -> string states that the function cannot be called with null as input, and it cannot return null. This is enforced at compile-time, so we are on the safe side. Code like int_to_string ( null ) simply doesn't compile.

"By default all types are non-null in a null-safe language."

"By default it is illegal to assign null."

-- The 'non-null by default' rule

What about Java?

Java is not null-safe. Every reference type is nullable, and there is no way to declare a non-null reference type. This means that intToString can be called with null as input. Moreover, nothing prevents us from writing code that returns null from intToString.

So, what happens if we make a function call like intToString ( null )? The program compiles, but the disreputable NullPointerException is thrown at run-time:

Exception in thread "main" java.lang.NullPointerException
    at NullNotAllowedTest.intToString(NullNotAllowedTest.java:4)
    at NullNotAllowedTest.main(NullNotAllowedTest.java:10)

Why? Because of auto-unboxing, the test i == 1 is compiled to i.intValue() == 1. But i is null in our case, and calling a method on a null reference is impossible, so a NullPointerException is thrown.
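The unboxing step can be made explicit. The following sketch (class and helper names are hypothetical) shows that the original comparison behaves like an explicit intValue() call, and that a null argument fails at run time even though the code compiles. On JDK 15 and later, the helpful NullPointerException message typically names Integer.intValue() as the failing call.

```java
public class UnboxingDemo {

    // Same body as intToString above
    static String intToString(Integer i) {
        return i == 1 ? "one" : "not one";
    }

    // What the compiler effectively generates: explicit unboxing via intValue()
    static String intToStringUnboxed(Integer i) {
        return i.intValue() == 1 ? "one" : "not one";
    }

    // Hypothetical helper: true if calling intToString(i) throws a NullPointerException
    static boolean throwsNpe(Integer i) {
        try {
            intToString(i);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(intToString(1));        // one
        System.out.println(intToStringUnboxed(2)); // not one
        System.out.println(throwsNpe(null));       // true
    }
}
```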

This is the well-known reason for the infamous billion-dollar mistake.

What if intToString accidentally returns null, as in the following code:

public class NullNotAllowedTest {

    static String intToString ( Integer i ) {
        return null;
    }

    public static void main ( String[] args ) {
        System.out.println ( intToString ( 1 ) );
    }
}

Again, no compiler error. But a runtime error occurs, right? Wrong, the output is:

null

Why?

The reason is that System.out.println has been programmed to write the string "null" if it is called with null as input. The method signature doesn't show this, but it is clearly stated in the Java API documentation: "If the argument is null then the string 'null' is printed.".
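The same silent conversion to the text "null" also happens with string concatenation and with String.valueOf(Object), so a null can travel quite far before anything fails. A small sketch, reusing the buggy intToString from above; describe is a hypothetical helper:

```java
public class NullPrinting {

    // Buggy version from above: always returns null
    static String intToString(Integer i) {
        return null;
    }

    // Hypothetical helper: builds a line via string concatenation
    static String describe(String s) {
        return "Result: " + s; // a null operand becomes the text "null"
    }

    public static void main(String[] args) {
        String s = intToString(1);
        System.out.println(s);                          // null
        System.out.println(String.valueOf((Object) s)); // null
        System.out.println(describe(s));                // Result: null
    }
}
```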

What if, instead of printing the string returned by intToString, we want to print the string's length (i.e. the number of characters)? Let's try it by replacing ...

System.out.println ( intToString ( 1 ) );

... with this:

System.out.println ( intToString ( 1 ).length() );

Now the program doesn't continue silently. A NullPointerException is thrown again, because the program tries to execute length() on a null object.

As we can see from this simple example, the result of misusing null is inconsistent.

In the real world, the final outcome of incorrect null handling ranges from totally harmless to totally harmful, and is often unpredictable. This is a general and frustrating property of all programming languages that support null but don't provide compile-time null-safety. Imagine a big application with thousands of functions, most of them much more complex than our simple toy code. None of these functions is implicitly protected against misuses of null. It is understandable why null and the "billion-dollar mistake" have become synonyms for many software developers.

We can of course try to improve the Java code and make it a bit more robust. For example, we could explicitly check for a null input in method intToString and throw an IllegalArgumentException. We could also add a NonNull annotation that can be used by some static code analyzers or super-sophisticated IDEs. But all these improvements require manual work, might depend on additional tools and libraries, and don't lead to a satisfactory and reliable solution. Therefore, we will not discuss them. We are not interested in mitigating the problem of the null pointer error, we want to eliminate it. Completely!
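For illustration only, here is roughly what the explicit check could look like, in two variants: Objects.requireNonNull (which throws a NullPointerException with a custom message) and a manual IllegalArgumentException. The class name is hypothetical. Note that intToString(null) still compiles; the check only makes the run-time failure earlier and clearer.

```java
import java.util.Objects;

public class FailFastCheck {

    // Variant 1: fail fast with a NullPointerException and a descriptive message
    static String intToString(Integer i) {
        Objects.requireNonNull(i, "i must not be null");
        return i == 1 ? "one" : "not one";
    }

    // Variant 2: fail fast with an IllegalArgumentException
    static String intToStringIae(Integer i) {
        if (i == null) {
            throw new IllegalArgumentException("i must not be null");
        }
        return i == 1 ? "one" : "not one";
    }

    public static void main(String[] args) {
        System.out.println(intToString(1)); // one
        try {
            intToString(null); // compiles without complaint
        } catch (NullPointerException e) {
            System.out.println("Caught: " + e.getMessage()); // Caught: i must not be null
        }
    }
}
```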

Null Allowed

Let's slightly change the specification of function int_to_string. We want it to accept null as input and return:

"one" if the input is 1
"not one" if the input is not 1 and not null
null if the input is null

How does this affect the code in the three languages?

Java

This is the new code written in Java:

static String intToString ( Integer i ) {
    if ( i == null ) {
        return null;
    } else {
        return i == 1 ? "one" : "not one";
    }
}

We could again use the ternary operator and write more succinct code:

static String intToString ( Integer i ) {
    return i == null ? null : i == 1 ? "one" : "not one";
}

Whether to choose the first or second version is a matter of debate. As a general rule, we should value readability over terseness. So, let's stick with version 1.

The crucial point here is that the function's signature has not changed, although the function's specification is now different. Whether the function accepts and returns null or not, the signature is the same:

String intToString ( Integer i ) {

This doesn't come as a surprise. As we saw already in the previous example, Java (like other languages without null-safety) doesn't distinguish between nullable and non-nullable types: all reference types are always nullable. Hence, by just looking at a function signature we don't know if the function accepts null as input, and we don't know if it might return null. The best we can do is to document nullability for each input/output argument. But there is no compile-time protection against misuses.
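Java's own standard library illustrates this. Map.get is declared as V get(Object key), yet its documentation states that it returns null if the map contains no mapping for the key; the type alone doesn't tell us. A small sketch (the class and helper names are hypothetical):

```java
import java.util.HashMap;
import java.util.Map;

public class MapGetNullability {

    // Hypothetical helper: a map with a single entry
    static Map<Integer, String> sampleNames() {
        Map<Integer, String> names = new HashMap<>();
        names.put(1, "one");
        return names;
    }

    public static void main(String[] args) {
        Map<Integer, String> names = sampleNames();

        // Signature: V get(Object key). Nothing in it says "may return null".
        System.out.println(names.get(1));         // one
        System.out.println(names.get(2) == null); // true: no mapping for key 2

        // getOrDefault (Java 8+) supplies a fallback when the key is absent
        System.out.println(names.getOrDefault(2, "not one")); // not one
    }
}
```

Because HashMap also permits null values, a null result from get is even ambiguous, and containsKey is needed to tell the two cases apart.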

To check if it works, we can write a simplistic test application:

public class NullAllowedTest {

    static String intToString ( Integer i ) {
        if ( i == null ) {
            return null;
        } else {
            return i == 1 ? "one" : "not one";
        }
    }

    static void displayResult ( String s ) {
        String result = s == null ? "null" : s;
        System.out.println ( "Result: " + result );
    }

    public static void main ( String[] args ) {
        displayResult ( intToString ( 1 ) );
        displayResult ( intToString ( 2 ) );
        displayResult ( intToString ( null ) );
    }
}

Output:

Result: one
Result: not one
Result: null

Haskell

This is the code in Haskell:

intToString :: Maybe Integer -> Maybe String
intToString i = case i of
    Just 1  -> Just "one"
    Nothing -> Nothing
    _       -> Just "not one"

Key points:

Haskell doesn't support null. It uses the Maybe monad.

The Maybe type is defined as follows:
```
data Maybe a = Nothing | Just a
    deriving (Eq, Ord)
```
The Haskell doc states: "The Maybe type encapsulates an optional value. A value of type Maybe a either contains a value of type a (represented as Just a), or it is empty (represented as Nothing). The Maybe type is also a monad."

Note: More information can be found here and here. Or you can read about the Option type in F#.
The function signature clearly states that calling the function with no integer (i.e. the value Nothing in Haskell) is allowed, and the function might or might not return a string.
For string values the syntax Just "string" is used to denote a string, and Nothing is used to denote 'the absence of a value'. Analogously, the syntax Just 1 and Nothing is used for integers.
Haskell uses pattern matching to check for 'the absence of a value' (e.g. Nothing ->). The symbol _ is used to denote 'any other case'. Note that the _ case includes the Nothing case. Hence if we forget the explicit check for Nothing there will be no compiler error, and "not one" will be returned if the function is called with Nothing as input.

Here is a simple test application:

import Data.Maybe (fromMaybe)

intToString :: Maybe Integer -> Maybe String
intToString i = case i of
    Just 1  -> Just "one"
    Nothing -> Nothing
    _       -> Just "not one"

displayResult :: Maybe String -> IO()
displayResult s = 
    putStrLn $ "Result: " ++ fromMaybe "null" s

main :: IO ()
main = do
    displayResult $ intToString (Just 1)
    displayResult $ intToString (Just 2)
    displayResult $ intToString (Nothing)

Output:

Result: one
Result: not one
Result: null

Note the fromMaybe "null" s expression in the above code. In Haskell this is a way to provide a default value in case of Nothing. It's conceptually similar to the expression s == null ? "null" : s in Java.
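Since the Maybe approach can also be used in non-functional languages, here is a rough Java counterpart of the Haskell test application, using java.util.Optional (Java 8+). It is a sketch under the assumption that Optional plays the role of Maybe: Optional.map corresponds to applying a function inside Just, and orElse corresponds to fromMaybe. The class name is hypothetical.

```java
import java.util.Optional;

public class MaybeInJava {

    // Maybe Integer -> Maybe String, expressed with Optional.
    // map() applies the lambda only if a value is present,
    // so an empty input automatically yields an empty result.
    static Optional<String> intToString(Optional<Integer> i) {
        return i.map(n -> n == 1 ? "one" : "not one");
    }

    // orElse provides the default, like Haskell's fromMaybe
    static void displayResult(Optional<String> s) {
        System.out.println("Result: " + s.orElse("null"));
    }

    public static void main(String[] args) {
        displayResult(intToString(Optional.of(1)));    // Result: one
        displayResult(intToString(Optional.of(2)));    // Result: not one
        displayResult(intToString(Optional.empty()));  // Result: null
    }
}
```

This is only an approximation: unlike a Haskell Maybe, a Java Optional reference can itself be null, and using Optional as a parameter type is often discouraged in Java style guides.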

PPL

In PPL the code looks like this:

function int_to_string ( i pos_32 or null ) -> string or null
    case value of i
        when null
            return null
        when 1
            return "one"
        otherwise
            return "not one"
    .
.

Note: A case expression will be available in a future version of PPL (besides the case statement shown above). Then the code can be written more concisely as follows:

function int_to_string ( i pos_32 or null ) -> string or null = \
    case value of i
        when null: null
        when 1   : "one"
        otherwise: "not one"

Key points:

In PPL null is a regular type (like string, pos_32, etc.) that has exactly one possible value: null.

It appears at the top of PPL's type hierarchy.
PPL supports union types (also called sum types, or choice types). For example, if a reference can be a string or a number, the type is string or number.

That's why we use the syntax pos_32 or null and string or null to denote nullable types. The type string or null simply means that the value can be any string or null.
The function signature clearly states that it accepts null as input, and that it might return null.
We use a case instruction to check the input and return an appropriate string. The compiler ensures that every possible case is covered by the when clauses. It is not possible to accidentally forget to check for null because, in contrast to Haskell, the otherwise clause doesn't cover null.

A simple test application looks like this:

function int_to_string ( i pos_32 or null ) -> string or null
    case value of i
        when null
            return null
        when 1
            return "one"
        otherwise
            return "not one"
    .
.

function display_result ( s string or null )
    write_line ( """Result: {{s if_null: "null"}}""" )
.

function start
    display_result ( int_to_string ( 1 ) )
    display_result ( int_to_string ( 2 ) )
    display_result ( int_to_string ( null ) )
.

Output:

Result: one
Result: not one
Result: null

Note the """Result: {{s if_null: "null"}}""" expression used in function display_result. We use string interpolation: an expression embedded between a {{ and }} pair. And we use the if_null: operator to provide a string that represents null. Writing s if_null: "null" is similar to s == null ? "null" : s in Java.

If we wanted to print nothing in case of null, we could write """Result: {{? s}}""" instead.
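In Java, the closest standard-library counterpart to PPL's if_null: operator is probably Objects.toString(Object, String), which returns its second argument when the first is null. A sketch of both PPL variants (class and method names are hypothetical):

```java
import java.util.Objects;

public class IfNullDemo {

    // Counterpart of PPL's  s if_null: "null"
    static String resultLine(String s) {
        return "Result: " + Objects.toString(s, "null");
    }

    // Counterpart of PPL's  {{? s}}  (print nothing for null)
    static String resultLineOrEmpty(String s) {
        return "Result: " + Objects.toString(s, "");
    }

    public static void main(String[] args) {
        System.out.println(resultLine("one"));       // Result: one
        System.out.println(resultLine(null));        // Result: null
        System.out.println(resultLineOrEmpty(null)); // "Result: " with nothing after it
    }
}
```

The difference is that PPL's compiler knows s has type string or null, whereas Java accepts the fallback call on any reference, nullable or not.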

Discussion

Again, the three languages allow us to write code that works correctly.

But there are some notable differences:

In Haskell and PPL, the functions clearly state that 'the absence of a value' is allowed (i.e. Nothing in Haskell, or null in PPL). In Java, there is no way to distinguish between nullable and non-nullable arguments (except via comments or annotations, of course).
In Haskell and PPL, the compiler ensures we don't forget to check for 'the absence of a value'. Executing an operation on a possibly Nothing or null value is not allowed. In Java we are left on our own.

Here is a comparison of the three versions of function int_to_string:

Java

static String intToString ( Integer i ) {
    if ( i == null ) {
        return null;
    } else {
        return i == 1 ? "one" : "not one";
    }
}

Haskell

intToString :: Maybe Integer -> Maybe String
intToString i = case i of
    Just 1 -> Just "one"
    Nothing -> Nothing
    _ -> Just "not one"

PPL

New version (not available yet):

function int_to_string ( i pos_32 or null ) -> string or null = \
    case value of i
        when null: null
        when 1   : "one"
        otherwise: "not one"

Current version:

function int_to_string ( i pos_32 or null ) -> string or null
    case value of i
        when null
            return null
        when 1
            return "one"
        otherwise
            return "not one"
    .
.

And here is the function used to display the result:

Java

static void displayResult ( String s ) {
    String result = s == null ? "null" : s;
    System.out.println ( "Result: " + result );
}

Haskell

import Data.Maybe (fromMaybe)

displayResult :: Maybe String -> IO()
displayResult s = 
    putStrLn $ "Result: " ++ fromMaybe "null" s

PPL

function display_result ( s string or null )
    write_line ( """Result: {{s if_null: "null"}}""" )
.

That's it for part 1. In part 2 (to be published soon) we'll have a look at some useful null-handling features used frequently in practice.

Header image by dailyprinciples from Pixabay.
