Why do I think Haskell is a good choice for Software Security?

#haskell #security #programming #functional

Author: Ville Tirronen

The Typeable Team appreciates security! We love Haskell, but is Haskell a good choice when secure software is the goal? We would love to say yes, but like most empirical questions about software development, there is simply no hard evidence that Haskell, or any general-purpose programming language, is more secure than the others. That is not to say that Typeable's choice of language doesn't matter for security, but how it matters needs some elaboration.

After teaching introductory Software Security for half a decade, I can attest that Software Security has no universal theory on which to rely. Security is most often taught by enumerating different security issues, mitigations and security models, and hoping that students can build a general understanding from them. Even among the theoretical works that do exist, relatively few try to build a link between programming languages and security.

In this post, I'll sketch my favourite perspective for linking the choice of programming language to security. It places different vulnerabilities on a scale that runs from "incidental" (purely technical) vulnerabilities to "domain" vulnerabilities:

   Purely technical               Purely domain specific
    vulnerability                    vulnerability
        ↓                                  ↓
        ┠───────────╂───────────╂──────────┨
             ↑            ↑          ↑
        Tools should  Tools can  You have to
            fix         help       think

The axis above represents the provenance of different software vulnerabilities. On the far right, we have purely domain specific errors, that is, those that are completely independent of the tools used. One example of such a domain error is the "security questions" many early 2000s web services had for password recovery. Often the questions were like "what is your mother's maiden name?". Then, around 2009-10, a thing called social media appeared and suddenly everyone's "mother's maiden name" became public information. It doesn't matter what technology you use to implement such a "security questions" scheme. It is broken regardless.

On the far left of our scale, we have errors that have a purely technical cause. They are completely independent of the problem domain. One good example of such a problem is the notorious buffer overflow. It does not matter at all what you're storing in the buffer -- if the buffer overflows, it allows an attacker to mess with the supporting structures of your program at will. Here, you can avoid buffer overflows, at least in theory, by using a toolchain that has no unchecked buffer writes.

Between the far ends of the scale lies a region where the vulnerability is not entirely technical, but not entirely a domain matter either. A stereotypical example of such a vulnerability is found in services that allow file uploads.

In such services, it is often tempting to write the user-supplied file directly to the server filesystem. But under which filename? Using the user-supplied filename directly is a recipe for disaster, since it could be something like ../../../etc/nginx/nginx.conf, ../../../etc/passwd or any number of files the server can touch, but really shouldn't.

This hazard is a mixture of the technical and the domain, and while it is unlikely that any toolchain would prevent it "out of the box", it is easy to see how some tools might help to control such problematic behaviour better than others.

Applying the scale

The usefulness of this scale is in appraising your tooling, such as programming languages and frameworks. How many of the purely technical issues does your tooling handle all by itself? How far along the scale does it offer you extra leverage against errors that lead to vulnerabilities?

Modern tooling should ideally prevent the purest technical vulnerabilities almost entirely. For example, most modern languages, like Haskell, C# and Java, are mostly memory safe, and all of them will largely prevent buffer overflows, double frees and other technical problems. But good tooling can be leveraged further. For example, one can easily imagine a system that has technical means of separating absolute and relative file paths, making it easier to control for path traversal attacks, such as a user uploading a file over some critical system configuration file.

Haskell, on the low end of the scale

Haskell, like most modern languages, performs well with low-level, technical vulnerabilities. For one, Haskell is memory safe, which takes a huge expanse of potential vulnerabilities, array and buffer overflows above all, out of reach of attackers. Secondly, Haskell is statically typed, which also guards against entire families of errors, such as PHP's famous "type juggling":

// From imaginary CSRF token protection:
if ($tokenHash == $hashFromInternet->{'tokenHash'}) {
  echo "200 OK - Request accepted", PHP_EOL;
} else {
  echo "403 DENIED - Bad CSRF token", PHP_EOL;
}

See the issue above? Most dynamic languages, like PHP, decide the "type" of a JSON record at run time, often based on the structure of the input data. Also, in object-oriented programming, the "type" is used to select behaviour through dynamic dispatch, effectively allowing the attacker to choose which code is executed. To make matters worse, the result of PHP's == depends on the types of its operands, so the attacker can bypass the check in the above example entirely, for instance by sending the JSON value true, which == treats as equal to any non-empty string other than "0".

A similar issue has occurred with Java (and other languages, see https://frohoff.github.io/appseccali-marshalling-pickles/). Java provided a superbly user-friendly way of serializing any object to disk and recovering it back in its original form. The only unfortunate problem was that there was no way to say which object you were expecting! This allows attackers to send you objects that, upon deserialization in your program, become nasties that wreak havoc and steal data.

This is not to say that you can't have secure code in PHP or that you can't have errors like this in Haskell, but that Haskell is not naturally inclined towards these vulnerabilities. To put the above example into Haskell code, it would read something like this:

data Request = Request {csrfToken :: Token, ... other fields}
doSomething :: Session -> Request -> Handler ()
doSomething session request
  | csrfToken session == csrfToken request = ... do something
  | otherwise = throwM BadCsrfTokenError

Here, type juggling is ruled out by the routine practice of giving the values at an interface a concrete type that is known before the program even runs.
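To make the point concrete, here is a minimal, self-contained sketch of the same check. It is only an illustration under stated assumptions: JsonValue is a hand-written stand-in for whatever a JSON library would produce, and Token, tokenFromJson, CsrfError and checkCsrf are hypothetical names, not part of any framework.

```haskell
-- Stand-in for a decoded JSON value; a real program would use a JSON library.
data JsonValue
  = JsonString String
  | JsonBool Bool
  | JsonNumber Double
  | JsonNull

-- An opaque token type: tokens can only be compared with other tokens.
newtype Token = Token String
  deriving (Eq, Show)

-- The shape of the input is checked once, at the boundary.
-- Anything that is not a JSON string never becomes a Token.
tokenFromJson :: JsonValue -> Maybe Token
tokenFromJson (JsonString s) = Just (Token s)
tokenFromJson _ = Nothing

data CsrfError = MalformedToken | BadCsrfToken
  deriving (Eq, Show)

-- Accept the request only when the received value is a string token
-- that equals the expected one.
-- Note: the derived (==) is not a constant-time comparison.
checkCsrf :: Token -> JsonValue -> Either CsrfError ()
checkCsrf expected received =
  case tokenFromJson received of
    Nothing -> Left MalformedToken
    Just token
      | token == expected -> Right ()
      | otherwise -> Left BadCsrfToken
```

A JSON true sent in place of the token is rejected as malformed before any comparison takes place, and an expression that compares a Token with a Bool would not compile at all.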

Haskell, middle of the scale

When considering the middle of the "technical" vs. "domain" scale, Haskell has features that make it, in my mind, quite an advantageous choice.

Foremost, Haskell can model data more accurately than languages like C, JavaScript or even Java. This is mostly due to its convenient syntax and sum types. Accurate modelling of data is relevant to security since most domain code is a model of some real-world phenomenon, and the less accurate it is, the more play it gives to attackers.

Having accurate modelling tools helps programmers to navigate around domain blunders. For example, consider the simple ability to easily express with one line that, say, a social security number is either unknown, redacted or 'this value here':

data SSN = Unknown | Redacted | SSN Text

Now, contrast this to modelling the same idea using the string values "", "<REDACTED>" and "191091C211A". What happens if the user types "<REDACTED>" in the SSN input box? Could it cause an issue later on? With the Haskell type, you don't need to worry about such things: whatever the user types ends up inside the SSN constructor and can never be mistaken for Redacted.
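The difference can be sketched in a few lines. This is an illustration only: it uses String instead of Text to stay within the base library, and fromUserInput and isRedacted are hypothetical helper names.

```haskell
import Data.Char (isSpace)

-- Same shape as the type above, with String in place of Text.
data SSN = Unknown | Redacted | SSN String
  deriving (Eq, Show)

-- Whatever the user types becomes either Unknown (blank input)
-- or a value wrapped in the SSN constructor. It can never become Redacted.
fromUserInput :: String -> SSN
fromUserInput raw
  | all isSpace raw = Unknown
  | otherwise = SSN raw

-- Only the Redacted constructor counts as redacted,
-- no matter what text an SSN value happens to carry.
isRedacted :: SSN -> Bool
isRedacted Redacted = True
isRedacted _ = False
```

With these definitions, isRedacted (fromUserInput "<REDACTED>") evaluates to False, because the input is stored as SSN "<REDACTED>" rather than as the Redacted constructor.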

Similar techniques can help programmers improve security everywhere. To continue the previous example of safely storing user files on a server, if your upload-storing function starts with

storeFileUpload :: Path Abs File -> ByteString -> IO ()
storeFileUpload path = ...

you are much less likely to create a situation where users can overwrite your system files: this code will not compile unless it is practically impossible for the file path to contain a path traversal attack. Similarly, if the user data simply is not available to the program after a failed login, or if you simply cannot embed unchecked user input in HTML pages, you're less likely to screw up.
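The Path Abs File type in the signature above looks like the one provided by the path package, but the same idea can be sketched by hand. The following is a minimal illustration that assumes only the filepath library shipped with GHC; SafeName, mkSafeName and uploadPath are hypothetical names, and the validation rules are deliberately simple rather than exhaustive.

```haskell
import System.FilePath ((</>))

-- A single file name that has passed validation.
-- In a real module the SafeName constructor would not be exported,
-- so that mkSafeName is the only way to obtain a value of this type.
newtype SafeName = SafeName FilePath
  deriving (Eq, Show)

-- Reject empty names, the special names "." and "..",
-- and anything containing a path separator or a NUL character.
mkSafeName :: FilePath -> Maybe SafeName
mkSafeName name
  | null name = Nothing
  | name `elem` [".", ".."] = Nothing
  | any isBad name = Nothing
  | otherwise = Just (SafeName name)
  where
    isBad c = c == '/' || c == '\\' || c == '\NUL'

-- Build the final location inside the upload directory.
-- Because the second argument is a SafeName, an unvalidated
-- user-supplied string cannot be passed here by accident.
uploadPath :: FilePath -> SafeName -> FilePath
uploadPath uploadDir (SafeName name) = uploadDir </> name
```

Here mkSafeName "../../../etc/passwd" yields Nothing, so the traversal attempt never reaches the code that touches the filesystem, while a plain name such as "cat.png" is accepted and joined onto the upload directory.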

I'm not claiming that other languages can't be used to write secure code nor even that Haskell automatically makes your code more secure. Only that Haskell has very convenient tools that you can use to build up your security.

Haskell and domain errors

Earlier, I defined pure domain errors as those errors that are independent of the tools used. This is not entirely true. People don't choose their tools randomly, and communities of like-minded people often form around different tools. And these communities may have a different outlook on security.

The thing that speaks in Haskell's favour here is the fact that you can't get good at Haskell by accident. Haskell is presently a rare enough technology that not all universities even teach it, and almost no curriculum is taught entirely with it. That is, if someone is good at Haskell, it is not an unreasonable guess that they also have skill at working with formal systems or an interest in computer science topics. Though this does not ensure that Haskell programmers know anything about security, it does hint that they might be fast on the uptake when it becomes necessary.

But all this is guesswork. The Haskell community has been small enough not to be targeted by attackers, and Haskell people, in general, haven't yet been burned by security issues in the same way as JavaScript or Python developers.

Conclusions

Haskell certainly isn't without flaws, and I'm not claiming that other languages cannot share similar advantages. In some cases, such as timing and other side-channel attacks, other tools may even offer a better security profile. Also, some language communities are more focused on security than Haskell's. But personally, I find that among the currently viable general-purpose programming languages, Haskell offers a very good package for writing secure software.