The illusion of statelessness

Some libraries, frameworks, components, and architectures either encourage statelessness, or make it a requirement. While statelessness has a lot of benefits, it's unfortunately rarely possible in the real world. In this post, I'd like to detail this stance of mine a bit.

State in Functional Programming

Functional Programming is based on a set of principles. Among those principles are pure functions:

A pure function is a function that has the following properties:

Its return value is the same for the same arguments

Its evaluation has no side effects

For example, the following function is pure:

int timesTwo(int value) {
    return value * 2;
}

A pure function allows referential transparency: a call to the function can be replaced with its return value without changing the program's behavior. With that in mind, the introduction of state may defeat the previous definition. For example, the following breaks referential transparency:

class Multiplier {

  private int factor = 1;

  public int times(int value) {
    return value * factor;
  }

  public void setFactor(int factor) {
    this.factor = factor;
  }
}

Successive calls to times() with the same argument may return different results because the setFactor() method may have changed the factor attribute's value between calls. Interestingly enough, a method can conform to the referential transparency property, but still break the no side-effects property. Here's an example:

import java.util.HashMap;
import java.util.Map;

class MemoizedMultiplier {

  private final int factor;
  private final Map<Integer, Integer> results = new HashMap<>();

  public MemoizedMultiplier(int factor) {
    this.factor = factor;
  }

  public int times(int value) {
    // Integer, not int: get() returns null on a cache miss
    Integer cache = results.get(value);
    if (cache == null) {
      int result = value * factor;
      // Side effect: the cache is updated
      results.put(value, result);
      return result;
    } else {
      return cache;
    }
  }
}

From the caller's point of view, the above implementation behaves as if it were immutable: the factor is set once when the object is created, and cannot be changed afterwards. Successive calls with the same argument will return the same result over and over. Yet, the method has a side effect: on a cache miss, it writes to the internal results map.
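The side effect can be made observable with a minimal sketch. The `ObservableMemoizedMultiplier` class and its `cacheSize()` accessor are hypothetical names added for illustration only; they are not part of the class above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical variant of the memoized multiplier that exposes its cache size,
// only so that the side effect of times() can be observed from the outside.
class ObservableMemoizedMultiplier {

  private final int factor;
  private final Map<Integer, Integer> results = new HashMap<>();

  ObservableMemoizedMultiplier(int factor) {
    this.factor = factor;
  }

  int times(int value) {
    // computeIfAbsent writes to the map on a cache miss: that write is the side effect
    return results.computeIfAbsent(value, v -> v * factor);
  }

  int cacheSize() {
    return results.size();
  }
}
```

On an instance created with a factor of 2, calling `times(21)` twice returns 42 both times, while `cacheSize()` goes from 0 to 1 after the first call: the result is stable, but the object has changed.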

To keep most of the system pure, practitioners of FP push non-pure functions to the system boundaries.
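Pushing impure code to the boundaries could look like the following sketch. The `Greeter` class and its methods are hypothetical; the point is only that the logic lives in a pure function, while the clock and the console are touched in a thin outer layer.

```java
import java.time.LocalTime;

// Hypothetical example: all names are illustrative.
class Greeter {

  // Pure core: the result depends only on the arguments
  static String greeting(String name, int hourOfDay) {
    String salutation = hourOfDay < 12 ? "Good morning" : "Good afternoon";
    return salutation + ", " + name + "!";
  }

  // Impure boundary: reads the system clock and writes to standard output
  static void greetNow(String name) {
    int hour = LocalTime.now().getHour();
    System.out.println(greeting(name, hour));
  }
}
```

Only `greeting()` needs to be tested for its logic, and it can be tested without controlling the clock.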

State in web architectures

The HTTP protocol is stateless. The benefit is obvious: when a request hits the load-balancer, the latter can forward it to any web server that belongs to the cluster, and that hosts the requested resource. This allows for horizontal scaling. When the load increases, it's straightforward to add more nodes until the performance becomes acceptable again. It works like a charm, until the requirement goes beyond just displaying static pages.

In reality, a lot of use-cases require different HTTP requests to be recognized as originating from the same "session": authentication, e-commerce shopping carts, etc. Cookies are the browsers' answer to those requirements.

Servers do offer a generic storage mechanism, keyed to a session identifier. On the first request, the server adds a cookie with a specific session id to the response. Subsequent requests will use this cookie, so the server associates the request with the same session. Different servers have different cookies: JSESSIONID for Java EE, PHPSESSID for PHP, ASPSESSIONID for ASP, etc.
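The mechanism can be sketched as a map keyed by session id. This is a simplified illustration, not how any particular server implements it; `SessionStore` and its methods are hypothetical names, and real servers also deal with expiry, cookie attributes, and so on.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory session store.
class SessionStore {

  // session id -> attributes stored for that session
  private final Map<String, Map<String, Object>> sessions = new ConcurrentHashMap<>();

  // First request: the returned id would be sent back to the browser in a cookie
  String createSession() {
    String id = UUID.randomUUID().toString();
    sessions.put(id, new ConcurrentHashMap<>());
    return id;
  }

  // Subsequent requests: the id is read from the cookie;
  // returns null when the id is missing or unknown to this server
  Map<String, Object> find(String sessionId) {
    return sessionId == null ? null : sessions.get(sessionId);
  }
}
```

Note that the map lives in the memory of a single server, which is exactly what causes trouble once several nodes are involved.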

Data stored on a specific cluster node obviously won't be accessible to requests forwarded to other nodes, even when they originate from the same session. To avoid that, sessions need to be "pinned down" on the node that received the first request for that session. In standard web architectures, this is one of the responsibilities of the load-balancer: it keeps the association between a session id and the node in memory. This feature is known as sticky sessions.
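A sticky routing table might look like the sketch below. `StickyBalancer` is a hypothetical class, nodes are represented as plain strings, and round-robin is just one assumed policy for assigning new sessions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical load-balancer routing table.
class StickyBalancer {

  private final List<String> nodes;
  private final Map<String, String> pinned = new HashMap<>(); // session id -> node
  private int next = 0;

  StickyBalancer(List<String> nodes) {
    this.nodes = new ArrayList<>(nodes);
  }

  // A known session always goes to its pinned node;
  // a new session is assigned the next node in round-robin order
  String route(String sessionId) {
    return pinned.computeIfAbsent(sessionId, id -> nodes.get(next++ % nodes.size()));
  }
}
```

The `pinned` map is itself state, this time held by the load-balancer.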

However, nodes will sometimes fail. If session data is on that node, it will be lost. To compensate for that, data needs to be replicated on other nodes. That capability is known as session replication.
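As a rough sketch of the idea, the following hypothetical `ReplicatedSessions` class copies every write to all nodes. It assumes synchronous, full replication inside a single process; real implementations replicate over the network and usually make different trade-offs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical cluster: each node holds its own in-memory copy of the session data.
class ReplicatedSessions {

  private final List<Map<String, String>> nodes = new ArrayList<>();

  ReplicatedSessions(int nodeCount) {
    for (int i = 0; i < nodeCount; i++) {
      nodes.add(new HashMap<>());
    }
  }

  // Every write is copied to all nodes
  void put(String sessionId, String data) {
    for (Map<String, String> node : nodes) {
      node.put(sessionId, data);
    }
  }

  // Simulates a crash: the node's memory is wiped
  void fail(int nodeIndex) {
    nodes.get(nodeIndex).clear();
  }

  // Reads from a given node, as if the load-balancer had routed the request there
  String get(int nodeIndex, String sessionId) {
    return nodes.get(nodeIndex).get(sessionId);
  }
}
```

After `fail(0)`, the session data is gone from node 0 but can still be read from the other nodes.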

Sticky sessions, and even more so session replication, make architectures stateful.

State in REST architectures

The REST style of architecture is based on 5 principles:

1. Client-server architecture
2. Statelessness
3. Cacheability
4. Layered system
5. Uniform interface

Note that #2 explicitly defines statelessness as a requirement of the REST architecture. However, the definition of it in this context is not the absence of state, but that state shouldn't be stored server-side.

The client-server communication is constrained by no client context being stored on the server between requests.
Each request from any client contains all the information necessary to service the request, and the session state is held in the client.
The session state can be transferred by the server to another service such as a database to maintain a persistent state for a period and allow authentication.

The quote above defines two options to store data:

In a database:

Storing data is the raison d'être for databases, whether SQL or NoSQL. Hence, it seems like an obvious choice to store state. However, this generally implies disk-based persistence. In turn, this means increased access time to data, 2 or 3 orders of magnitude higher than for in-memory access.

On the client:

Another option is to store data in cookies. This approach raises some interesting challenges of its own. The first one is about security: if credentials-related data is stored client-side, how can the server guarantee that it is genuine? The currently agreed-upon answer is JSON Web Tokens. One just needs to remember that it makes the flow more complex, and slower, because validation still needs to occur server-side, prior to any further request handling.
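To give an idea of what server-side validation involves, here is a simplified sketch using only the JDK. It is not a complete JWT implementation: it only appends an HMAC-SHA256 signature to a payload and checks it on the way back. The `SignedValue` class and the `payload.signature` layout are assumptions made for this example.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper: a simplified "payload.signature" token, not a real JWT.
class SignedValue {

  private final byte[] secret;

  SignedValue(byte[] secret) {
    this.secret = secret.clone();
  }

  // Returns the payload followed by its Base64URL-encoded HMAC-SHA256 signature
  String sign(String payload) {
    return payload + "." + signature(payload);
  }

  // Recomputes the signature server-side and compares it with the one received
  boolean isGenuine(String token) {
    int dot = token.lastIndexOf('.');
    if (dot < 0) {
      return false;
    }
    String payload = token.substring(0, dot);
    byte[] received = token.substring(dot + 1).getBytes(StandardCharsets.UTF_8);
    byte[] expected = signature(payload).getBytes(StandardCharsets.UTF_8);
    return MessageDigest.isEqual(expected, received);
  }

  private String signature(String payload) {
    try {
      Mac mac = Mac.getInstance("HmacSHA256");
      mac.init(new SecretKeySpec(secret, "HmacSHA256"));
      byte[] raw = mac.doFinal(payload.getBytes(StandardCharsets.UTF_8));
      return Base64.getUrlEncoder().withoutPadding().encodeToString(raw);
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException("HMAC-SHA256 is not available", e);
    }
  }
}
```

The signature has to be recomputed on every request that carries the token, which is the extra server-side work mentioned above.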

The second challenge revolves around data serialization. Simple types, e.g. int or String, can easily be set in cookie values, but what about more complex types? They would require a serialization/deserialization mechanism: when the request is received, cookies should be deserialized into objects, and when the response is sent, objects should be serialized back into a compatible format, e.g. JSON. The complete setup is worth a dedicated post, but here are some points that deserve some attention:
- How to manage serialization limitations?
- The need to automate the serialization/deserialization flow
- Configure the entities to be serialized, and decide which JSON library to use
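To make the round trip concrete, here is a minimal sketch that serializes a shopping cart into a cookie-safe string and back. It deliberately uses a hand-rolled `name=quantity` format plus Base64URL instead of a JSON library, to stay self-contained; `CartCookieCodec` and the format are assumptions for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical codec: a hand-rolled format standing in for a real JSON library.
// Limitation: item names must not contain '=' or '&'.
class CartCookieCodec {

  // Object -> cookie-safe string (Base64URL output only uses letters, digits, '-' and '_')
  static String serialize(Map<String, Integer> cart) {
    StringBuilder plain = new StringBuilder();
    for (Map.Entry<String, Integer> item : cart.entrySet()) {
      if (plain.length() > 0) {
        plain.append('&');
      }
      plain.append(item.getKey()).append('=').append(item.getValue());
    }
    byte[] bytes = plain.toString().getBytes(StandardCharsets.UTF_8);
    return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
  }

  // Cookie value -> object
  static Map<String, Integer> deserialize(String cookieValue) {
    String plain = new String(Base64.getUrlDecoder().decode(cookieValue), StandardCharsets.UTF_8);
    Map<String, Integer> cart = new LinkedHashMap<>();
    if (plain.isEmpty()) {
      return cart;
    }
    for (String item : plain.split("&")) {
      String[] parts = item.split("=", 2);
      cart.put(parts[0], Integer.parseInt(parts[1]));
    }
    return cart;
  }
}
```

Even this toy format has a limitation on item names, which hints at the kind of constraints a full setup has to manage.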

State in Kubernetes

First, one should remember that Kubernetes was originally focused on stateless workloads. For example, StatefulSet was introduced in version 1.5 as a beta feature. Furthermore, rolling upgrades of the Deployment object handle pods only. If containerized applications use a database, and different versions of the software require different schema versions, you're on your own: while possible, it's not trivial.

Apart from that, Kubernetes offers some interesting options to manage state:

Inside the container:

The easiest way to store data is to use the container's topmost layer. This is the case when one writes to a directory of the filesystem that is not mapped to a volume, e.g. /var/log. This option has a big flaw: when the container is removed, whether after an abnormal stop or not, data is lost.

In an attached volume:

Volumes are the nominal way to store data across pod stops. There are a whole lot of different kinds of volumes available:
- Volumes can be mounted on the host
- They can be as well on NFS
- There is one kind for each common Cloud provider: Google Cloud, Azure, and AWS
- Empty volumes allow sharing data between the containers of a pod
- etc.

Note that even with volumes, there's no 100% durability guarantee. It depends on the exact kind of volume, and the surrounding context.
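As an illustration of the volume approach, here is a minimal sketch of a pod that mounts an emptyDir volume. The pod name, container name, image, and mount path are placeholders invented for this example.

```yaml
# Hypothetical manifest: names, image, and paths are placeholders
apiVersion: v1
kind: Pod
metadata:
  name: demo-pod
spec:
  containers:
    - name: app
      image: example/app:1.0
      volumeMounts:
        - name: scratch      # must match the volume name below
          mountPath: /data   # where the volume appears inside the container
  volumes:
    - name: scratch
      emptyDir: {}
```

Data written to /data survives a restart of the container, but an emptyDir is deleted when the pod is removed from its node: a concrete case where durability depends on the kind of volume.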

Final thoughts

Despite what we would like to believe, state can be avoided only in trivial cases. There are only three available options:

Store state somewhere. There are multiple places to put data: serialized in cookies client-side, serialized in a data store, replicated in-memory, etc.
Compute the state every time it's required
A combination of the above
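The first two options can be contrasted on a tiny example. The `Cart` class below is hypothetical; it keeps a running total (stored state) and can also recompute that total from the item prices on demand.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical shopping cart illustrating two ways to obtain the same value.
class Cart {

  private final List<Integer> prices = new ArrayList<>();
  private int total = 0; // stored state, kept up to date on every write

  void add(int price) {
    prices.add(price);
    total += price;
  }

  // Option 1: read the stored state; cheap to read, but it must be kept in sync
  int storedTotal() {
    return total;
  }

  // Option 2: recompute the value from its inputs every time it's required
  int computedTotal() {
    int sum = 0;
    for (int price : prices) {
      sum += price;
    }
    return sum;
  }
}
```

Both methods return the same value; the trade-off is between the cost of keeping `total` in sync and the cost of recomputing it on every read.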

Technical experts need to be aware of the above options, and to know which tradeoffs each of them implies. Stop fighting state: it's a waste of time.

Originally published at A Java Geek on June 28th, 2020.