Benoit Ruiz

Posted on May 4, 2022

Data immutability

#functional #programming #tutorial #typescript

Introduction
Characteristics of data immutability
Summary

Introduction

Data immutability is a concept that applies to values that are created once, and cannot be modified afterwards. Values are in read-only mode, frozen in time.

If we want to change a value, we have to create a copy of it, then change this copy. The newly created value becomes immutable in turn, thus carrying this read-only property.

Data immutability comes in direct opposition to data mutability. A mutable value is in read-write mode, i.e. it can be altered by anyone, at any time.

An example of a mutable value could be an instance of a class whose methods change the value of its properties:

class User {
  constructor(private name: string) {}
  setName(newName: string): void { this.name = newName }
  getName(): string { return this.name }
}

const user = new User('Bob')
console.log(user.getName()) // "Bob"
user.setName('Henri')
console.log(user.getName()) // "Henri"

The fact that a property may change in time makes the code less predictable, and harder to understand, test, and debug.

Mutability does not only apply to imperative paradigms such as Object-Oriented Programming. We could use code that looks functional, and still have mutability:

interface User { name: string }

function setName(user: User, newName: string): User {
  user.name = newName
  return user
}

const user: User = { name: 'Bob' }
console.log(user.name) // "Bob"
const newUser = setName(user, 'Henri')
console.log(user.name, user === newUser) // "Henri", true

Note that in both cases, we used const to declare the user variable, though we were still able to mutate its name property. In JavaScript, the const keyword ensures that we cannot assign a new value to the variable, but the value inside the variable can be changed, as long as it is not a primitive type such as string, number, or boolean.

Here, user is an object, i.e. a non-primitive type, so we can freely mutate its properties. No error is reported by the compiler at compile time, and no exception is thrown by the JavaScript engine at runtime.
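For contrast, here is a minimal sketch of a non-mutating variant of the setName function from above. The name withName is hypothetical (it is not part of the original example); it builds a new object with the spread syntax and leaves its argument untouched.

```typescript
interface User { name: string }

// Hypothetical non-mutating counterpart of setName: it builds a new object
// instead of writing to the one it receives.
function withName(user: User, newName: string): User {
  return { ...user, name: newName }
}

const user: User = { name: 'Bob' }
const newUser = withName(user, 'Henri')

console.log(user.name)        // "Bob" (the original is unchanged)
console.log(newUser.name)     // "Henri"
console.log(user === newUser) // false (two distinct objects)
```

Nothing prevents another function from mutating user here; the immutability only holds because withName chooses not to write to it.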

There are ways to ensure immutability at both compile and run times for non-primitive values, though we will not discuss these in this article.

In TypeScript/JavaScript specifically, feel free to look for:

Immutability in TS, at compile time, using as const, readonly, and Readonly<A> type syntax.
Immutability in JS, at runtime, using Object.freeze on both arrays and objects.
Immutability in JS, at runtime, using a third-party library such as Immutable.js.
Immutability in JS, at runtime, using immutable records and tuples (more on that in the next paragraph).
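As a quick taste of the first two items, here is a small sketch, under the assumption that the code is compiled with a reasonably recent TypeScript version. The lines that the compiler would reject are left as comments so that the snippet still runs.

```typescript
interface User { name: string }

// Compile time: the type checker rejects writes to a Readonly<User>.
const bob: Readonly<User> = { name: 'Bob' }
// bob.name = 'Henri' // rejected by the compiler: 'name' is a read-only property

// Compile time: `as const` infers a readonly tuple of literal types.
const actions = ['a', 'b', 'c'] as const
// actions.push('d') // rejected by the compiler: no 'push' on a readonly tuple

// Runtime: Object.freeze blocks writes to the object's own properties.
// The freeze is shallow: nested objects are not frozen.
const frozen = Object.freeze({ name: 'Bob', job: { title: 'Developer' } })

try {
  // The cast bypasses the compile-time check, to observe the runtime behavior.
  (frozen as { name: string }).name = 'Henri'
} catch (error) {
  // In strict mode (e.g. ES modules), the write throws a TypeError.
  // In sloppy mode, it is silently ignored.
}

console.log(bob.name, actions.length)    // "Bob" 3
console.log(frozen.name)                 // "Bob"
console.log(Object.isFrozen(frozen))     // true
console.log(Object.isFrozen(frozen.job)) // false
```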

In the latest State of JavaScript 2021, one of the most wanted features in JS that people would like to use is Immutable Data Structures such as Record and Tuple.

The JavaScript Records & Tuples Proposal, which is currently in stage 2 out of 4, should allow developers to use deeply immutable object-like and array-like structures, using respectively #{ x: 1, y: 2} and #[1, 2, 3].

This shows that people (or at least JS/TS developers) are really interested in data immutability.

That being said, we do not need immutability enforced by the language, or a library, to actually write code that deals with immutable values.

Data immutability is a matter of not mutating values. Whether these values are technically protected against changes by the compiler/library or not, at the end of the day, it is our responsibility as developers to keep these values unaltered.

Data immutability depends on the developers' discipline to not mutate values. We can be helped by technology to enforce this property, but it is not a prerequisite. Though, I would advise using features enforcing immutability as much as possible, as it can be tempting to take shortcuts and mutate values to go faster.

Let's look at the advantages and drawbacks of using immutable data in our programs. The list in the next chapter is non-exhaustive; feel free to share your opinion.

Characteristics of data immutability

Code is more predictable

Once some piece of data has been created, it cannot change anymore. We do not have to worry about changes happening behind our backs. We do not have to search the entire codebase to see if it is safe, or not, to use that particular value.

If that value contains the information we need, then we can use it safely. We can let our guard down a little, and relax our defensive programming mindset. Once a value has been verified to contain all the information it should contain, then it is valid indefinitely. There cannot be any surprises, or undesired behavior.

When dealing with mutable data though, we have to be extra cautious. Suddenly, our program is filled with conditions and assertions to make sure we are using a value that has the expected shape.

Furthermore, the type of a value cannot help us understand where it is used in the timeline of events. A piece of data that changes over time must hold a type that works no matter its state. Thus, we end up using types that are quite generic (e.g. with lots of optional properties), and that are not great at helping us understand what is going on in a specific part of the codebase.

Let's take an example. Here is a program representation, where squares are modules, ellipses are mutable values, and arrows are interactions between modules and these values (from value to module = read, from module to value = write):

Can you guess what is the order in which these arrows happen?

We cannot accurately predict what will be the actual data flow of this program. We can make some guesses or assumptions, for example:

A → B → C → D → F → G → I → E → H
B → A → F → D → C → I → G → H → E
A → B → C → F → D → I → E → H → G

If we really want to know the answer, we have to actually read the code, or run the program to find out.

Now, let's make these values immutable. In other words, arrows from modules to values (i.e. write operations) are impossible. The modified program looks like this:

Here, because the data flows in a linear direction, we can actually have a sense of timeline of events happening in the program. With this information, it is much easier to predict the path that will be taken:

In this new illustration: A → B → (C → D → E)
On the original one from above: A → B → C → F → D → I → H → E → G

Additionally, the type of these values can be defined more accurately. For example, in the left-most module, we know that the green value has the shape {a, b, e}. In other words, we know e is defined and we do not have to make assertions later in the program. From this point and onward, the type is {a, b, e}, and not {a, b, e?} like we had in the original program.
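To make the difference between {a, b, e?} and {a, b, e} concrete, here is a small sketch. The property types and the function names (enrich, summarize, summarizeMutable) are assumptions made up for the illustration.

```typescript
// With mutation, one type has to cover every state the value goes through,
// hence the optional property.
interface MutableValue { a: number; b: string; e?: boolean }

function summarizeMutable(value: MutableValue): string {
  // Defensive check: we cannot know whether `e` has been set yet.
  if (value.e === undefined) throw new Error('e is not set yet')
  return `${value.a}-${value.b}-${value.e}`
}

// With immutability, each step produces a new value with its own precise type.
interface Initial { readonly a: number; readonly b: string }
interface Enriched extends Initial { readonly e: boolean }

// Hypothetical step that adds `e` by creating a new value.
function enrich(value: Initial, e: boolean): Enriched {
  return { ...value, e }
}

// Downstream code asks for Enriched, so no check on `e` is needed.
function summarize(value: Enriched): string {
  return `${value.a}-${value.b}-${value.e}`
}

const initial: Initial = { a: 1, b: 'x' }
const enriched = enrich(initial, true)

console.log(summarize(enriched)) // "1-x-true"
console.log('e' in initial)      // false (the initial value is unchanged)
```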

Thread-safety

As we already mentioned in a previous article, data immutability allows us to write programs with thread-safety baked in. We do not have to worry about race conditions, since we do not mutate any shared state. Reading from a read-only value is multithreading-friendly.

Threads may use a local mutable state, as long as this state is not accessed by any other thread. The coordinator is in charge of gathering the results from the threads, then creating a new, immutable state based on these results.
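Here is a rough sketch of that idea. It runs sequentially, so it only illustrates the shape of the code: in a real program, each call to countWords could run in its own worker thread. The names countWords and merge, and the word-counting task itself, are made up for the example.

```typescript
type Totals = Readonly<Record<string, number>>

// Hypothetical unit of work: it uses a local mutable object that never
// escapes the function before being frozen.
function countWords(chunk: readonly string[]): Totals {
  const local: Record<string, number> = {} // local mutable state, not shared
  for (const word of chunk) {
    local[word] = (local[word] || 0) + 1
  }
  return Object.freeze(local)
}

// Hypothetical coordinator: it gathers the results, then creates a new
// immutable state based on them.
function merge(results: readonly Totals[]): Totals {
  const merged: Record<string, number> = {}
  for (const result of results) {
    for (const word of Object.keys(result)) {
      merged[word] = (merged[word] || 0) + result[word]
    }
  }
  return Object.freeze(merged)
}

const chunks = [['a', 'b', 'a'], ['b', 'c']]
const totals = merge(chunks.map(chunk => countWords(chunk)))

console.log(totals) // { a: 2, b: 2, c: 1 }
```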

If we were to implement a program with multiple threads using a shared mutable state, we would have to use complex mechanisms to get the same advantages as with immutable data.

Some examples of these mechanisms could be:

A granular locking mechanism to safely access some parts of the shared state.
These locks should have a timeout mechanism, in case a thread dies unexpectedly, to release the lock and make the resource available again.
Another service to listen to transactions, and keep a history of all the state changes, e.g. for audit or compliance purposes.

Time Travel Debugging

As Microsoft says:

Time Travel Debugging (TTD) can help you debug issues easier by letting you "rewind" your debugger session, instead of having to reproduce the issue until you find the bug.

A lot of actions in the software have consequences on the state of the program. Let's take a basic example: a "to do list" application. This program exposes a list of tasks to do. We can add, modify, or remove tasks to/from this list, and we can also mark some of these tasks as "done".

If we manage to:

Save the initial state, e.g. an empty "to do" list
After each action, save a snapshot (or copy) of the action performed, the state at that time, and the resulting state following the action

Then we can implement Time Travel Debugging quite easily.

This allows us to replay the session, step by step, helping us identify which combination of action and state led to a bug, or if something unexpected happened between 2 actions.

A nice side effect (not to be confused with side effects) is that we can very easily implement undo/redo actions. All we have to do is travel back or forward in time, i.e. restore a previous state.
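The "to do list" scenario could be sketched as follows. This is only an illustration under simple assumptions (an in-memory list of snapshots, three kinds of actions); names such as applyAction, Snapshot, and stateAt are hypothetical.

```typescript
interface Task { readonly title: string; readonly done: boolean }
type Todos = readonly Task[]

type TodoAction =
  | { readonly type: 'add'; readonly title: string }
  | { readonly type: 'markDone'; readonly index: number }
  | { readonly type: 'remove'; readonly index: number }

// Pure function: returns a new list, never mutates the one it receives.
function applyAction(todos: Todos, action: TodoAction): Todos {
  switch (action.type) {
    case 'add':
      return [...todos, { title: action.title, done: false }]
    case 'markDone': {
      const { index } = action
      return todos.map((task, i) => (i === index ? { ...task, done: true } : task))
    }
    case 'remove': {
      const { index } = action
      return todos.filter((_, i) => i !== index)
    }
  }
}

// One entry per action: the action, the state before, and the state after.
interface Snapshot {
  readonly action: TodoAction
  readonly before: Todos
  readonly after: Todos
}

const initialTodos: Todos = []
const session: readonly TodoAction[] = [
  { type: 'add', title: 'Write article' },
  { type: 'add', title: 'Publish article' },
  { type: 'markDone', index: 0 },
]

// Record the session as a list of snapshots.
const snapshots = session.reduce<readonly Snapshot[]>((acc, action) => {
  const before = acc.length > 0 ? acc[acc.length - 1].after : initialTodos
  return [...acc, { action, before, after: applyAction(before, action) }]
}, [])

// "Travelling" to step n simply means reading the state stored at that step.
function stateAt(step: number): Todos {
  return step === 0 ? initialTodos : snapshots[step - 1].after
}

console.log(stateAt(3)) // both tasks, the first one marked as done
console.log(stateAt(2)) // "undo": both tasks, none of them done
console.log(stateAt(3)) // "redo": same state as the first line
```

Because no snapshot is ever mutated, going back and forth is just a matter of picking an index; nothing needs to be recomputed or repaired.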

If you are familiar with frontend development using TypeScript or JavaScript, then you might have heard about Redux. It is a library for state management, often used with React, whose particularity is to use reducers to update the state of the program. A reducer is a pure function that takes the current state and an action as arguments, and returns a new state. We can easily plug in a middleware to keep track of every reducer call, allowing us to build a Time Travel Debugging tool, such as Redux DevTools.

More memory allocation

As a reminder, if we want to change a value, we have to create a copy of it, then apply the changes on that copy. What happens if we have a huge list of values, and we want to add a new element? Or, what happens if we have an object with a lot of depth, and we want to change the value of a deeply-nested property?

We have to duplicate the entire value before applying the changes, that is the rule. As a consequence, our program has to run on a device that has more memory than it actually needs to perform correctly. (disclaimer: I guess today's engines are smart enough to make optimizations in this area, but I don't have sufficient knowledge to make such a claim. Feel free to share if you know more about it!)

In the majority of cases, the programs we write are used on devices that have a lot of memory. Plus, the engines that run the code have mechanisms such as Garbage Collection, a.k.a. GC, to free up unused memory. Unless we need to keep track of previous values (e.g. for Time Travel Debugging, history, auditing...), the previous value that got copied becomes useless, so it can be safely removed from the memory by the GC.

However, there are devices where the memory is not that abundant. This is the case for IoT (Internet of Things), or programs run on a Raspberry Pi, or similar. In these cases, immutability may not even be an option for large values. Furthermore, developers' discipline as we mentioned earlier may not even apply: the limited amount of memory may force us to purposely mutate values, as the memory is scarce.

May be cumbersome to update deeply-nested values

Let's take the following User model:

interface User {
  name: string
  job: Job
}

interface Job {
  title: string
  company: Company
}

interface Company {
  name: string
  address: Address
}

interface Address {
  street: AddressStreet
  zipCode: string
  country: string
}

interface AddressStreet {
  name: string
  nb: number
  special?: string
}

Granted, we could have used a simple string for the company's address, but this is an academic example. Furthermore, people might want to (or are constrained to) use a complex solution to model the address, such as this one.

So, keeping data immutability in mind, how would we update the name of the street?

We could use the spread operator to rebuild the User object, while applying the change(s) we want:

declare const user: User

const userWithNewCompanyAddress: User = {
  ...user,
  job: {
    ...user.job,
    company: {
      ...user.job.company,
      address: {
        ...user.job.company.address,
        street: {
          ...user.job.company.address.street,
          name: 'Awesome avenue'
        }
      }
    }
  }
}

But wait, you said we had to clone/duplicate the value before altering it. You don't duplicate the whole object here?

Indeed. Using the spread operator, we are making shallow copies of every intermediate object. This means that, if user.job.title was an object, then userWithNewCompanyAddress.job.title would be the exact same object (same reference), not a copy of it.
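This sharing of references can be observed directly. The following sketch uses a smaller, made-up Profile type rather than the User model, so that one sub-object (job) sits outside the path being updated.

```typescript
interface Profile {
  name: string
  job: { title: string }
  address: { city: string }
}

const original: Profile = {
  name: 'Bob',
  job: { title: 'Developer' },
  address: { city: 'Paris' },
}

// Only the objects on the path to the change are copied.
const updated: Profile = {
  ...original,
  address: { ...original.address, city: 'Lyon' },
}

console.log(updated === original)                 // false (new top-level object)
console.log(updated.address === original.address) // false (copied, then changed)
console.log(updated.job === original.job)         // true (same reference, shared)
console.log(original.address.city)                // "Paris" (original untouched)
```

Sharing job between the two values is only safe as long as nobody mutates it, which is precisely the discipline immutability asks for.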

Ok then, let's use a solution that truly clones the whole value:

declare function deepCopy<A>(obj: A): A

declare const user: User

const clonedUser = deepCopy(user)
clonedUser.job.company.address.street.name = 'Awesome avenue'

I must admit, I am not fond of this approach:

We need some deepCopy utility function to clone objects (and possibly arrays). It is not very hard to implement if we use pure data: something such as JSON.parse(JSON.stringify(obj)) should do the trick, although it has its limitations. Nonetheless, such a function is not available in the standard library.
- [side note] I recently heard about the native structuredClone function to deep-copy an object, though it is only supported on recent browser and Node versions.
It has some runtime performance impact. For a single object, it is probably negligible. Though, what if we iterated over hundreds or thousands of objects that would be more complex than this one?
We still have a mutation step, even if it applies to a copy of the initial value. It may feel odd to discourage/forbid mutations, then see this kind of code in the codebase.
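To illustrate the limitations of the JSON round trip mentioned in the first point, here is a small sketch. The jsonDeepCopy name and the sample object are hypothetical.

```typescript
// Hypothetical deepCopy based on a JSON round trip. The `as A` cast is
// optimistic: the result does not always match the type A, as shown below.
function jsonDeepCopy<A>(obj: A): A {
  return JSON.parse(JSON.stringify(obj)) as A
}

const source = {
  label: 'report',
  tags: ['a', 'b'],
  createdAt: new Date(0),
  note: undefined as string | undefined,
}

const copy = jsonDeepCopy(source)

console.log(copy.tags !== source.tags)          // true (the nested array is duplicated)
console.log(copy.tags)                          // [ 'a', 'b' ]
console.log(typeof (copy.createdAt as unknown)) // "string" (the Date became a string)
console.log('note' in copy)                     // false (undefined properties are dropped)
```

This works well for plain data made of strings, numbers, booleans, arrays, and objects, but values such as Date instances, functions, or undefined do not survive the trip.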

This is why I prefer the first approach:

It is a one-shot step: only one value assignment to a variable
It preserves the original sub-objects and their properties that are not changed: better memory footprint and less CPU utilization (please, correct me if I am wrong here)

However, as you can see, the major drawback is that it is quite verbose if we are changing a deeply-nested value.

In the functional world, there is a solution to that: optics. You might see the word "lens" (or "lenses") come up more often than "optics". A lens is a type of optic that, in my experience, is the most used compared to other optics such as iso, prism, or traversal.

Without going into too many details, an optic is a composable and pure getter/setter.
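To give an intuition of what "composable and pure getter/setter" means, here is a hand-rolled sketch. It is not the API of any library: SimpleLens, prop, and compose are hypothetical names, and the types are reduced to the bare minimum.

```typescript
// A lens focuses on a part A inside a whole S.
interface SimpleLens<S, A> {
  readonly get: (s: S) => A
  readonly set: (a: A) => (s: S) => S // returns a new S, never mutates
}

// Hypothetical helper: builds a lens focusing on one property of an object.
function prop<S extends object>() {
  return <K extends keyof S>(key: K): SimpleLens<S, S[K]> => ({
    get: s => s[key],
    set: a => s => ({ ...s, [key]: a } as S),
  })
}

// Composition: a lens from S to A and a lens from A to B give a lens from S to B.
function compose<S, A, B>(
  outer: SimpleLens<S, A>,
  inner: SimpleLens<A, B>
): SimpleLens<S, B> {
  return {
    get: s => inner.get(outer.get(s)),
    set: b => s => outer.set(inner.set(b)(outer.get(s)))(s),
  }
}

interface Street { name: string }
interface Address { street: Street; country: string }

const streetL = prop<Address>()('street')
const nameL = prop<Street>()('name')
const streetNameL = compose(streetL, nameL)

const address: Address = { street: { name: 'Main street' }, country: 'France' }
const updatedAddress = streetNameL.set('Awesome avenue')(address)

console.log(streetNameL.get(updatedAddress)) // "Awesome avenue"
console.log(address.street.name)             // "Main street" (original untouched)
```

The nested spreads are still there, but they are written once, inside prop, and reused through composition.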

We might talk about optics in this series later, in a bonus article. For now, here is how we could leverage optics to improve readability in our case, using monocle-ts:

import { Lens } from 'monocle-ts'

declare const user: User

const companyStreetName = Lens.fromPath<User>()([
  'job', 'company', 'address', 'street', 'name'
])

const userWithNewCompanyAddress: User =
  companyStreetName.set('Awesome avenue')(user)

Finally, to demonstrate the power of optics, let's imagine that the company has several addresses, and we would like to change all their street names to lowercase:

const newUser = {
  ...user,
  job: {
    ...user.job,
    company: {
      ...user.job.company,
      addresses: user.job.company.addresses.map(address => ({
        ...address,
        street: {
          ...address.street,
          name: address.street.name.toLowerCase()
        }
      }))
    }
  }
}

When we start mixing objects and arrays, it gets messy quite quickly. Using optics, this would become more readable, and more composable as well:

import { fromTraversable, Lens, Traversal } from 'monocle-ts'
import { Traversable } from 'fp-ts/Array'

// optic to get the name of the street, from an address
const streetNameL = Lens.fromPath<Address>()(['street', 'name'])

// optic to get an address from a list of addresses
const companyAddressesT: Traversal<Address[], Address> =
  fromTraversable(Traversable)<Address>()

// optic to get the names of the street, from a list of addresses
const companyStreetNamesT: Traversal<Address[], string> =
  companyAddressesT.composeLens(streetNameL)

// optic to get the names of the street of the company, from a user
const userCompanyStreetNamesT: Traversal<User, string> =
  Lens.fromPath<User>()(
    ['job', 'company', 'addresses']
  ).composeTraversal(companyStreetNamesT)

const lowerCaseCompanyStreets: (u: User) => User =
  userCompanyStreetNamesT.modify(name => name.toLowerCase())

const newUser = lowerCaseCompanyStreets(user)

The most interesting part (and the most declarative one as well, IMO) is:

userCompanyStreetNamesT.modify(name => name.toLowerCase())

Immutability syntax may bloat the code

The standard library of some languages exposes mutable data structures by default. This is the case in TypeScript, with arrays and objects. This means that, if we want to enforce immutability, we have to use additional syntax, or use data structures imported from a third-party library.

In TypeScript, adding keywords such as readonly, as const, and Readonly<> everywhere (on top of existing types) can lead to code that gets more difficult to read and understand.

Which one of the following is easier to read?

const actions = ['a', 'b', 'c']

type Action = 'a' | 'b' | 'c'

interface User {
  name: string
  actions: Action[]
}

function makePairs<A, B>(arr1: A[], arr2: B[]): [A, B][] {
  if (arr1.length !== arr2.length) {
    return []
  }
  return arr1.reduce(
    (acc, val, index) => [...acc, [val, arr2[index]]],
    [] as [A, B][]
  )
}

const user1: User = { name: 'Bob', actions: ['a', 'a', 'c'] }
const user2: User = { name: 'Henri', actions: ['b', 'a'] }
const arr1: User[] = [user1]
const arr2: User[] = [user2]

const res = makePairs(arr1, arr2)
// const res: [User, User][]

const actions = ['a', 'b', 'c'] as const

type Action = (typeof actions)[number]

interface User extends Readonly<{
  name: string
  actions: readonly Action[]
}> {}

function makePairs<A extends Readonly<any>, B extends Readonly<any>>(
  arr1: readonly A[],
  arr2: readonly B[]
): ReadonlyArray<readonly [A, B]> {
  if (arr1.length !== arr2.length) {
    return []
  }
  return arr1.reduce(
    (acc, val, index) => [...acc, [val, arr2[index]]],
    [] as ReadonlyArray<readonly [A, B]>
  )
}

const user1: User = { name: 'Bob', actions: ['a', 'a', 'c'] }
const user2: User = { name: 'Henri', actions: ['b', 'a'] }
const arr1: readonly User[] = [user1]
const arr2: readonly User[] = [user2]

const res = makePairs(arr1, arr2)
// const res: readonly (readonly [User, User])[]

I think you will agree with me that the first version is more readable, though less safe. It has 30% fewer characters than the version with immutable types. Again, you might want to rely on developers' discipline and not on the language's syntax to make the code less bloated.

Keep in mind that, in a more complex codebase, it could be difficult to see that values (such as arr1, arr2 or the User objects they contain) could be mutated anywhere, leading to undesired side effects. Using TypeScript syntax or a third-party library could prevent these kinds of effects from occurring. As always in our jobs, it is a tradeoff between safety and readability.

Maybe some day, TypeScript will release a new compiler option "readonlyByDefault", and a new type operator mutable, that would allow us to use immutable data by default (though the migration of the codebase to this "mode" would probably be painful!).

Summary

Data immutability is great for many reasons as we have seen in this article. It has some drawbacks, but thankfully they can be mitigated, or they do not apply in the majority of cases.

For me, the most important part is the predictability it offers. I think it's great to be able to read a function and be certain that the values it uses cannot be changed anywhere else (e.g. because of some arbitrary event I don't know about).

If I want to know how the values are used, I can do the following:

If it's not returned by the function, then it means that:
- Either the value (e.g. of type Foo) is only used by the function I am currently reading => local scope, I can just focus on this particular function and not worry about the rest,
- Or it is a global immutable state that will never change, and always have the same type (e.g. Foo). So I know exactly what the function is able to do with it, or if it needs more information to work properly.
If it's returned by the function, then I can search for the places where the function is called, and follow the paths from there to see how the data flows in the program.

Plus, it removes a big chunk of lines induced by defensive programming, so the code feels more readable and focuses on the most important parts.

In my opinion, following the "breadcrumbs" in a linear way is great for understanding the codebase, and makes debugging the code easier.

If we zoom out from the code, we can see that one of the most trending technologies of today, which may revolutionize the world in the near future, uses data immutability: the blockchain.

Additionally, anything that needs some traceability, such as financial operations or database/service accesses, has to implement some kind of ledger or auditing mechanism to better understand (and justify) when something goes wrong. This is only possible with data immutability.

Finally, I think the following quote from Archis Gore written on Quora sums up pretty well how to approach this subject in our day-to-day work:

Shared state is fine if it is immutable. Mutable state is fine if it is not shared.

Thank you for reading this far! In the next article, we will talk about currying, partial application, and tacit programming (also called point-free style). See you next time!

Side note:
Originally, I wanted to include a "how to deal with mutability" chapter where I would take some typical examples (e.g. global mutable state, object instance whose properties are partially defined) and try to make them immutable. Though, I didn't anticipate that I would write so much in the characteristics chapter! So, I decided not to write another chapter here. Let me know if you would be interested though, and I might write another article specifically for this! :)

Photo by Xavi Cabrera on Unsplash.

Pictures made with Excalidraw.