People often feel at a loss when they first look at languages with immutability. I know I did. How can you write a program without changing the value of a variable?
Mutable
First of all, what does it mean to change a variable? Consider some simple JavaScript:
var x = 5
x = x + 3
This creates a variable called x that contains the number 5. The second line changes the value of x to itself plus 3, which means that it gets changed to the number 8.
This works, though the choice of the "equals" sign in languages like JavaScript is a bit awkward.
If you were in high school algebra class and you saw:
x = x + 3
…you'd probably throw your hands up in despair. When looking at math, we expect x to be described by a set of conditions, where = can be used as part of that description. This statement can't be true, since a number x can't be equal to itself plus 3. This is why some languages won't use a simple = character when changing a variable. For instance, Pascal uses :=, R uses an arrow <-, and many computer science texts use an arrow character ←.
Math provides a powerful way to look at the world, and it includes this idea of values that meet a condition, and don't change. Conversely, anyone with significant experience debugging a program has seen variables change unexpectedly. When the state of your application is based on the values in its variables, then this means that understanding the current state can get very difficult when those variables change. Perhaps the idea of values that don't change could be useful.
Immutable
How does this compare to using immutable values?
JavaScript lets you create immutable values with the const declaration. This means that we can't change the value:
const x = 5
x = x + 3
Which results in:
Uncaught TypeError: Assignment to constant variable.
Instead, we have to declare a whole new thing to take the new value:
const x = 5
const y = x + 3
This is limited, but it's more flexible than you might expect.
Shadowing
You can even re-use the name x in other parts of the code, without changing the original value. This is called shadowing. In JavaScript this must be done in a new scope:
const x = 5
const y = x + 3
{ // a new scope
  const x = y + 1
  console.log("the new value is: " + x)
}
console.log("the old value is still: " + x)
the new value is: 9
the old value is still: 5
Inside that new scope, the value of x is set to 9, but in the outer scope it remains at 5.
But we never print y, so why not just skip that and build the new x directly from the old x? Well, it turns out that JavaScript treats every reference to x in the inner scope as referring to the x declared in that scope, even before that declaration has finished. This means that the declaration of the inner x can't use the outer x to compute its value:
const x = 5
{
  const x = x + 3 + 1
  console.log("the value is: " + x)
}
Uncaught ReferenceError: Cannot access 'x' before initialization
JavaScript can use immutable variables, but it isn't entirely smooth.
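To watch this rule at work without stopping a script, the failing declaration can be wrapped in a try/catch. This is only a minimal sketch: the name caught is made up for the illustration, and since the wording of the message differs between JavaScript engines, only the type of the error is checked.

```javascript
const x = 5
let caught = null
try {
  {
    // The inner x is in scope from the top of this block, but it cannot be
    // read until its own declaration has finished evaluating.
    const x = x + 3 + 1
    console.log("the value is: " + x)
  }
} catch (e) {
  caught = e
}
console.log(caught instanceof ReferenceError) // true
console.log("the outer value is still: " + x) // the outer value is still: 5
```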
Clojure
In contrast, languages like Haskell, Erlang, Scala and Clojure are designed to make using immutability natural. They often have some way to allow mutability where it can help, but these are typically awkward to use, or in the case of Haskell, kept out of ordinary code and confined to special types such as IORef.
In Clojure we don't have "variables" anymore. Instead, the thing that holds a value is called a "var". These can be declared globally with def. There are operations that can modify a var, but they should generally be avoided. Those operations can be used to patch or temporarily modify code that is not in your control, but that sort of thing is typically only necessary in libraries or testing.
In a local scope, we declare values via let. These are just local names for things and cannot be varied at all.
As in all Lisps, operations like let are enclosed in parentheses. The values to be created appear first, inside a vector, with each name immediately before the value it will be set to:
(let [x 5]
  (println "value is:" x))
value is: 5
Shadowing is even easier than in JavaScript, since the old value of x will be used until the new value is finished being defined:
(let [x 5]
  (let [x (+ x 3 1)]
    (println "The new value is:" x))
  (println "This old value is still:" x))
The new value is: 9
This old value is still: 5
But we don't necessarily need that old, outer value. When that happens, we don't need the separate scopes, and we can bind x multiple times in the same let:
(let [x 5
      x (+ x 3)
      x (+ x 1)]
  (println "value:" x))
value: 9
This was made using 3 different values for x, with each one shadowing the previous one. It's not the way I like to write code myself (mostly because I like to have access to the various values of x, and not have them hidden when they get shadowed), but this demonstrates that let makes shadowing easy.
Loops
Just like shadowing, some loop constructs allow you to reuse a name with a new value each time. For instance, we can vary x from 0 to 4 and get the squares:
(for [x (range 5)]
  (* x x))
(0 1 4 9 16)
Each time going through this for construct, the value of x will be set to something new. But unlike JavaScript, it has not changed a variable called x. It's an entirely new x each time.
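JavaScript itself can show the difference between changing one variable and getting a new one on every pass. The sketch below is only an illustration (the names withVar and withConst are invented for it): functions created inside a classic var loop all see the single x that kept changing, while functions created in a for...of loop with const each keep their own value.

```javascript
// With var there is one x shared by every iteration, and it is changed each time.
const withVar = []
for (var x = 0; x < 5; x++) {
  withVar.push(() => x * x)
}

// With const in a for...of loop, each iteration gets its own fresh y.
const withConst = []
for (const y of [0, 1, 2, 3, 4]) {
  withConst.push(() => y * y)
}

console.log(withVar.map(f => f()))   // [ 25, 25, 25, 25, 25 ]
console.log(withConst.map(f => f())) // [ 0, 1, 4, 9, 16 ]
```

The second loop behaves like the Clojure for above: nothing was changed, so every function still sees the value it was created with.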
In a similar way, we can use a loop:
(loop [x 0]
  (when (< x 5)
    (print (str (* x x) " ")) ;; output a single number followed by a space
    (recur (+ x 1))))
(println) ;; ends the line of output with a newline
0 1 4 9 16
The first time through the loop, the initial vector acts like a let where x has been set to 0. Then, each time recur is called, the loop acts like a let again, only instead of 0 the x will be set to whatever value the recur was given.
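One way to picture this from the JavaScript side is a function that calls itself with new arguments instead of reassigning anything. This is a rough analogy rather than how Clojure implements loop, and squaresFrom is a name invented for the sketch. Unlike recur, a recursive call in JavaScript uses a new stack frame each time, so this style only suits small ranges.

```javascript
// Hypothetical helper: each call gets its own x and acc, much like each pass of the loop.
function squaresFrom(x, acc) {
  if (x < 5) {
    // The "recur" step: call again with new values; nothing is reassigned.
    return squaresFrom(x + 1, [...acc, x * x])
  }
  return acc
}

const result = squaresFrom(0, [])
console.log(result.join(" ")) // 0 1 4 9 16
```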
Structures
Immutability takes on a new meaning when applied to a structure. For instance, consider an array in JavaScript:
const a = [1, 2, 3]
console.log(a)
a.push(4)
console.log(a)
[ 1, 2, 3 ]
[ 1, 2, 3, 4 ]
How did the array a change when it was declared const? It happened because a continued to point to the same array, but the push operation changed what was in that array. This is like referring to a closet, but hanging up more clothes in there. The closet doesn't change, but the contents of the closet do.
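The two kinds of change can be put side by side. In this small sketch (rebindError is a name made up for the example), changing the contents of the array succeeds, while trying to make a refer to a different array is rejected:

```javascript
const a = [1, 2, 3]
a.push(4) // allowed: changes the contents of the array

let rebindError = null
try {
  a = [1, 2, 3, 4, 5] // not allowed: changes what a refers to
} catch (e) {
  rebindError = e
}

console.log(rebindError instanceof TypeError) // true
console.log(a) // [ 1, 2, 3, 4 ]
```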
However, languages like Clojure and Scala also offer immutable structures:
(let [a [1 2 3]
      a (conj a 4)]
  (println a))
[1 2 3 4]
Well, we can see that we shadowed a, but how do we know if we changed the original or not? Let's not shadow it, and print the original object after the extra number was added:
(let [a1 [1 2 3]
      a2 (conj a1 4)]
  (println "original array:" a1)
  (println "new array:" a2))
original array: [1 2 3]
new array: [1 2 3 4]
This works with all of Clojure's structured types. For instance, maps can have new items associated with them via the assoc function:
(let [m1 {:one 1
          :two 2
          :three 3}
      m2 (assoc m1 :four 4
                   :five 5)]
  (println "original map:" m1)
  (println "new map:" m2))
original map: {:one 1, :two 2, :three 3}
new map: {:one 1, :two 2, :three 3, :four 4, :five 5}
The most important aspect of this behavior is that operations which modify a structure will return the new structure, since the original structure does not change. This is different to structures that mutate. Consider the array in JavaScript:
const a = [1, 2, 3]
console.log(a.push(4))
4
This push operation doesn't return the new array. Instead, it returns the new length of the array, which here happens to be 4, the same as the value that was added. That's OK, because we can go back to the object referred to by a and see that the array has this change. But for immutable structures, the only time you can get access to the new structure after an operation is in the return value of that operation.
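JavaScript can imitate the "return the new structure" style by building a new array instead of pushing onto the old one. The sketch below uses spread syntax for this. Note the assumption it rests on: the spread makes a full copy, so it only mimics the behavior of conj and none of the efficiency of Clojure's structures.

```javascript
const a1 = [1, 2, 3]
const a2 = [...a1, 4] // builds a new array; a1 is left alone

console.log(a1) // [ 1, 2, 3 ]
console.log(a2) // [ 1, 2, 3, 4 ]
```

As with conj, the only way to reach the longer array is through the value that the expression produced.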
Immutable Values with Immutable Structures
Using some of the above examples we can build a structure that is referred to with immutable values. For instance, a loop can be used to build a vector containing squares:
(loop [v []
       i 0]
  (if (> i 10)
    v ;; i is greater than 10, so return v
    (recur (conj v (* i i)) (+ i 1))))
[0 1 4 9 16 25 36 49 64 81 100]
While this works, it is also a bit clunky. Typically, when adding lots of things to a single object, the idiomatic way to do it in Clojure is to use the reduce function. This needs a function that accepts your structure plus another argument, and is expected to return the new structure. For instance, the above example would use a function that takes a vector and a number, then returns a new vector that has the square of the number added. Like:
(defn the-function
  [v n]
  (conj v (* n n)))
We can test it out by giving it a small vector and a number, and see if the square of that number is added to the end:
(the-function [0 1] 2)
[0 1 4]
The reduce function can use this for each number in the range 0 to 10 (up to, but excluding, 11):
(reduce
  the-function
  [] ;; the starting structure
  (range 11)) ;; the numbers to process, from 0 to 10
[0 1 4 9 16 25 36 49 64 81 100]
This does essentially the same as the original loop above, but now the operations are more structured.
NB: This code is to demonstrate a point. Building this vector in real code would be better done with a mapv over the range.
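For readers more at home in JavaScript, the same shape can be sketched with Array.prototype.reduce. The names addSquare, numbers and squares are invented for this example, and the spread copies the array on every step, so this shows the pattern only and not Clojure's efficiency.

```javascript
// Hypothetical JavaScript counterpart of the-function: takes the structure
// and a number, and returns a new structure with the square added.
const addSquare = (v, n) => [...v, n * n]

const numbers = Array.from({ length: 11 }, (_, i) => i) // 0 to 10
const squares = numbers.reduce(addSquare, [])           // [] is the starting structure

console.log(squares.join(" ")) // 0 1 4 9 16 25 36 49 64 81 100
```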
Size
An intuition that many people develop around this is that the structures are being copied, and then the copy gets modified. While this would work, a copy has the disadvantages of doubling the space consumption of the original and requiring all of the data to be processed, which could be slow when a structure gets large. Creating a vector of 1 million items by adding a single item at a time would create 1 million intermediate vectors, with a total of 500,000,500,000 elements between them.
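That total is the sum 1 + 2 + … + n of the sizes of the intermediate vectors, which equals n(n + 1)/2. A quick check, with variable names chosen just for this sketch:

```javascript
const n = 1000000
const total = n * (n + 1) / 2 // sum of the sizes 1, 2, ..., n
console.log(total) // 500000500000

// The same sum counted directly for a smaller case, to confirm the formula.
let counted = 0
for (let size = 1; size <= 1000; size++) {
  counted += size
}
console.log(counted === 1000 * 1001 / 2) // true
```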
Instead, Clojure structures use "structural sharing". This means that a new structure can be a reference to the data in the previous structure, and only contain whatever changes it requires. This saves on both space and processing time. There are even internal operations in Clojure that can skip many intermediate steps when processing large blocks of data, thereby making the operation more efficient.
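The simplest structure that shares data this way is a linked list whose cells are never changed. The sketch below is not how Clojure's vectors and maps are built (those use trees), and cons and toArray are helper names invented here. It only shows the core idea that a new structure can consist of one new piece plus a reference to the old structure.

```javascript
// Hypothetical helper: a frozen cell holding one value and the rest of the list.
const cons = (head, tail) => Object.freeze({ head, tail })

// Hypothetical helper: walk the cells to see the effective contents.
const toArray = (list) => {
  const out = []
  for (let cell = list; cell !== null; cell = cell.tail) {
    out.push(cell.head)
  }
  return out
}

const m = cons(2, cons(1, null))
const n = cons(3, m) // n is one new cell plus a reference to m; nothing is copied

console.log(toArray(m))   // [ 2, 1 ]
console.log(toArray(n))   // [ 3, 2, 1 ]
console.log(n.tail === m) // true
```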
This approach wouldn't work if structures were mutable. For instance, consider a map m containing 10 items, and a map n that is created by adding a new item to m. Internally, n has a reference to m and its single new item, giving it a total of 11 items. The diagram below shows this, and also shows the effective structure that n represents.
If we were in a mutable system and we took away something from m, then m would be down to 9 items. n still contains its reference to m plus its extra item, so now n has also been modified, and ends up with only 10 items in total. In this example, we can see that the number 9 is removed from the end of m, so this is also removed from n.
Anyone debugging such a system may be confused as to why n was changed, because nothing appeared to change n! But because Clojure's structures don't change, structural sharing works cleanly.
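The hazard is easy to reproduce with mutable JavaScript arrays. In this sketch, n is modeled as an object holding a reference to m plus its own extra item. That layout, along with the names base, extra and items, is an assumption made for the illustration and not a real library structure.

```javascript
const m = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
const n = { base: m, extra: [10] } // shares m and adds one item of its own

// Hypothetical helper: the effective contents that a shared structure represents.
const items = (s) => [...s.base, ...s.extra]

console.log(items(n).length) // 11

m.pop() // mutate m: removes the 9 from the end

console.log(m.length)             // 9
console.log(items(n).length)      // 10
console.log(items(n).includes(9)) // false
```

Nothing in the last three lines touched n, yet its contents are different.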
If you really, really want to know how structural sharing works, I gave a talk on this at :clojureD 2021.
Wrap Up
This was an initial attempt to describe how programming is still possible when variables and data can't be changed, and to show how this can be done in Clojure.
Clojure uses Vars and values that don't change, rather than Variables that do. New values with existing names can be used, and these will hide (or shadow) the previous value (or even a var) with that name. Often, a piece of code will be run many times with its value names set to different values, such as in a loop, but at no point will an existing var or value be changed inside that block.
Clojure data structures are also immutable, in that their contents never change. The effect of a change is done by creating a new structure with the desired change, and the old structure is often abandoned (though it can also be useful to keep the old structures on occasion, especially for debugging).
Changed structures look like a copy of their predecessor, with a modification. However, this is handled much more efficiently than a copy would imply.
Afterword
Immutability is something that stops people when they first come to functional languages, so I would like to work on explaining it for people making the shift. Personally, I learned about it when I first picked up the Scala language, so none of this is unique to Clojure.
This is my first attempt at explaining this topic, so I'm expecting that I haven't done a great job of it. I would appreciate feedback on the parts that work and the parts that are confusing. Meanwhile, if I left you feeling bewildered, try reading the same topic in "Programming in Scala" by Odersky, Spoon, Venners and Sommers.
Top comments (3)
I thought "var" in Clojure was just a shorthand for the word "variable" as "def" is for "definition".
It kinda is... except that they don't vary 🙂
Yes, I know about alter-var-root. But this is more a case of Clojure trying to be extremely flexible for the rare case where it may be useful, rather than providing a feature that should be used regularly.
Nice article