Understanding animated graphs in D3.js

#javascript #d3js #visualization #svg

D3.js is a great and in my opinion very elegant library for building visualizations in a web
browser. Unfortunately some concepts are not that easy to understand from its documentation, especially when it comes to
animathing graphs. I made that experience myself and when I tried to teach these concepts to my students using the
official resources.

I am sure that ObservableHQ is a great way to tinker and learn more about
certain features, but its nature of being inspired by a spreadsheet that automatically updates makes it hard to
understand how these features have to be combined in a real world application.

Therefore I am going to try to explain my (and maybe also other's) troubles when learning D3.js by using a quite simple
example. This won't cover many different features of D3.js, but it will explain the parts I feel are the hardest to
understand. This guide adds some pieces to the introduction available on the D3.js homepage that
would have made learning D3.js easier for me.

The below examples assume that the d3 variable and a <svg> tag is in scope. This can be done by e.g. using a script
tag as described in the D3.js documentation.

<svg />
<script src="https://d3js.org/d3.v5.js"></script>

Building a simple graph

Basically D3.js allows you to generate different markup (e.g. HTML or SVG) based on data. Let's have a look at an
example, which will visualize the numbers of an array as differently sized circles.

const data = [10, 30, 20];

d3.select('svg').selectAll('circle')
    .data(data)
        .join(
            (enter) => enter
                .append('circle')
                    .style('fill', 'red')
                    .attr('r', (d) => d)
                    .attr('cx', (d, i) => (i + 1) * 50)
                    .attr('cy', 50)
        )

The data variables holds the data we want to visualize. In order to do so, we need the
d3.select function, which returns a
D3.js selection. Such a selection acts as a wrapper for the DOM, which focuses on
mass manipulation using data. The d3.select function returns a selection with the first element matching its passed
CSS selector, therefore the only <svg> tag in our HTML document. D3.js makes use of a
fluent interface enabling us to use the just returned
selection and make another function call selectAll. This will return
all cirlce elements being a descendant of the
first <svg> tag in our document. This feels a bit weird, because this results in an empty selection, but we'll get to
that in a second.

So using the d3.select('svg').selectAll('circle') call, we have an empty selection, because no circle exists within
the <svg> tag yet. Now we can assign our data variable using the
data function. This will give us a data selection knowing which
elements we have to add to our visualization. And this is where it starts to get interesting.

Such a data selection offers a join method, taking up to three
arguments. In the above example only one of the arguments has been used, being a function taking the enter selection.
This selection holds all the values from our data set (10, 30 and 20) currently not being tied to a SVG circle
element. Since no elements exist yet, it's our job to create them. This is what the
append method does. We called it with the 'circle' argument,
which means there will be three circle elements once the call is finished, one for every missing data point.

Afterwards the style and
attr calls are used on these circles, in order to style them in
some way. Checkout the circle documenation on MDN
in order to learn about the available properties. These functions are not only taking simple values; they also accept
functions as arguments. If a function gets passed, this function will be called with the value (the d variable) and
the index (the i variable) of the current data point. This way we can set the radius r to the passed value, and
move the circles to the right using the index to calculate the cx attribute.

With these few lines of code we have already created three red circles with a radius of 10, 30 and 20 pixels.

Updating a current selection

Until now we only have a static representation of three values as circles. That's not really interesting so far; we
could have simply written a few lines of SVG ourselves to achieve the same result. The popularity of D3.js comes from
its more advanced features. Let's see how we can change the data when a new data set arrives. At the beginning I was
wondering how this works, since it was not evident from the documentation. Basically the trick is to call the exact
same code again with a different data set. D3.js will then find out which elements have already existed, which ones
have to be added, and if any of the elements have to be removed. The only thing we have to do (besides calling that
code once more) is to tell D3.js how it should handle these different sets of elements.

We do that by passing more function parameters to the join method. The third parameter handles the elements that are
about to be removed, and the second one handles the elements that have already existed before based on the selectAll
call, which explains why it has been there in the above code from the beginning. We will put the entire D3.js code from
above into a separate function, which will be called again when new data arrives. So the selectAll will only return an
empty set on its very first call, but the consecutive calls of it will return the old circle elements.

The following code shows that concept and assume that you have a <button> tag somewhere in your HTML, that will load
the next data set when it is clicked (no out-of-bounds check for brevity):

const data = [
    [10, 30, 20, 40],
    [20, 20, 20],
    [40, 10, 20, 10, 20],
    [20, 10, 20],
];

let currentIndex = 0;

document.getElementsByTagName('button')[0].addEventListener(
    'click',
    () => update(++currentIndex)
);

function update(index) {
    d3.select('svg').selectAll('circle')
        .data(data[index])
            .join(
                (enter) => enter.append('circle'),
                (update) => update,
                (exit) => exit.remove(),
            )
                .style('fill', 'red')
                .attr('r', (d) => d)
                .attr('cx', (d, i) => (i + 1) * 50)
                .attr('cy', 50);
}

update(0);

The data variable now contains an array of arrays, whereby the inner array is interpreted as a single data set. We
have a index variable that starts with 0, and each click on the first button of the document will call the update
function and increase the value of the index. The update function contains the D3.js specific code, but also uses the
index parameter to access the current data set. As explained before the join method is used with 3 callbacks:

The enter callback is called for data points not being currently represented in the visualization. In here the append method is used to add a circle SVG element for each new data point.
The update callback is called for data points that were already represented in the visualization. At the moment we don't do anything special with these elements, so we are just returning that set.
The exit callback is called for elements in the visualization, that do not have a data point in the new data set anymore. This example uses the remove method to delete these elements immediately.

In case you are familar with React, you can think of D3.js doing a similar job: You just tell
it what should happen on each case, and D3.js figures out for you when to call which function.

Since except for appending new circles for the enter set we often want to handle the enter and update set in the
same way, the join method will merge these two sets and return the combination of both. So the return value of the
join method can be used to set styles and other attributes to both of these sets (we could also have added the
subsequent method calls to directly in the enter and update callback, which is usually done when the handling
between those sets differs). That part of the code has just been copied from the first example.

Add transitions for smoother animations

The previous example changed the visualization on every button click, but it still does not really feel like a
sophisticated visualization. D3.js also comes with support for transitions, which will make the animations a lot
smoother. The most important method to achieve this is called
transition, and returns an object similar to a D3.js
selection. After calling transition you can use the attr and style methods as described previously, but instead
of being immediately applied, these changes will be animated. In order to control the speed of the animation, this
selection-like object also contains the duration method.
For delaying the animation the delay method is available.
Both of these functions take a parameter being interpreted as milliseconds.

There is a very important side note: A transition object, despite its similarity, is not a D3.js selection. If a
transition instead of a selection is passed, D3.js might throw errors. For this reason the
call method exists. The call method will execute the passed
function, but instead of returning the value the passed function returns, it will always return the selection on which
the call method was executed. This way the method chain can be broken into pieces, without creating separate
variables bloating our code. See the following example, which just shows a new body for the update function above:

d3.select('svg').selectAll('circle')
    .data(data[index])
        .join(
            (enter) => enter.append('circle')
                .attr('r', 0)
                .attr('cy', 50),
            (update) => update,
            (exit) => exit.call((exit) => exit
                .transition()
                    .duration(500)
                        .attr('r', 0)
                        .remove()
            )
        )
            .transition()
                .duration(500)
                    .attr('r', (d) => d)
                    .attr('cx', (d, i) => (i + 1) * 50)

This is almost the same code as above, but three crucial changes were made in order to animate the transitions between
the data sets:

The transition method is called after the join method. This way we tell the merge of the enter and update set of the join method that the subsequent attr calls should be animated, with a duration of 500 ms.
The enter set of the join method gets a r and cy value immediately. These values are the start values before the animation. Setting the cy immediately and leave it untouched will cause the circles to move only on the x-axis.
The exit set of the join method uses also the transition and duration calls. The attr method is called again after the transition method, which will make the cirlce shrink until it is not visible anymore. The remove of the transition will remove the element after the animation has finished (in contrast to the remove method of the standard selection, which would remove the element immediately). Although not necessary for the exit set (because it will not be further manipulated), the call method has been used, in order to return a proper selection and not a transition from the third callback.

Identify data points by using keys

This already makes a pretty slick visualization! But another important piece is missing: The circles currently do not
have any identity. That means if more numbers than previously have been passed the difference is being passed to the
enter set, and if less number are passed the difference is passed to the exit set. But in many cases this is not
enough. Imagine the visualization of an election: There might be two new parties and one party doesn't exist anymore,
so there should be some elements in the enter and in the exit set. In order to make this possible, we have to be
able to identify the data somehow. And this is why the key function in the data method has been introduced.

We are going to use the above code to show the result of different elections. I know that a bar diagram would be the
better approach to this visualization, but we are going to use the circles again, so please bear with me. In order to
make the circle identifiable by D3.js, and also allow it to move the existing circles instead of just making them
disappear and appear again, we are going to make use of the key function passed as second parameter to the data
method. We are also not using plain numbers as data anymore, but objects consisting of a fill and a value property.
The value replaces the previous number, and the fill property describes the color of the circle and acts as the
identifier of the circle. Let's have a look at the code:

const data = [
    [
        {fill: 'green', value: 30},
        {fill: 'red', value: 18},
        {fill: 'blue', value: 9},
    ],
    [
        {fill: 'black', value: 35},
        {fill: 'red', value: 27},
        {fill: 'green', value: 18},
        {fill: 'pink', value: 3},
    ],
    [
        {fill: 'red', value: 30},
        {fill: 'green', value: 15},
        {fill: 'black', value: 14},
        {fill: 'pink', value: 6},
    ],
];

let currentIndex = 0;

document.getElementsByTagName('button')[0].addEventListener(
    'click',
    () => update(++currentIndex)
);

function update(index) {
    d3.select('svg').selectAll('circle')
        .data(data[index], (d) => d.fill)
            .join(
                (enter) => enter.append('circle')
                    .style('fill', (d) => d.fill)
                    .attr('r', 0)
                    .attr('cy', 50),
                (update) => update,
                (exit) => exit.call((exit) => exit
                    .transition()
                        .duration(500)
                            .attr('r', 0)
                            .remove()
                )
            )
                .transition()
                    .duration(500)
                        .attr('r', (d) => d.value)
                        .attr('cx', (d, i) => (i + 1) * 50)
}

update(0);

It again looks very similar to what we've had before, except for a few minor differences I want to highlight:

The data variable has changed as described previously. It is an array of array, whereby the inner array contains objects with a fill color and a value. The result of an election might be represented in a similar way.
The data function gets a second argument passed, which is called the key function. We return the fill value of the data variable, which will be used as the identifier of the circle.
In the transition after the join call the attr call for r has to be adapted, because the data point is an object instead of a number now.

Now D3.js is able to correctly assign the circles to the enter, update and exit sets. When clicking through the
different data points, the circles should now be keeping their color and move around. If this is not done correctly,
the circles will change their color instead of moving the correct location. That might easily defeat the purpose of
such a visualization, because it what happened to the data is not clear to the viewer this way.

Conclusion

Hopefully the above points help others to understand D3.js faster than I did. Finally I want to say that it might be
easier in many cases to use a simpler charting library, but these libraries are not that powerful, and you might hit
dead ends quite soon. Once you have understood D3.js, you are also pretty fast at developing simple bar and line
chart, while keeping the possibility to turn your chart into something much more sophisticated if required. And as an
additional bonus: Visualizing data is fun!