loading...

A Balanced Tree

quoll profile image Paula Gearon Updated on ・8 min read

Part 9 arrives after many months, following many personal events, and now a worldwide pandemic. You'll find part 8 back here, or if you've forgotten everything as I had, you can go back to the beginning.

Georgy and Evgenii

The binary trees discussed in previous posts have a major problem that makes them unsuitable for general use, and that is that they can be unbalanced. This means that some branches of the tree may be very deep, while others may be very shallow. In the worst case, such as when sorted data is inserted, a tree can even become a linked list.
Unbalanced tree

There are various approaches to balancing a tree, but the one I want to focus on is the AVL Tree. The name comes from the designers, whose names were Georgy Adelson-Velsky and Evgenii Landis.

AVL trees are a binary tree like the ones already discussed, with the additional property of being balanced. This means that if any branch starts to get much longer than the other branches, then the nodes will get rearranged, or "rotated", in such a way as to rebalance the tree. Of particular significance is that the number of write operations for a rotation during insertion is always a fixed, constant amount. This differs significantly from other tree structures which can result in occasional pauses when large rebalancing write operations occur.

This isn't to say that writing is an O(1) operation. All sorted trees require an O(log(n)) search to be performed so the location for inserting can be identified, and nodes above an insertion need to have their balance updated until the rotation point is found. But once a tree has been accessed, most of the top levels will be cached in memory by the Operating System, making the tree fast to read, and rotations almost always happen near the leaves, making balance updates small as well.

Rebalancing

Back in part 7, I presented the code for creating a binary tree. I also introduced a function called height which recursively found the height at a node, and a function called balance which returned the difference in heights.

A distinguishing property of an AVL tree is that the absolute value of the balance of a node will never get larger than 1. If it ever gets as high as 2 (or as low as -2) then the node must be rebalanced. Rebalancing is a little involved, and I will get to that soon, but for now just be aware that it involved moving 2 or 3 nodes in the tree, and nothing else.

If a node is rebalanced, then no further rebalancing is required throughout the tree. This means that the balance of any parents of a rebalanced node will not change after rebalancing. This not only lets the work stop at this point, but it also means that nodes can remember their balance, and not need to worry about the heights of their branches that gave them that balance.

To accomplish this, I want to add a new balance value to each node. This updates the Node definition from part 7:

class Node {
  constructor(value) {
    this.value = value;
    this.balance = 0;
  }

  setLeft(left) {
    this.left = left;
    this.balance = this.balance - 1;
    return this;
  }

  setRight(right) {
    this.right = right;
    this.balance = this.balance + 1;
    return this;
  }
}

This initializes a node's balance to 0, and when a child is added to the node, then the balance is shifted in the direction of the child (-1 for left, +1 for right).

Now we want to update the function for inserting a node into a tree to update this balance:

function insertNode(node, newNode) {
  if (newNode.value <= node.value) {
    if (node.left === undefined) {
      return node.setLeft(newNode);
    } else {
      let childBalance = node.left.balance;
      node.left = insertNode(node.left, newNode);
      // Did the child balance change from 0? Then it's deeper
      if (childBalance == 0 && node.left.balance != 0) {
        node.balance = node.balance - 1;
      }
    }
  } else {
    if (node.right === undefined) {
      return node.setRight(newNode);
    } else {
      let childBalance = node.right.balance;
      node.right = insertNode(node.right, newNode);
      // Did the child balance change from 0? Then it's deeper
      if (childBalance == 0 && node.right.balance != 0) {
        node.balance = node.balance + 1;
      }
    }
  }
  // ... more ...
  return node;
}

In the previous version of this function, the two else clauses just called this.insertNode. This time, they start by remembering the balance of the child they are about to insert into, and if it changes to indicate that the height got deeper, then they will update the balance of this node.

Recall that we are only inserting things into this tree, so if a balance changes towards 0 then this means that a node that was heavy on one side has now evened out, and if it changes from 0 then it must have become deeper on one side.

This will record balances correctly, but it isn't an AVL tree unless those balances are kept within the range of -1 to 1. So now we need to add that operation by replacing the ... more ... comment with the following code:

  if (Math.abs(node.balance) == 2) {
    return rebalance(node);
  }

This just checks if the balance of a node has changed to -2 or +2, in which case a rebalance needs to happen. This is where the fun happens.

Rebalancing

The first step to rebalancing is to figure out the kind of rebalancing that is needed. There are 2 different types of rebalancing operations, but because of symmetry they are each duplicated for left and right.

Rather than duplicate the full explanation here, I will refer you to the Wikipedia article, and then you can come back here to see it play out.

Simple Rotation

The first type of rebalancing is a simple rotation of nodes. This is required when a node is heavy on one side, and the heavy child is also heavy on the same side. For instance, if a node has a balance of +2 (it has to be +2 if it's being rebalanced), and its right child has a balance of +1. I will call this an RR rotation, indicating that both nodes were heavy on the right. In this case, the child on the right becomes the new root of this branch, and the previously unbalanced node will become the left child that root. The children of these nodes must be switched appropriately to keep the ordering, and the balances of each node can be reset to zero.

function rebalanceRR(node) {
  nodeR = node.right;
  node.right = nodeR.left;
  nodeR.left = node;
  node.balance = 0;
  nodeR.balance = 0;
  return nodeR;
}

The rebalanceLL function is symmetric, with all references to right/left reversed:

function rebalanceLL(node) {
  nodeL = node.left;
  node.left = nodeL.right;
  nodeL.right = node;
  node.balance = 0;
  nodeL.balance = 0;
  return nodeL;
}

Double Rotation

Things get a little more complex when the heavy child is heavy on the inside rather than the outside of the structure. The resulting moves can be performed with a double rotation, but it is more straightforward to implement the overall move in a single step.

Starting with a child on the right that is heavy on the left, we have the rebalanceRL function:

function rebalanceRL(node) {
  nodeR = node.right;
  nodeRL = nodeR.left;
  node.right = nodeRL.left;
  nodeR.left = nodeRL.right;
  nodeRL.left = node;
  nodeRL.right = nodeR;
  if (nodeRL.balance < 0) {
    node.balance = 0;
    nodeR.balance = 1;
  } else if (nodeRL.balance > 0) {
    node.balance = -1;
    nodeR.balance = 0;
  } else {
    node.balance = 0;
    nodeR.balance = 0;
  }
  nodeRL.balance = 0;
  return nodeRL;
}

The first 6 lines accomplish the rotations. This puts the right/left child (nodeRL) at the root, the original root (node) to the left of this new root, the left child of nodeRL to the right of node, and the right child of the original root (nodeR) to the right of the new root. It took me a while of working it through with pencil and paper to fully understand this when I first encountered it, but once you try it out it all makes sense.

The last if/elseif/else block sets the balances appropriately. How these work out depends on a few things. Was the nodeRL node heavy on the left or the right? This changes the outcome such that either node or nodeR will be unbalanced, while the other is balanced. There is also the case where the root node is near the leaves of the tree. In this case, it will not have a left node, and nodeRL will not have any children at all, making its balance 0. In this case, both node and nodeR will get a final balance of 0.

While rotations are something that I find I can do in my head, determining the rebalancing from first principles is something that definitely requires pen and paper to work out for yourself.

Just like the duality of rebalanceRR/rebalanceLL, the rebalanceLR function is symmetric with rebalanceRL:

function rebalanceLR(node) {
  nodeL = node.left;
  nodeLR = nodeL.right;
  node.left = nodeLR.right;
  nodeL.right = nodeLR.left;
  nodeLR.right = node;
  nodeLR.left = nodeL;
  if (nodeLR.balance > 0) {
    node.balance = 0;
    nodeL.balance = -1;
  } else if (nodeLR.balance < 0) {
    node.balance = 1;
    nodeL.balance = 0;
  } else {
    node.balance = 0;
    nodeL.balance = 0;
  }
  nodeLR.balance = 0;
  return nodeLR;
}

Loading Up

Let's try putting some data into this, and see if it works as expected.

let digits =
  [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9,
   3, 2, 3, 8, 4, 6, 2, 6, 4, 3, 3, 8, 3, 2, 7,
   9, 5, 0, 2, 8, 8, 4, 1, 9, 7, 1, 6, 9, 3, 9,
   9, 3, 7, 5, 1, 0, 5, 8, 2, 0, 9, 7, 4, 9, 4,
   4, 5, 9, 2, 3, 0];

It's a mental exercise for me to try to remember these digits. I try to add on a few more every birthday. I figure that when I turn 100 I will really impress some people!

Inserting these digits in order should sort them. We'll start by creating the tree with the first digit, and then insert the rest:

let pi = new Tree(digits[0]);
digits.slice(1).forEach(d => pi.add(d));

Do they come out sorted? Let's traverse using the treeString function from part 7, remembering that the function takes a node, rather than the tree object:

> treeString(pi.root)
'0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9'

This looks about right. Lets compare with the correctly sorted digits:

> treeString(pi.root) === digits.sort().join(', ')
true

But how about the height of the tree?

digits is 66 elements long. An efficient binary tree should be storing at close to ⌈log2(n)⌉. The log2(66) ≈ 6.044, so the minimum possible number is 7, and we are hoping for a number not much larger than that:

> height(pi.root)
7

Which is fortuitously better than I'd hoped for 😊

Recap

After a long break, I've finally shown how to balance a binary tree using the AVL algorithm. While B-Trees are the industry standard for database storage, years of experience have shown AVL trees to be highly competitive for write-heavy storage.

There is quite a bit of redundancy around the symmetry of this code, and I will be demonstrating how to avoid this in the next installment. This has benefits in avoiding silly duplication errors, as well as moving a little closer to the structures that we eventually want in persistent storage.

Discussion

pic
Editor guide
Collapse
quoll profile image
Paula Gearon Author

To be honest, one reason I delayed getting back to this was that I didn’t want to use the resources that I pointed to, but rather I wanted to work the rebalancing from first principles myself. This is partly because that was how we had to do it when I first wrote one many years ago: rotations were described, but not how to recalculate the balances. Even worse… the online resource we used made the claim that all write operations were O(1), but this is only true for insertions. Deletions have a worst case of O(log(n)), and the deletions described were buggy because they did not include this. A colleague and I had to sit down and work deletions through from scratch to get it right. This experience stayed with me, and it’s what motivated me to make sure I had it all fresh in my head before writing about it.

Fortunately, I won’t be needing to delete from trees in the design I’m building towards, which has allowed me to skip these complexities.

To see the production code that we built for this (including this elusive delete operation) it’s all in the AVLNode code for Mulgara.