DEV Community

Paula Gearon
Paula Gearon

Posted on

Filing a Balance

In the most recent part of this series, we looked at how AVL trees are created and balanced. This post will extend this onto a mapped file. The beginning of this data structure series is here.

Back to the Map

Back in Part 8 we built a simple binary tree backed by a buffer that was mapped from a file. I'll revise a few of the principles here:

  • Each node of the tree was a regular structure, represented by 3 Java integers. This fixed the size of each node to 12 bytes.
  • Rather than storing data in the node object, getter and setter methods accessed the buffer.
  • Metadata about the tree (size of the file, location of the root), was stored at the start of the file (fitting into the space that might have otherwise been used for the first tree node).
  • After some setup in the tree constructor, operating the tree looked identical to a tree structure based on standard objects.

We will revisit all of these in this and future posts, with a few tweaks along the way.

Mea Culpa

Looking at the code that I wrote way back in September, I was sorry to see that none of my constants were marked as final. This is very poor form for Java code. I'll try to do better.

Rejuvinating the Tree

Updating the BufferTree class to implement an AVL tree is very similar to how we updated the JavaScript in the past 2 posts.

Nodes vs Elements

The previous version of BufferTree was based on the BufferList class which used a class called Element for the list structure. Now that we are firmly in the realm of trees, we're going to update this class name to Node.

Node Layout

The first thing to update the tree will be to update the node structure on disk. Unlike the Javascript, the child nodes are already stored in an indexable buffer, so the left and right sides do not need to be pulled out into an array. However, we do still need to introduce the balance value for the node. This can be done with 2 bits (we need only represent left/right/zero) but that requires bit packing. Moving to a single byte (8 bits) might seem appropriate, but this will lead to misalignment of the nodes since CPUs want integer values to be on a 4-byte boundary. So to keep things simple we will reserve a new integer for the balance:

  private static final int VALUE_OFFSET = 0;
  private static final int LEFT_OFFSET = 1;
  private static final int RIGHT_OFFSET = 2;
  private static final int BALANCE_OFFSET = 3;

We will also want to introduce constants for balance, along with a method for finding the other side:

  private static final int LEFT = -1;
  private static final int RIGHT = 1;

  private static final int other(int side) {
    return side == LEFT ? RIGHT : LEFT;
  }

It would be better to use an enum for the side, as this would disallow invalid numbers as a side argument, but I will leave it as an integer, for now, to better align with the Javascript implementation in the previous post.

Implementation

Other than changing the name from Element to Node, only a few changes are needed. First, we can remove the left/right labels and refer to a parameterized child instead. This affects the new value constructor:

  Node(int index, int value) {
    this(index);
    setValue(value);
    setChild(LEFT, null);
    setChild(RIGHT, null);
    setBalance(0);
  }

We also need access to the new balance value:

  public int getBalance() {
    return intBuffer.get(BALANCE_OFFSET);
  }

  public Node setBalance(int balance) {
    if (Math.abs(balance) > 2) {
      throw new RuntimeException("balance out of range: " + balance);
    }
    intBuffer.put(BALANCE_OFFSET, balance);
    return this;
  }

The check for balance allows the number to get up to 2. This is for convenience, as it makes the rebalancing code a little easier if the balance is allowed to temporarily get that large. The alternative is testing if the balance is about to get that large, and if so, then to note the sign of the integer and to inform the balancing operation of that information.

Finally, we need to replace the setters and getters for left and right. Because children can be updated during a rotation, we will not be setting the balance on a child automatically unless a leaf is being added. That can be distinguished with a different method called initChild(int,Node):

  public Node getChild(int side) {
    int childId = intBuffer.get(side == LEFT ? LEFT_OFFSET : RIGHT_OFFSET);
    return childId == NULL ? null : new Node(childId);
  }

  public Node setChild(int side, Node child) {
    intBuffer.put(side == LEFT ? LEFT_OFFSET : RIGHT_OFFSET,
                  child == null ? NULL : child.getIndex());
    return this;
  }

  public Node initChild(int side, Node child) {
    setChild(side, child);
    setBalance(getBalance() + side);
    return this;
  }

This updates the names that were used in the Javascript implementation, which were setChild and updateChild. The name of the setChild method was left to match with previous code, but initChild and setChild are clearer labels for us to move forward.

Adding Nodes

The original tree initialized the root with the insertion of the first node. Now that the tree gets balanced, the root may get updated. So we need to update the tree's add(int) method to update the root as well:

  public Node add(int value) {
    Node node = new Node(getAndIncNextAvailable(), value);
    if (root == null) {
      setRoot(node);
    } else {
      Node newRoot = insertNode(root, node);
      if (root != newRoot) {
        setRoot(newRoot);
      }
    }
    return root;
  }

This relies on the root value in the tree being kept up-to-date by the setRoot(Node) function. This was already happening in the BufferTree class:

  private void setRoot(Node node) {
    root = node;
    metaBuffer.put(ROOT_OFFSET, node.getIndex());
  }

Insertion

This is the first big change because now we can use the getChild/setChild methods, rather than the previous left/right functions. Functionally, nothing has changed except the addition of balancing. The resulting code is very similar to the Javascript function:

  private static Node insertNode(Node node, Node newNode) {
    int value = node.getValue();
    int side = newNode.getValue() < value ? LEFT : RIGHT;

    if (node.getChild(side) == null) {
      return node.initChild(side, newNode);
    }

    int childBalance = node.getChild(side).getBalance();
    node.setChild(side, insertNode(node.getChild(side), newNode));

    // Did the child balance change from 0? Then it's deeper
    int balance = node.getBalance();
    if (childBalance == 0 && node.getChild(side).getBalance() != 0) {
      balance += side;
      node.setBalance(balance);
    }

    if (Math.abs(balance) == 2) {
      return rebalance(node, balance);
    }
    return node;
  }

Note that when a child having a node inserted is null then this insertion is happening at a leaf. This will never result in rebalancing the leaf, which is why the node can be returned immediately. Otherwise, the remaining code inserts on the correct side, and then checks for balancing, setting the balance value and rebalancing if needed.

Balancing Act

The rebalancing code is almost identical to the Javascript implementation. The only real difference is that the balance is being updated with a setter method:

  private static Node rebalance(Node node, int balance) {
    var side = balance < 0 ? LEFT : RIGHT;

    if (side == node.getChild(side).getBalance()) {
      return rebalanceSS(node, side);
    } else {
      return rebalanceSO(node, side);
    }
  }

  private static Node rebalanceSS(Node node, int side) {
    var nodeS = node.getChild(side);
    node.setChild(side, nodeS.getChild(other(side)));
    nodeS.setChild(other(side), node);
    node.setBalance(0);
    nodeS.setBalance(0);
    return nodeS;
  }

  private static Node rebalanceSO(Node node, int side) {
    var otherSide = other(side);

    var nodeS = node.getChild(side);
    var nodeSO = nodeS.getChild(otherSide);
    node.setChild(side, nodeSO.getChild(otherSide));
    nodeS.setChild(otherSide, nodeSO.getChild(side));
    nodeSO.setChild(otherSide, node);
    nodeSO.setChild(side, nodeS);
    if (nodeSO.getBalance() == otherSide) {
      node.setBalance(0);
      nodeS.setBalance(side);
    } else if (nodeSO.getBalance() == side) {
      node.setBalance(otherSide);
      nodeS.setBalance(0);
    } else {
      node.setBalance(0);
      nodeS.setBalance(0);
    }
    nodeSO.setBalance(0);
    return nodeSO;
  }

Extraneous

Just like last time, let's add in a method to return a list of the elements in order:

  private static List<Integer> appendToList(List<Integer> acc, Node node) {
    if (node == null) {
      return acc;
    } else {
      appendToList(acc, node.getChild(LEFT));
      acc.add(node.getValue());
      return appendToList(acc, node.getChild(RIGHT));
    }
  }

  public List<Integer> asList() {
    return appendToList(new LinkedList<Integer>(), root);
  }

Time for π

Now that we have an AVL tree, let's try it out with the same digits of π as last time. We can then sort the digits and see how they compare to what the tree gives us:

public class TreeExample {

  static final int[] digits =
    new int[]
    {3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3, 2, 3, 8, 4, 6, 2,
     6, 4, 3, 3, 8, 3, 2, 7, 9, 5, 0, 2, 8, 8, 4, 1, 9, 7, 1, 6, 9, 3,
     9, 9, 3, 7, 5, 1, 0, 5, 8, 2, 0, 9, 7, 4, 9, 4, 4, 5, 9, 2, 3, 0};

  public static void main(String[] args) throws IOException {
    BufferTree bufferTree = new BufferTree("tree.bin", 100);
    for (var i: digits) {
      bufferTree.add(i);
    }
    System.out.println(bufferTree);

    // put the original digits in a list and sort them
    var digitList = new ArrayList<Integer>(digits.length);
    for (var i: digits) digitList.add(i);
    digitList.sort(Comparator.comparingInt(x->x));

    // compare the list from the tree to the sorted list
    if (bufferTree.asList().equals(digitList)) {
      System.out.println("Digits sorted correctly");
    }
    bufferTree.close();
  }
}

Removing the previous data file (if it still exists), and running this new program gives us:

$ rm -f tree.bin
$ java avl.TreeExample
0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9
Digits sorted correctly

This has left a new data file in place, so let's try loading it with a different program:

public class TreeExample2 {
  public static void main(String[] args) throws IOException {
    BufferTree bufferTree = new BufferTree("tree.bin", 25);
    System.out.println(bufferTree);
    bufferTree.close();
  }
}

I left the "length" of the file at 25 to show that this was only a hint for setting the size when the file does not exist. But because the file is there, then this parameter does not matter.

Running this new program should print out the contents of the tree:

$ java avl.TreeExample2
0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9

The code for the BufferTree, ExampleTree, and ExampleTree2 files can all be found in this gist.

Recap

We still have a long way to go, but we are starting to get much closer to something with practical value. We are still only storing integers, but now that the buffer-backed trees are balanced we have both durability and efficiency.

Next, we will look at making data structures persistent.

Discussion (0)