Paula Gearon

Posted on Aug 22, 2019 • Edited on Apr 22, 2020

Storage

Part 5 in a series on how I'm building a database. Part 4 was here, and the beginning was here.

Expanding

So far we've built a structure for representing linked lists and representing them in a raw buffer in Java. Buffers exist in Javascript too, and I do plan to demonstrate the same there eventually, but at this stage, I want to focus on what is being built more than how to build it.

The linked lists in previous examples were always of the same sequence of numbers: 1, 1, 2, 3, 5, 8, 13, 21, 34. This list was also allocated in order, making the data much more regular than can be expected in most applications. So let's try something a bit different.

First of all, let's create a different sequence of numbers that shares the last 4 from the first list: 5, 6, 7, 8, 13, 21, 34.
Then, let's create an unrelated third list: 3, 1, 4, 1, 5, 9, 2, 6.

Because we are moving around a bit inside the buffer, then this shows that I had a bug in the existing implementation. Oops. 😳

(Yes, I can go back to my earlier post and fix the bug, but this is a blog, not a book! I'm going to admit my mistake).

The original definition of BufferList always takes a slice by setting the position, and then the limit. This worked while we were only moving the position to the next space beyond the last limit used, but now that it's jumping around a bit it shows that we were violating the buffer invariants that the limit always will be at, or above, the position.

To fix this, we just need to see if the new position will be above or below the current limit. If it's above, then the limit can be set first, otherwise, the position can be set first:

Element(int index) {
  this.index = index;
  int offset = index * ELEMENT_SIZE;
  if (offset > buffer.limit()) {
    intBuffer = buffer.limit(offset + ELEMENT_SIZE).position(offset).slice().asIntBuffer();
  } else {
    intBuffer = buffer.position(offset).limit(offset + ELEMENT_SIZE).slice().asIntBuffer();
  }
}

Incidentally, this code only works in Java 9 and above. Before that, the abstract Buffer class did not support slice(), although ByteBuffer did. So on Java 8 and earlier, the following modification would be needed:

ByteBuffer bb;
if (offset > buffer.limit()) {
  bb = (ByteBuffer)buffer.limit(offset + ELEMENT_SIZE).position(offset);
} else {
  bb = (ByteBuffer)buffer.position(offset).limit(offset + ELEMENT_SIZE);
}
intBuffer = bb.slice().asIntBuffer();

My apologies if you tried my buffer slices from the last post on Java 8 and they didn't work!

Building the Lists

Now that this is done, we can build our new lists. Of course, we will need a bigger buffer to store all of this, so we'll increase it to 20.

BufferList bufferList = new BufferList(20);
BufferList.Element list1, list2, list3;

list1 = bufferList.addToHead(34);
list1 = bufferList.addToHead(list1, 21);
list1 = bufferList.addToHead(list1, 13);
list2 = bufferList.addToHead(list1, 8);  // note that this is list2
list1 = bufferList.addToHead(list2, 5);
list1 = bufferList.addToHead(list1, 3);
list1 = bufferList.addToHead(list1, 2);
list1 = bufferList.addToHead(list1, 1);
list1 = bufferList.addToHead(list1, 1);

list2 = bufferList.addToHead(list2, 7);
list2 = bufferList.addToHead(list2, 6);
list2 = bufferList.addToHead(list2, 5);

list3 = bufferList.addToHead(6);
list3 = bufferList.addToHead(list3, 2);
list3 = bufferList.addToHead(list3, 9);
list3 = bufferList.addToHead(list3, 5);
list3 = bufferList.addToHead(list3, 1);
list3 = bufferList.addToHead(list3, 4);
list3 = bufferList.addToHead(list3, 1);
list3 = bufferList.addToHead(list3, 3);

When adding the number 8 to the first list, we keep hold of that point in the list, and re-use it in the second list. This means that both lists share a tail. If the tail changed, then both lists would see the change. I'll return to this idea of shared structures when I get to Functional Data Structures.

Having 3 lists, we should be able to see the contents of all 3:

System.out.println(list1);
System.out.println(list2);
System.out.println(list3);

The output is:

$ java buffer.MultiExample
1, 1, 2, 3, 5, 8, 13, 21, 34
5, 6, 7, 8, 13, 21, 34
3, 1, 4, 1, 5, 9, 2, 6

File I/O

At this point, the structure can be stored durably by storing the buffer somewhere. The only things missing are the starts of the lists, which are associated with the elements list1, list2 and list3. This data will also need to be stored, but since our buffer only holds elements, they will be stored close by rather than inside the list data structure.

Metadata Read/Write

All the metadata about the buffer that we need to reconstruct the lists is found in the index values for the starts of the lists. We can write that to a file with a small method:

static void writeLists(BufferList.Element... lists) throws IOException {
  RandomAccessFile file = new RandomAccessFile("meta.bin", "rw");
  for (BufferList.Element element: lists) {
    file.writeInt(element.getIndex());
  }
  file.close();
}

This creates the file in read/write mode, then goes through all of the elements given to it and writes their index values to the file.

Reading this back means reading those index values, and then asking the buffer to create the associated elements:

static BufferList.Element[] readLists(BufferList bufferList) throws IOException {
  RandomAccessFile file = new RandomAccessFile("meta.bin", "r");
  int length = (int)(file.length() / Integer.BYTES);
  BufferList.Element[] lists = new BufferList.Element[length];
  for (int number = 0; number < length; number++) {
    lists[number] = bufferList.new Element(file.readInt());
  }
  file.close();
  return lists;
}

This opens the file for reading, then figures out how many integers are in the file based on its size. This gives the size of the array to return, and a loop reads each of the integers in the file and creates the associated Elements to return.

Side Note

Both of these methods are illustrative of how some metadata might be stored, though for real-world applications we can expect something a little more tailored to the actual data structures in use. Typically, this metadata will be attached to the Buffer class that the metadata is representing so that reading/writing it does not appear as a separate step in the API. But I think it helps to see it all happening here.

Buffer Read/Write

The BufferList itself only needs to write or read its buffer as a single block.

public class BufferList {
...
  public void write(String filename) throws IOException {
    RandomAccessFile file = new RandomAccessFile(filename, "rw");
    buffer.position(0).limit(nextAvailable * ELEMENT_SIZE);
    file.getChannel().write(buffer);
    file.close();
  }

  public static BufferList read(String filename) throws IOException {
    RandomAccessFile file = new RandomAccessFile(filename, "r");
    ByteBuffer byteBuffer = ByteBuffer.allocate((int)file.length());
    file.getChannel().read(byteBuffer);
    file.close();
    return new BufferList(byteBuffer);
  }
}

Note that the only part of the buffer that is written is between the current position and the limit, which is why these need to be set in the write method.

The Other Side

With all this read/write in place we can update the main program to write the buffer and its metadata:

public class MultiExample {
  public static void main(String[] args) throws IOException {
    BufferList bufferList = new BufferList(20);
    BufferList.Element list1, list2, list3;

    // ... populate the lists, as above ...

    System.out.println(list1);
    System.out.println(list2);
    System.out.println(list3);
    bufferList.write("buffer.bin");
    writeLists(list1, list2, list3);
  }
}

With this main program writing the buffer, can we reload it later? Let's write a second program to try:

public class MultiExample2 {
  public static void main(String[] args) throws IOException {
    BufferList bufferList = BufferList.read("buffer.bin");
    for (BufferList.Element list: readLists(bufferList)) {
      System.out.println(list);
    }
  }
}

This is a totally fresh process that opens the file, and prints the following:

$ java buffer.MultiExample2
1, 1, 2, 3, 5, 8, 13, 21, 34
5, 6, 7, 8, 13, 21, 34
3, 1, 4, 1, 5, 9, 2, 6

Note how very little code is needed to load everything? This comes from the flexibility of the Element objects overlaying the buffer.

The complete source code for this can be found here on Github.

Discussion

As noted above, adding the metadata to the class definition removes the need for the client to manually process the other file, and also avoids having the client code having a direct reference to the Element constructor. This blog is still building a basic framework so there is no need to clean everything up just yet.

Javascript has a similar object to buffers in the TypedArrays. These objects can then be moved around similarly to a Java Buffer, and may be stringified for storage/retrieval from local storage, which lets us take this technique across to that system as well.

Unfortunately, the code as written here does not scale particularly well, as it requires the entire buffer to be written or read at once. Instead, we want a mechanism that can read or write individual buffers at a particular file offset. The read can be done during the Element constructor, and the write can be done when the element is finished with. In real-world systems, we will often put a dirty flag on the elements and queue these modified elements for writing whenever they get updated. Modifications to an element are easy to catch since any modification has already been defined via setters and getters in order to modify the underlying buffer.

To manage this in Javascript, storing a TypedArray in local storage for each element would be a mistake, as there would be a large number of elements, leading to an excessive number of local storage entries. Instead, TypedArray objects could chunk up groups of elements, and then provide an API for accessing individual elements. For instance, when requesting element index 1234, with a grouping of 1000 elements, the group number and element offset within the group would be calculated as:

groupSize = 1000;
group = Math.floor(1234 / groupSize);
offset = 1234 % groupSize;

Java's needs are different, as it can read/write any part of a buffer, from a single element to the entire buffer at once. However, an entire file should be represented by a single Java buffer, since a large file would result in a large amount of memory being used to mirror it.

This takes me to the next step of local disk storage: Part 6 - Memory Mapping.

Latest comments (1)

Faizan Afridi • Sep 9 '24

I'm thrilled with my TeraBox Premium! The amount of storage and the easy sharing options are incredible. It's so nice to have all my files in one place, accessible anytime. Definitely a top pick for cloud storage for free.

Some comments may only be visible to logged-in visitors. Sign in to view all comments.