Tim Armstrong

Originally published at blog.plaintextnerds.com

Tutorial - Building a database with LMDB - Part 1/Architecture

In the article LMDB - Faster NoSQL than MongoDB I showed how LMDB can be used to achieve significantly higher performance than MongoDB. A lot of that speed comes from LMDB being a memory-mapped database, along with various optimisations that let it make optimal use of the OS's buffer cache and the CPU's L1 cache.
But what I didn't cover was how to use it in a real-life application. So, let's do just that.

This series is going to go over the architecture side of building a custom database solution, and why you might want to actually take the plunge.


When does it make sense?

When designing a database from the ground up, as with most significant projects, it's important to understand which features matter and which don't. That starts with understanding why you're building a database in the first place.

Here are some good reasons:

  • Pedagogical / Educational experience
  • Strict performance requirements (at the expense of features and development time)

However, if your reasons are in this next list, you might want to reconsider:

  • "I can do it better"

Why does this matter? LMDB is already a database, isn't it?

LMDB is a database in the same sense that SQLite3 is a database: it has ACID transactions, it keeps a copy on disk, it is crash resilient, and it serialises writes.

But it doesn't have a remote interface - no sockets, no ports, no network support of any kind. That means if you need any of that, you have to build it yourself.

Out of the box, LMDB could be described as an ordered key/value map (a persistent dict, roughly speaking) with ACID transactions.
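If you've not used it before, here's a minimal sketch of what that looks like through the py-lmdb bindings (the path, key, and value below are just placeholders):

```python
import lmdb

# Open (or create) the environment; map_size caps how large the
# memory map (and therefore the database) is allowed to grow.
env = lmdb.open("./example.db", map_size=2**30)

# Writes happen inside a write transaction; the commit is atomic.
with env.begin(write=True) as txn:
    txn.put(b"sensor-42|1700000000", b"23.5")

# Reads run against a consistent snapshot of the database.
with env.begin() as txn:
    print(txn.get(b"sensor-42|1700000000"))  # b'23.5'
```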

This means: if you need multiple servers to connect to it, you need to build that; if you need backups, you need to build that; and if you need sub-object indexing, you need to build that too.


So what are we going to build?

(Diagram: the ExamplePlatform architecture)

Let's assume the following: we're building a measurement platform in which our edge nodes receive UDP packets from the apparatus, each containing a single sample, an identifier, and a timestamp. The edge nodes need to periodically ship aggregated measurements to a centralised API.
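The wire format isn't specified here, so for the sketches below let's assume a hypothetical fixed-size layout: an 8-byte identifier, an 8-byte nanosecond timestamp, and an 8-byte double for the sample. In Python that can be parsed with the struct module (Part 2 will look at ctypes for this kind of job):

```python
import struct

# Hypothetical packet layout (an assumption for illustration, not a spec):
# 8-byte unsigned identifier | 8-byte unsigned timestamp (ns) | 8-byte double sample
PACKET_FORMAT = ">QQd"
PACKET_SIZE = struct.calcsize(PACKET_FORMAT)  # 24 bytes

def parse_packet(payload: bytes):
    identifier, timestamp_ns, sample = struct.unpack(PACKET_FORMAT, payload)
    return identifier, timestamp_ns, sample
```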

Let's start by considering the edge nodes:

When they receive a UDP packet they need to reflect the timestamp back to the apparatus as soon as it's recorded (kind of like an ACK, but lazier). We want to do this as quickly as possible, because the apparatus is busy-waiting and will re-transmit if it hasn't received this ACK within a very short period.

So then let's define the requirements for our edge nodes:

  • The system must ensure the safe storage of all samples
  • The system must acknowledge each sample as quickly as possible
  • The system must deduplicate any re-transmissions
  • The system must periodically send aggregated copies of the sample data upstream

We can model these requirements as two processes connected by a database:
(Diagram: ExamplePlatform-2, two processes connected by a database)

The left-hand process acts as a server: it receives the packet, inserts it using the identifier+timestamp as the key, appends that key to the index of samples for the current time window, and finally sends the ACK.
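Here's a rough sketch of that ingest loop, assuming the hypothetical packet layout from above and two named sub-databases (one for the samples themselves, one acting as the per-window index). The port, window size, and database names are placeholders of mine, not part of any fixed design:

```python
import socket
import struct

import lmdb

PACKET_FORMAT = ">QQd"            # hypothetical: identifier, timestamp (ns), sample
PACKET_SIZE = struct.calcsize(PACKET_FORMAT)
WINDOW_NS = 60 * 1_000_000_000    # aggregate into one-minute windows (an assumption)

env = lmdb.open("./edge.db", map_size=2**30, max_dbs=2)
samples_db = env.open_db(b"samples")              # identifier+timestamp -> raw sample
index_db = env.open_db(b"index", dupsort=True)    # time window -> sample keys

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("0.0.0.0", 9000))      # the port is arbitrary for this sketch

while True:
    payload, addr = sock.recvfrom(PACKET_SIZE)
    identifier, timestamp_ns, sample = struct.unpack(PACKET_FORMAT, payload)

    key = b"%020d|%020d" % (identifier, timestamp_ns)   # identifier+timestamp key
    window = b"%020d" % (timestamp_ns // WINDOW_NS)

    with env.begin(write=True) as txn:
        # A re-transmission produces the same key, so the default overwrite
        # behaviour gives us deduplication of the sample for free.
        txn.put(key, struct.pack(">d", sample), db=samples_db)
        # dupsort lets one window key hold many sample keys, and identical
        # (window, key) pairs are collapsed, so the index stays deduplicated too.
        txn.put(window, key, db=index_db)

    # Only ACK once the transaction has committed, i.e. the sample is safely stored.
    sock.sendto(struct.pack(">Q", timestamp_ns), addr)
```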

The right-hand process periodically wakes up, reads the index, collates the samples, and makes an HTTP POST (containing all of the samples from this time window) to the upstream API.
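And a similarly hedged sketch of the shipper: it reads every window older than the one currently being written, collates the samples, and POSTs them upstream. The endpoint, interval, and JSON shape are invented for illustration, and cleaning up already-shipped windows is left out:

```python
import struct
import time

import lmdb
import requests  # any HTTP client works; requests is just for illustration

UPSTREAM_URL = "https://api.example.com/ingest"   # placeholder endpoint
WINDOW_NS = 60 * 1_000_000_000
SHIP_INTERVAL_S = 60

env = lmdb.open("./edge.db", map_size=2**30, max_dbs=2)
samples_db = env.open_db(b"samples")
index_db = env.open_db(b"index", dupsort=True)

while True:
    time.sleep(SHIP_INTERVAL_S)
    current_window = b"%020d" % (time.time_ns() // WINDOW_NS)

    batch = {}
    with env.begin() as txn:
        for window, sample_key in txn.cursor(db=index_db):
            if window >= current_window:
                continue          # skip the window that's still being written to
            identifier, timestamp_ns = sample_key.decode().split("|")
            raw = txn.get(sample_key, db=samples_db)
            batch.setdefault(window.decode(), []).append({
                "id": int(identifier),
                "timestamp_ns": int(timestamp_ns),
                "sample": struct.unpack(">d", raw)[0],
            })

    if batch:
        # One POST per wake-up; retries and purging shipped data are out of scope here.
        requests.post(UPSTREAM_URL, json=batch, timeout=10)
```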

There are, of course, already products for this: Apache Pulsar, Redis, and RabbitMQ all spring to mind as potential solutions to this task. But that's not why you're here, so we'll assume there are reasons you don't want to build on top of any of those.


Keep an eye open for Part 2, where we’ll cover Data Structures and CTypes.
See you there!
