DEV Community

Johannes Lichtenberger
Johannes Lichtenberger

Posted on

Searching Contributors / Open Source project (SirixDB) / building a Vue.js and D3.js frontend

SirixDB - A versioned, temporal NoSQL data store for XML and JSON

Feel free to contribute on GitHub 💚

Hi all,

I just created my first ever Vue.js project to build a web front-end for SirixDB and to learn about single page apps, TypeScript, D3.js, Vue.js and the front-end world in general.

I first wanted to say a few things about what the purpose of SirixDB is and why we need another (versioned) database system.

SirixDB

SirixDB is all about efficient versioning of your data (currently XML and JSON stored in a binary format).

That is on the one hand it reduces the storage cost of storing a new revision during each transaction-commit while balancing read- and write-performance through a novel sliding snapshot algorithm and dynamic page compression. On the other hand SirixDB supports easy time travel query capabilities for instance to open a specific revision by a timestamp or several revisions by a given timespan, to navigate to future or past versions of nodes in the tree-structure and so on. It basically never overwrites data, batches writes and syncs them sequentially. It's heavily inspired by ZFS and Git. It borrows some ideas and puts these to test on the sub-file level.

In stark contrast to other approaches SirixDB combines copy-on-write semantics with database page-level versioning and does not require a write-ahead-log for consistency (a new UberPage is created last, when a new revision is committed to persistent storage).

The storage engine has been written from scratch and a resource in a SirixDB database basically consists of many hash-array based tries underneath an UberPage, which is the main entry point.

It all started around 2006 as a university / Ph.D. project of Marc Kramis. I already worked on the project since 2007 and did my Bachelor's Thesis, Master's Thesis, the Bachelor and Master projects as well as several HiWi-Jobs on topics regarding the project and I'm still more eager than ever to put forth the idea of a versioned, analytics plattform to perform analytical tasks based on current as well as the history of the data.

GitHub logo sirixdb / sirix

SirixDB facilitates effective and efficient storing and querying of your temporal data. Every commit stores a space-efficient snapshot. It is log-structured and never overwrites data. SirixDB uses a novel page-level versioning approach called sliding snapshot.

Build Status All Contributors CodeFactor PRs Welcome Maven Central Coverage Status

Tweet

Follow

Download ZIP | Join us on Slack | Community Forum

Working on your first Pull Request? You can learn how from this free series How to Contribute to an Open Source Project on GitHub and another tutorial: How YOU can contribute to OSS, a beginners guide

SirixDB - An Evolutionary, Temporal NoSQL Document Store

Store and query revisions of your data efficiently

"Remember that you're lucky, even if you don't think you are, because there's always something that you can be thankful for." - Esther Grace Earl (http://tswgo.org)

We currently support the storage and (time travel) querying of both XML - and JSON-data in our binary encoding which is tailored to support versioning. Our index-structures and the whole storage engine has been written from scratch to support versioning natively. In the future, we might also support the storage and querying of other data formats.

 

 

Note: Work on a

Design Goals 🔥

Some of the most important core principles and design goals are:

Concurrent
SirixDB contains very few locks and aims to be as suitable for multithreaded systems as possible
Asynchronous
operations can happen independently; each transaction is bound to a specific revision and only one read/write-transaction on a resource is permitted concurrently to N read-only-transactions
Versioning/Revision history
SirixDB stores a revision history of every resource in the database without imposing extra overhead
Data integrity
SirixDB, like ZFS, stores full checksums of the pages in the parent pages. That means that almost all data corruption can be detected upon reading in the future, we aim to partition and replicate databases in the future
Copy-on-write semantics
similarly to the file systems Btrfs and ZFS, SirixDB uses CoW semantics, meaning that SirixDB never overwrites data. Instead, database-page fragments are copied/written to a new location
Per revision and per page versioning
SirixDB does not only version on a per revision, but also on a per page-base. Thus, whenever we change a potentially small fraction of records in a data-page, it does not have to copy the whole page and write it to a new location on a disk or flash drive. Instead, we can specify one of several versioning strategies known from backup systems or a novel sliding snapshot algorithm during the creation of a database resource. The versioning-type we specify is used by SirixDB to version data-pages
Guaranteed atomicity (without a WAL)
the system will never enter an inconsistent state (unless there is hardware failure), meaning that unexpected power-off won't ever damage the system. This is accomplished without the overhead of a write-ahead-log (WAL)
Log-structured and SSD friendly
SirixDB batches writes and syncs everything sequentially to a flash drive during commits. It never overwrites committed data

Contributions

I have added #hacktoberfest labels to some open issues, but I guess the most interesting part now will be the web frontend. I envision certain interaction possibilities as for instance simply querying SirixDB and displaying the result, but also interactive visualizations to display what has changed between revisions of a resource stored in SirixDB (comparing XML and JSON basically). Maybe even a bunch of visualizations which are in synch with each other.

I've implemented some ideas in a Java Swing GUI and embedded visualizations built with processing (YouTube Screencast whereas you can find detailed explanations and screenshots in my Master's Thesis). As some of you told me, fof the web frontend I like to use d3js in combination with Vue.js, but I'm really new to the front-end stuff.

Let me know what you think :-) a community effort would be the most awesome thing ever. Would you be eager to help? Especially as I'm usually a backend engineer and might struggle a lot of times.

BTW: Another idea for the near future is scaling SirixDB with a distributed log, most probably via Apache BookKeeper directly or Apache Pulsar instead of Kafka.

Feel free to contribute on GitHub

Kind regards
Johannes

Top comments (0)