Create account

DEV Community

Johannes Lichtenberger

Posted on Sep 25, 2019

Searching Contributors / Open Source project (SirixDB) / building a Vue.js and D3.js frontend

#hacktoberfest #contributorswanted #help #discuss

SirixDB - A versioned, temporal NoSQL data store for XML and JSON

Feel free to contribute on GitHub 💚

Hi all,

I just created my first ever Vue.js project to build a web front-end for SirixDB and to learn about single page apps, TypeScript, D3.js, Vue.js and the front-end world in general.

I first wanted to say a few things about what the purpose of SirixDB is and why we need another (versioned) database system.

SirixDB

SirixDB is all about efficient versioning of your data (currently XML and JSON stored in a binary format).

That is on the one hand it reduces the storage cost of storing a new revision during each transaction-commit while balancing read- and write-performance through a novel sliding snapshot algorithm and dynamic page compression. On the other hand SirixDB supports easy time travel query capabilities for instance to open a specific revision by a timestamp or several revisions by a given timespan, to navigate to future or past versions of nodes in the tree-structure and so on. It basically never overwrites data, batches writes and syncs them sequentially. It's heavily inspired by ZFS and Git. It borrows some ideas and puts these to test on the sub-file level.

In stark contrast to other approaches SirixDB combines copy-on-write semantics with database page-level versioning and does not require a write-ahead-log for consistency (a new UberPage is created last, when a new revision is committed to persistent storage).

The storage engine has been written from scratch and a resource in a SirixDB database basically consists of many hash-array based tries underneath an UberPage, which is the main entry point.

It all started around 2006 as a university / Ph.D. project of Marc Kramis. I already worked on the project since 2007 and did my Bachelor's Thesis, Master's Thesis, the Bachelor and Master projects as well as several HiWi-Jobs on topics regarding the project and I'm still more eager than ever to put forth the idea of a versioned, analytics plattform to perform analytical tasks based on current as well as the history of the data.

sirixdb / sirix

SirixDB is an an embeddable, bitemporal, append-only database system and event store, storing immutable lightweight snapshots. It keeps the full history of each resource. Every commit stores a space-efficient snapshot through structural sharing. It is log-structured and never overwrites data. SirixDB uses a novel page-level versioning approach.

An Embeddable, Bitemporal, Append-Only Database System and Event Store

Stores small-sized, immutable snapshots of your data in an append-only manner. It facilitates querying and reconstructing the entire history as well as easy audits.

Download ZIP | Join us on Discord | Community Forum | Documentation | Architecture & Concepts

Working on your first Pull Request? You can learn how from this free series How to Contribute to an Open Source Project on GitHub and another tutorial: How YOU can contribute to OSS, a beginners guide

"Remember that you're lucky, even if you don't think you are because there's always something that you can be thankful for." - Esther Grace Earl (http://tswgo.org)

We want to build the database system together with you. Help us and become a maintainer yourself. Why? You may like the software and want to help us. Furthermore, you'll learn a lot. You may want to…

View on GitHub

Design Goals 🔥

Some of the most important core principles and design goals are:

Concurrent: SirixDB contains very few locks and aims to be as suitable for multithreaded systems as possible
Asynchronous: operations can happen independently; each transaction is bound to a specific revision and only one read/write-transaction on a resource is permitted concurrently to N read-only-transactions
Versioning/Revision history: SirixDB stores a revision history of every resource in the database without imposing extra overhead
Data integrity: SirixDB, like ZFS, stores full checksums of the pages in the parent pages. That means that almost all data corruption can be detected upon reading in the future, we aim to partition and replicate databases in the future
Copy-on-write semantics: similarly to the file systems Btrfs and ZFS, SirixDB uses CoW semantics, meaning that SirixDB never overwrites data. Instead, database-page fragments are copied/written to a new location
Per revision and per page versioning: SirixDB does not only version on a per revision, but also on a per page-base. Thus, whenever we change a potentially small fraction of records in a data-page, it does not have to copy the whole page and write it to a new location on a disk or flash drive. Instead, we can specify one of several versioning strategies known from backup systems or a novel sliding snapshot algorithm during the creation of a database resource. The versioning-type we specify is used by SirixDB to version data-pages
Guaranteed atomicity (without a WAL): the system will never enter an inconsistent state (unless there is hardware failure), meaning that unexpected power-off won't ever damage the system. This is accomplished without the overhead of a write-ahead-log (WAL)
Log-structured and SSD friendly: SirixDB batches writes and syncs everything sequentially to a flash drive during commits. It never overwrites committed data

Contributions

I have added #hacktoberfest labels to some open issues, but I guess the most interesting part now will be the web frontend. I envision certain interaction possibilities as for instance simply querying SirixDB and displaying the result, but also interactive visualizations to display what has changed between revisions of a resource stored in SirixDB (comparing XML and JSON basically). Maybe even a bunch of visualizations which are in synch with each other.

I've implemented some ideas in a Java Swing GUI and embedded visualizations built with processing (YouTube Screencast whereas you can find detailed explanations and screenshots in my Master's Thesis). As some of you told me, fof the web frontend I like to use d3js in combination with Vue.js, but I'm really new to the front-end stuff.

Let me know what you think :-) a community effort would be the most awesome thing ever. Would you be eager to help? Especially as I'm usually a backend engineer and might struggle a lot of times.

BTW: Another idea for the near future is scaling SirixDB with a distributed log, most probably via Apache BookKeeper directly or Apache Pulsar instead of Kafka.

Feel free to contribute on GitHub

Kind regards
Johannes