Hi all,
I'd like to release version 1.0.0 of SirixDB[1] soon. Sadly, it still lacks an open-source community, so I wanted to ask here what you think is most important for its future direction.
In short, SirixDB keeps the history of each resource in a database in a large index trie that is entirely copy-on-write, which means unchanged database pages are shared between revisions. SirixDB supports sophisticated time-travel queries and implements diffing algorithms. It natively stores XML and JSON in a binary format, but could just as well store graphs or other kinds of data.
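To make the page sharing a bit more concrete, here is a minimal sketch of the general copy-on-write idea. It is not SirixDB's actual code: the class `CowPageTree`, the tiny `FANOUT` and `HEIGHT`, and the string payloads are all made up for illustration. Each commit copies only the pages on the path from the root to the changed leaf, and every other page is shared with the previous revision.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy copy-on-write page tree (hypothetical, not SirixDB's real page layout). */
final class CowPageTree {
    static final int FANOUT = 4; // assumed tiny fan-out, for readability only
    static final int HEIGHT = 2; // two inner levels => keys 0..15

    /** Immutable page: inner pages hold children, leaf pages hold a value. */
    static final class Page {
        final Page[] children; // null for leaf pages
        final String value;    // null for inner pages

        Page(Page[] children, String value) {
            this.children = children;
            this.value = value;
        }
    }

    // One root page per revision; revision 0 is the empty tree.
    private final List<Page> revisionRoots = new ArrayList<>();

    CowPageTree() {
        revisionRoots.add(new Page(new Page[FANOUT], null));
    }

    /** Writes one key and returns the number of the newly created revision. */
    int commit(int key, String value) {
        Page latestRoot = revisionRoots.get(revisionRoots.size() - 1);
        revisionRoots.add(set(latestRoot, key, value, HEIGHT));
        return revisionRoots.size() - 1;
    }

    /** Reads a key as of the given revision (a very simple time-travel read). */
    String get(int revision, int key) {
        Page page = revisionRoots.get(revision);
        for (int level = HEIGHT; level > 0 && page != null; level--) {
            page = page.children[slot(key, level)];
        }
        return page == null ? null : page.value;
    }

    Page root(int revision) {
        return revisionRoots.get(revision);
    }

    // Copies only the pages on the path to the leaf; siblings are reused as-is.
    private static Page set(Page page, int key, String value, int level) {
        if (level == 0) {
            return new Page(null, value);
        }
        Page[] copy = (page == null) ? new Page[FANOUT] : page.children.clone();
        int slot = slot(key, level);
        copy[slot] = set(copy[slot], key, value, level - 1);
        return new Page(copy, null);
    }

    private static int slot(int key, int level) {
        int divisor = 1;
        for (int i = 1; i < level; i++) {
            divisor *= FANOUT;
        }
        return (key / divisor) % FANOUT;
    }
}
```

After `commit(0, "a")` followed by `commit(5, "b")`, the second revision's root still points to the very same subtree object for key 0 as the first revision, and `get(1, 5)` returns `null` because key 5 did not exist yet in revision 1.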
Ideas for the future would be:
- horizontal scaling, that is, writing through a single master, providing read-your-own-writes consistency, and replicating resources to a few cluster nodes, most probably using ZooKeeper and Apache BookKeeper with exactly-once delivery semantics
- interactive visualizations of the differences between revisions of a resource. SirixDB currently stores tree-structured data (both XML and JSON) in a binary format, and the diffing capabilities are already there. There are also some outdated visualizations[2] written in Processing, which I'd love to port to D3 for the web. A web interface would be nice as well
- adding cost-based query optimizer rules and index-rewrite rules to improve query performance considerably
- looking into how to delete old revisions cleverly (I have to look up how ZFS allows deletion of snapshots). For now, as a somewhat ugly hack, a background process could copy the most recent revision to a new resource. Deletion gets tricky, I guess, because unchanged database pages are shared between revisions and record pages are themselves versioned: a page has to be reconstructed from page fragments of different revisions, depending on the algorithm used.
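To illustrate why deleting old revisions is tricky, here is a rough sketch of reconstructing a record page from fragments. It assumes a simple incremental scheme in which each revision stores only the record slots it changed, plus an occasional full dump. The names `PageReconstruction`, `Fragment` and `fullDump` are hypothetical, deleted records are ignored, and the real SirixDB algorithms may well work differently.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy page reconstruction from per-revision fragments (hypothetical scheme). */
final class PageReconstruction {

    /** The record slots written in one revision (illustrative type). */
    static final class Fragment {
        final int revision;
        final boolean fullDump;             // true if this fragment holds every slot
        final Map<Integer, String> records; // slot number -> record payload

        Fragment(int revision, boolean fullDump, Map<Integer, String> records) {
            this.revision = revision;
            this.fullDump = fullDump;
            this.records = records;
        }
    }

    /**
     * Merges fragments ordered from newest to oldest. The newest write to a
     * slot wins, and the walk stops at the first full dump it encounters.
     */
    static Map<Integer, String> reconstruct(List<Fragment> newestFirst) {
        Map<Integer, String> page = new HashMap<>();
        for (Fragment fragment : newestFirst) {
            for (Map.Entry<Integer, String> entry : fragment.records.entrySet()) {
                // Keep the value already seen in a newer fragment, if any.
                page.putIfAbsent(entry.getKey(), entry.getValue());
            }
            if (fragment.fullDump) {
                break; // everything older is superseded by this dump
            }
        }
        return page;
    }
}
```

In this scheme, reading the latest revision may pull slots out of fragments written by much older revisions, so an old revision can only be dropped once none of its fragments are needed to reconstruct a newer page.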
Besides that, I want to finish the work on versioning the whole database, not just the resources in a database.
Until recently I thought I'd look into horizontal scaling: using GraalVM to build native images (for very fast startup times in Docker containers), working on writing to and reading from a BookKeeper cluster, and deploying everything to a Kubernetes cluster.
But maybe showcasing what's possible with beautiful interactive visualizations would get more attention, and it would be a great way for me to learn front-end development, too. It might also be more useful given the complete lack of users, since without users the scaling work is really only interesting from an engineering perspective ;-)
Kind regards and have a great weekend
Johannes
[1] https://sirix.io and https://github.com/sirixdb/sirix
[2] https://m.youtube.com/watch?feature=youtu.be&v=l9CXXBkl5vI