DEV Community

Things I Learned Building an Analytics Engine

Doug Black on October 20, 2018

Oh man, am I excited. This side project has been a while coming. I just released Engauge Analytics (https://engaugeanalytics.com/), a web analytics...
 
Adrian B.G.

Any feedback for a relational database that scales large and gives quick queries?

You can migrate to MariaDB and scale your DB. It has multiple storage engines and supports master-to-master replication, sharding, or the traditional master-slave setup. With its columnar storage engine, I think it can handle petabytes of data.

But all the big analytics players use simpler key-value (columnar) solutions so they can scale horizontally. Collecting events and running batch jobs to aggregate and enrich them works better than squeezing performance out of a SQL query.
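To make the "collect events, then crunch them" idea concrete, here is a minimal sketch of a rollup job. The event shape (`site`, `path`, `ts`) and the hourly bucket are assumptions made up for illustration, not anything Engauge actually uses.

```python
from collections import Counter
from datetime import datetime, timezone

# Hypothetical raw events, as they might be appended by a collector.
events = [
    {"site": "a.example", "path": "/", "ts": 1540000000},
    {"site": "a.example", "path": "/", "ts": 1540000100},
    {"site": "a.example", "path": "/", "ts": 1540003600},
    {"site": "b.example", "path": "/docs", "ts": 1540000050},
]


def rollup_hourly(raw_events):
    """Aggregate raw events into page-view counts per (site, path, hour)."""
    counts = Counter()
    for event in raw_events:
        # Truncate the timestamp to its UTC hour to form the bucket key.
        hour = datetime.fromtimestamp(event["ts"], tz=timezone.utc).strftime("%Y-%m-%dT%H")
        counts[(event["site"], event["path"], hour)] += 1
    return counts


rollup = rollup_hourly(events)
for key, views in sorted(rollup.items()):
    print(key, views)
```

Dashboards would then read the small rollup keyed by `(site, path, hour)` instead of scanning raw events on every request.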

Side projects are fun for us devs; the problem arises when we want to make money from them. Then comes all the stuff we don't want to handle, from laws to marketing, from customer support to hosting bills.

Doug Black

Thank you! I went with Percona off the bat for this one...it's just soooooo fast.

Tell me more about the key-value solutions! This may be just what I'm looking for!

Adrian B.G.

At an abstract level:

If you get rid of the relationships and use simple documents, you can shard better with purpose-built stores like Cassandra.

Sharding a SQL database usually requires getting rid of relationships and joins. Even when it doesn't, it adds overhead, because a query has to fetch and group data from different shards in a cascading effect.
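The cross-shard overhead can be pictured as a scatter-gather query. This is only a sketch: the shards are plain in-memory lists and the `(path, views)` rows are invented for the example, so no real database API is implied.

```python
from collections import Counter

# Hypothetical shards, each holding a slice of (path, views) rows.
shards = [
    [("/", 3), ("/pricing", 1)],
    [("/", 2), ("/docs", 4)],
    [("/pricing", 5)],
]


def query_shard(rows):
    """Run the local GROUP BY path on a single shard."""
    partial = Counter()
    for path, views in rows:
        partial[path] += views
    return partial


def scatter_gather(all_shards):
    """Send the query to every shard, then merge the partial results."""
    total = Counter()
    for rows in all_shards:
        # Each shard answers on its own; the coordinator adds the counts.
        total.update(query_shard(rows))
    return total


totals = scatter_gather(shards)
print(dict(totals))
```

Every grouped query pays for one round trip per shard plus the merge step, which is the extra cost a single-node SQL query never sees.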

If the sharding algorithm has to take data relationships into account and keep related data as local as possible, you will end up with hot spots and unbalanced shards.
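A rough way to see the hot-spot problem, assuming a made-up workload where one site produces 90% of the events. `shard_for`, `NUM_SHARDS`, and the key formats are hypothetical names for this sketch.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4  # assumed cluster size for the illustration


def shard_for(key, num_shards=NUM_SHARDS):
    """Map a key to a shard using a stable hash (md5 used only for spreading)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Hypothetical workload: one big site plus 100 small ones, one event each.
events = [("big-site", n) for n in range(900)]
events += [(f"site-{n}", n) for n in range(100)]

# Locality-friendly key: all of a site's events land on the same shard.
by_site = Counter(shard_for(site) for site, _ in events)

# Relationship-free key: each event is placed independently.
by_event = Counter(shard_for(f"{site}:{n}") for site, n in events)

print("sharded by site: ", sorted(by_site.items()))
print("sharded by event:", sorted(by_event.items()))
```

Keying by site keeps related rows together but puts at least 900 of the 1000 events on one shard; keying by event spreads the load, at the price of losing locality for per-site queries.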

I'm not saying it is impossible to scale SQL; I'm saying it will be harder and more expensive. If you can afford Spanner from Google, a big Vitess setup, or 5-8 big servers behind Galera, go ahead!

Bottom line: if you want to go beyond a few TB of data, I would suggest rethinking your structure in a columnar way, and less SQL-ish.

Doug Black

This is a great place to start! Thank you!