
Things I Learned Building an Analytics Engine

Doug Black ・3 min read

Oh man, am I excited. This side project has been a while coming.

I just released Engauge Analytics (https://engaugeanalytics.com/), a web analytics engine that allows you to get meaningful data from your website without stalking your users like some other large entities (cough cough, Google and Facebook, cough cough).

What’s more, the paid tiers of Engauge allow you to get full, automated SEO evaluations of your sites, AND machine-learning-driven content evaluations that help you know how your audience will react to your content BEFORE THEY VISIT.

It’s the geekiest thing I’ve ever made, and I’m crazy excited for it. But it was PAINFUL to make.
Much fire, many hurt

Here are some of the lessons I learned while building it.

Think About What You're Creating Before Building It
My first uh-oh

I was deep into building the UI for the app, with the analytics engine running and testing. A friend of mine got me a call with a friend who worked on product in a major company.

I walked him through the product from start to finish, using fancy words like "proprietary".

His first question: "So who are you making this for?"

Great question. I hadn't really thought about it.

So, side projects are fun and challenging. The biggest lesson I learned up front: If you're going to market it, think about who you're marketing it to.

Ask For Feedback
Let's escalate quickly!
Two weeks before release, I invited a number of users to test it in alpha and give feedback. The feedback was awesome, and led to a lot of common-sense features that I had missed.

Some of it was a little tough to hear. This was my baby, and I was incubating it from scratch. Asking folks for feedback on how that baby should be raised was sometimes painful.

But with every suggestion came an opportunity to grow the app. Not every suggestion got implemented, but some showed me that the app would go nowhere without them.

The Data Got Big Quick
I mean, biiiiig
So, the way the app is set up, I knew the database would get big quick. I just didn't realize HOW big.

It was so big with the alpha users that it crashed the entire thing. I had to quickly scale, and think about scaling in a much bigger way.

Scaling the server and DB was easy, but I'm still not totally pleased with the setup. Any feedback for a relational database that scales large and gives quick queries?

Enjoy the Journey
It'll work out
Maybe you've been here, too: "Side project is done, now what's next? Let's build something else! A space monitor API that looks at dark matter, utilizing....webpack!"

This seems to be the first side project in a long while (maybe ever) that hasn't given me the itch to move on immediately. This is built, and I want to see it grow and succeed and scale like crazy.

I have to admit, I'm still really learning about it. I'll post some more, maybe some how-tos, on how I built some of this. But, I know there are still areas of growth in this that I haven't even touched. Do I copyright it? Patent the algorithms? Scale to a different database structure? Multi-tenancy?

Time to keep learning!

Discussion

Adrian B.G.

Any feedback for a relational database that scales large and gives quick queries?

You can migrate to MariaDB and scale your DB. It has multiple storage engines and supports master-to-master replication, sharding, or traditional master-slave replication. With the columnar storage engine, I think it can handle petabytes of data.

But all the big analytics players use simpler key-value (columnar) solutions so they can scale horizontally. Collecting events and running crunching jobs to aggregate and enrich them is better than squeezing performance out of a SQL query.
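As a rough illustration of the "collect events, then aggregate" idea, here is a minimal Python sketch. The event shape (site, path, Unix timestamp), the hourly bucket size, and the names `hour_bucket` and `rollup` are all assumptions made up for this example, not how Engauge or any particular analytics product works.

```python
from collections import Counter

# Hypothetical raw page-view events: (site_id, path, unix_timestamp).
events = [
    ("site-a", "/", 1_700_000_000),
    ("site-a", "/", 1_700_000_100),
    ("site-a", "/pricing", 1_700_003_700),
    ("site-b", "/", 1_700_000_050),
]

def hour_bucket(ts: int) -> int:
    """Truncate a Unix timestamp to the start of its UTC hour."""
    return ts - ts % 3600

def rollup(raw_events):
    """Aggregate raw events into per-(site, path, hour) view counts."""
    counts = Counter()
    for site_id, path, ts in raw_events:
        counts[(site_id, path, hour_bucket(ts))] += 1
    return counts

hourly = rollup(events)
for (site_id, path, hour), views in sorted(hourly.items()):
    print(site_id, path, hour, views)
```

Dashboards would then read from the small `hourly` rollup instead of scanning every raw event, which is the trade the comment is describing.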

Side projects are fun for us devs; the problem arises when we want to make money from them. Then comes all the stuff we do not want to handle, from laws to marketing, from customer support to hosting bills.

Doug Black Author

Thank you! I went with Percona off the bat for this one...it's just soooooo fast.

Tell me more about the key-value solutions! This may be just what I'm looking for!

Adrian B.G.

At an abstract level:

By getting rid of the relationships and using simple documents, you can shard better with purpose-built stores like Cassandra.

Sharding a SQL database, most of the time, requires getting rid of the relationships and joins. Even if it does not, it adds overhead, because queries have to fetch and group data from different shards in a cascading effect.

If the sharding algorithm has to take data relationships into consideration and keep data as local as possible, you will end up with hot spots and unbalanced shards.
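The hot-spot effect can be seen in a small Python sketch. Everything here is hypothetical: four shards, a made-up workload with one busy site and ten small ones, and a simple hash-modulo placement rule rather than any real database's partitioner.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4  # hypothetical cluster size

def stable_shard(key: str) -> int:
    """Map a key to a shard using a stable hash (Python's hash() is salted per process)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

# Hypothetical workload: one busy site (900 events) plus ten small ones (10 each).
events = [("big-site", f"evt-{i}") for i in range(900)]
for s in range(10):
    events += [(f"small-site-{s}", f"evt-{s}-{i}") for i in range(10)]

# Locality-aware: all events of a site live together on one shard.
by_site = Counter(stable_shard(site) for site, _ in events)
# Locality-blind: each event is placed independently of its site.
by_event = Counter(stable_shard(event_id) for _, event_id in events)

print("sharded by site: ", sorted(by_site.values(), reverse=True))
print("sharded by event:", sorted(by_event.values(), reverse=True))
```

Sharding by site keeps each site's data local but puts at least 900 of the 1,000 events on a single shard, while sharding by event spreads the load at the cost of scattering each site across shards.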

I'm not saying it is impossible to scale SQL; I'm saying it will be harder and more expensive. If you can afford Spanner from Google, a big Vitess setup, or 5-8 big servers behind Galera, go ahead!

Bottom line: if you want to go beyond a few TB of data, I would suggest rethinking your structure in a columnar, less SQL-ish way.
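To make "columnar" a bit more concrete, here is a toy Python sketch of the same data laid out row by row and column by column. The field names (`site`, `path`, `ms`) and the helper `to_columns` are invented for illustration; real columnar stores add compression, partitioning, and on-disk formats on top of this basic idea.

```python
# Row-oriented: each record keeps all of its fields together.
rows = [
    {"site": "site-a", "path": "/", "ms": 120},
    {"site": "site-a", "path": "/pricing", "ms": 80},
    {"site": "site-b", "path": "/", "ms": 100},
]

def to_columns(records):
    """Pivot a list of row dicts into one list per field."""
    columns = {}
    for record in records:
        for key, value in record.items():
            columns.setdefault(key, []).append(value)
    return columns

# Column-oriented: each field is stored as its own contiguous list.
columns = to_columns(rows)

# An aggregate over one field only has to touch that field's list.
avg_ms = sum(columns["ms"]) / len(columns["ms"])
print(avg_ms)
```

Analytics queries tend to aggregate a few fields across many events, which is why reading one column instead of every full row can pay off at scale.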

Doug Black Author

This is a great place to start! Thank you!