Did you ever think about how Instagram, Twitter, Facebook or any social media platforms track who liked your posts? Let's figure it out in this pos...
For further actions, you may consider blocking this person and/or reporting abuse
Sounds like an interesting approach.
Did you have to limit the number of likes from users? (e.g. each user is only allowed to 'like' once.)
Also, am I right in thinking that only having Increment and Decrement commands safeguards the data against concurrency issues (where one update might be 'lost' due to another overwriting it)?
Totally right.
Since it's just +/- the current number doesn't matter at all. The reason that the only type setted on the table needs to be counter.
Should be fine with concurrency... You'll still want a separate likes table for post-like and probably a user-like that references the posts liked by a user.
Scylla/Cassandra is not atomic across tables... And query performance isn't always great. So sometimes you'll want your own 1:n mappings both ways.
Ah, great idea - with the user-like table, you'd also be able to see how many times a user has 'liked' a post (if that was a requirement).
It's nice about finding the posts a user looked... From the user profile or activities
I'm wondering why not just recount the total likes on each like update, and update the count field accordingly. Not as performant as increments, but feels safer for accuracy in case of issues under the hood. No idea if this way is done, but makes sense to me.
This is discussed in TFA. You wind up creating a new record for each update to a counts field in a posts table.
Having the separate analytics table, you can have atomic counters that effectively do the same. Yes it's an n+1 request,. But with horizontal scaling and often faster than joins for a single view item.
I think you're misunderstanding. You wouldn't use joins in my case. It would save the value, it would just recalculate it instead of incrementing. After dealing with Firestore, I don't trust increments to be accurate.
Great write up!
I think that this is a good example on where streaming databases like Materialize can really help. You can essentially create a live materialized view and the result will be kept up to date as the data comes.
In traditional DBs, data is stored in tables matching the structure of the writes (inserts, updates) and all the computation work happens on read queries (selects). A streaming database moves the work to the write side: Instead of just storing data on writes, a streaming database asks for the queries upfront (in the form of Materialized Views) and incrementally updates results as input data arrives.
If the database we use doesn't have counter type, is it possible to just use integer type instead? I see on your explanation that counter type is just an alias for integer type, right? Does it affect the performance of our application if we use integer type instead?
The idea behind the counter is to use always increment and decrement operation, eg: item + 1 || item - 1
Since it will not depend on which number stopped on the last row, you can deal easily with concurrent connections without loss any data.
If you're using a classic rdbms like postgres/MySQL/mssql then yes. There should be an atomic update to increment/decrement. Of course index on the likes table should work for count(postid) as well ..
Cassandra and ScyllaDB have differing behaviors for the benefit of horizontal scaling.
Thanks for the fun read! It makes me interested in checking out Scylla.
I do have a question that may not be related to the point of the article, but is there a reason you included the created_at field in the primary key of the posts table? I would think that the id field by itself would be enough for the PK.
At ScyllaDB we have a Partititon Key (like Primary) and Clustering Keys (Ordering Keys).
You can only do a query using PK or CK to guarantee that your query will be faster and both data should be present in every row.
Also I forgot to put user_id as clustering key for this example :p
I'm puzzled why we don't simply add up the total amount of likes with each update and then edit the count column as needed. Though not as efficient as incrementing, this appears to be a more dependable means of ensuring accuracy, especially if there are unanticipated complications. I'm not sure if this strategy is already in use, but it sounds reasonable to me.
I wonder if it would be even better to only do that every... 30 or so likes. So basically, make a condition where if the count is divisible by 30, recalculate and overwrite the count. Best of both worlds? Or is the condition too expensive itself? At least it doesn't involve reading another table.
Your last model (posts, post_analytics, post_likes and possibly user_likes) reminds me of a presentor/aggregator architecture style from a presentation on event driven systems. On scale, i would probably lean there even
Thay would mean one service, a so called frontend or an api , would publish "userLikedPost" event when someone clicks like
A userAggregator would calculate user stats and store what post this user liked, while a postAggregator would do the same. Different UIs would then query these two aggregators through an api gateway or so.
This decoupling ties you less into finding one db that fits all needs, at the cost of just that - decoupling and maintaining more services and tech stack choices
But its interesting to think about 👍
Neat. I have never heard of this DB system before. I wonder if anyone has clever ideas for how to do this well in MySQL as it is not always practical to move to a different DB for a single use case, and MySQL doesn't have a good way to do this without table locking, I think 😱
If you have a huge system probably you will work with multiple databases.
Scylla is a database that can handle monstrously amount of data easily if you model in the right way.
We have a free cloud environment, give it a try!
Excellent article!
Water bottle filled. Good article, primo!
Thanks! Hope that it helped you someway <3
Nice to see the concept of signal applied to database modelling, thanks for the awesome knowledge
Awesome article!
I'm wondering how many databases (or even database types!) can be involved in a single social media feature...
Depends on your needs and the volume necessary
You can look at the likes of stack overflow as a website that scaled massively just with Microsoft SQL Server.
You could also use Redis with persistence if you don't mind a secondary data source or have different scaling needs.
Every 10x of users past the first 10k or so will mean differing approaches. At 1m or so you start needing something more like ScyllaDB.
BRABO!
Quite informative. Welldone Daniel.
I am trying to understand the benefits of being atomic.
If multiple users run the update query, they will probably get a deadlock back.
Deadlock is impossible.
Counters implementation is base on CRDT, which is explained here:
scylladb.com/2017/04/04/counters/
This article is great, but Vacilinho is better!
Send me one please 🥺
Great article!
Excellent article!
Great!
Pretty good content
Great article!
Excellent article!
Nice Article, congrats primo!
Good article!
Nice Article!
Awesome article!!!
Good job!
Thanks dude <3
Nice Primo!
Amazing article as always, thank you for sharing your knowledge Daniel!
Nice article! Just learned a lot. Keep up the amazing work! :)
First time learning about Scylla, and looks quite powerful! Totally will be using for future projects.
Nice Article!
Really good article. Congratulations Daniel.
Nice article, congrats Dan!
great article!
Interesting thank you for sharing. Your post was clear. Continue working on making them clearer. Enjoy !
Cara, sou um pouco iniciante quando se fala de modelagem de banco de dados e pude compreender completamente a explicação. Estou esperando ansiosamente pela próxima postagem.
Excellent article! Congratulations primo! <3
good article, congratulations!
Nice Article!
Both solutions are needed to avoid gaming likes.
You only need a single device to like and retract a like for a subjectId, so it could locally increment and decrement from a shared count.
Limits, separation of necessity and glamour and isolation of effect are key to engineering.
I made a small DB just to store counts. Its O(1) db, and it not an append only, and few lines of code. Tested only on linux.
github.com/anuragvohraec/CounterDB
when you create socials.posts table add column post_analytics and its type JSON/TEXT based. after that let's suppose the user inserts data and you make
json structure like this
analytics=[{
"username": text
"likes": "like/dislike"
"likes_at": timeStamp()
}]
every time users like or dislike value update on it by the query.
when any user sees a first-time post and likes the post it counts as 1. If he/she dislikes the post then data will update in json related to the user
Great idea.
I am testing how likes are stored.
That's great article brother. 🚀
wonder what this would look like in a graph database.
possibly i'm not understanding the article, but how does this approach differs from the traditional Database Normalization?
I would say I am still a junior, hence I won't feel stupid for asking, why did no one mentioned database triggers and function?