DEV Community

Database 101: How social media “likes” are stored in a database

Daniel Reis on May 08, 2023

Did you ever think about how Instagram, Twitter, Facebook or any social media platforms track who liked your posts? Let's figure it out in this pos...

Read full post

Anthony Fung • May 8 '23

Sounds like an interesting approach.

Did you have to limit the number of likes from users? (e.g. each user is only allowed to 'like' once.)

Also, am I right in thinking that only having Increment and Decrement commands safeguards the data against concurrency issues (where one update might be 'lost' due to another overwriting it)?

Daniel Reis • May 8 '23

Totally right.

Since it's just +/- the current number doesn't matter at all. The reason that the only type setted on the table needs to be counter.

Michael J. Ryan • May 9 '23

Should be fine with concurrency... You'll still want a separate likes table for post-like and probably a user-like that references the posts liked by a user.

Scylla/Cassandra is not atomic across tables... And query performance isn't always great. So sometimes you'll want your own 1:n mappings both ways.

Anthony Fung • May 9 '23

Ah, great idea - with the user-like table, you'd also be able to see how many times a user has 'liked' a post (if that was a requirement).

Michael J. Ryan • May 9 '23

It's nice about finding the posts a user looked... From the user profile or activities

Jonathan Gamble • May 9 '23

I'm wondering why not just recount the total likes on each like update, and update the count field accordingly. Not as performant as increments, but feels safer for accuracy in case of issues under the hood. No idea if this way is done, but makes sense to me.

Michael J. Ryan • May 9 '23

This is discussed in TFA. You wind up creating a new record for each update to a counts field in a posts table.

Having the separate analytics table, you can have atomic counters that effectively do the same. Yes it's an n+1 request,. But with horizontal scaling and often faster than joins for a single view item.

Jonathan Gamble • May 9 '23

I think you're misunderstanding. You wouldn't use joins in my case. It would save the value, it would just recalculate it instead of incrementing. After dealing with Firestore, I don't trust increments to be accurate.

Bobby Iliev • May 15 '23 • Edited

Great write up!

I think that this is a good example on where streaming databases like Materialize can really help. You can essentially create a live materialized view and the result will be kept up to date as the data comes.

In traditional DBs, data is stored in tables matching the structure of the writes (inserts, updates) and all the computation work happens on read queries (selects). A streaming database moves the work to the write side: Instead of just storing data on writes, a streaming database asks for the queries upfront (in the form of Materialized Views) and incrementally updates results as input data arrives.

As Manjaka Josvah • May 9 '23

If the database we use doesn't have counter type, is it possible to just use integer type instead? I see on your explanation that counter type is just an alias for integer type, right? Does it affect the performance of our application if we use integer type instead?

Daniel Reis • May 9 '23

The idea behind the counter is to use always increment and decrement operation, eg: item + 1 || item - 1

Since it will not depend on which number stopped on the last row, you can deal easily with concurrent connections without loss any data.

Michael J. Ryan • May 9 '23

If you're using a classic rdbms like postgres/MySQL/mssql then yes. There should be an atomic update to increment/decrement. Of course index on the likes table should work for count(postid) as well ..

Cassandra and ScyllaDB have differing behaviors for the benefit of horizontal scaling.

Phil Goldenberg • May 10 '23

Thanks for the fun read! It makes me interested in checking out Scylla.

I do have a question that may not be related to the point of the article, but is there a reason you included the created_at field in the primary key of the posts table? I would think that the id field by itself would be enough for the PK.

Daniel Reis • May 10 '23

At ScyllaDB we have a Partititon Key (like Primary) and Clustering Keys (Ordering Keys).

You can only do a query using PK or CK to guarantee that your query will be faster and both data should be present in every row.

Also I forgot to put user_id as clustering key for this example :p

Ahsan Mangal 👨🏻‍💻 • May 10 '23

I'm puzzled why we don't simply add up the total amount of likes with each update and then edit the count column as needed. Though not as efficient as incrementing, this appears to be a more dependable means of ensuring accuracy, especially if there are unanticipated complications. I'm not sure if this strategy is already in use, but it sounds reasonable to me.

quin • Nov 15 '23

I wonder if it would be even better to only do that every... 30 or so likes. So basically, make a condition where if the count is divisible by 30, recalculate and overwrite the count. Best of both worlds? Or is the condition too expensive itself? At least it doesn't involve reading another table.

Pierre Vahlberg • May 9 '23

Your last model (posts, post_analytics, post_likes and possibly user_likes) reminds me of a presentor/aggregator architecture style from a presentation on event driven systems. On scale, i would probably lean there even

Thay would mean one service, a so called frontend or an api , would publish "userLikedPost" event when someone clicks like

A userAggregator would calculate user stats and store what post this user liked, while a postAggregator would do the same. Different UIs would then query these two aggregators through an api gateway or so.

This decoupling ties you less into finding one db that fits all needs, at the cost of just that - decoupling and maintaining more services and tech stack choices

But its interesting to think about 👍

Kirill Birger • May 10 '23

Neat. I have never heard of this DB system before. I wonder if anyone has clever ideas for how to do this well in MySQL as it is not always practical to move to a different DB for a single use case, and MySQL doesn't have a good way to do this without table locking, I think 😱

Daniel Reis • May 10 '23

If you have a huge system probably you will work with multiple databases.

Scylla is a database that can handle monstrously amount of data easily if you model in the right way.

We have a free cloud environment, give it a try!

guto • May 8 '23

Excellent article!

Yasmin Felício • May 8 '23

Water bottle filled. Good article, primo!

Daniel Reis • May 8 '23

Thanks! Hope that it helped you someway <3

Cherry Ramatis • May 9 '23

Nice to see the concept of signal applied to database modelling, thanks for the awesome knowledge

Thyago Vinicius Camilo Cunha • May 9 '23 • Edited

Awesome article!

I'm wondering how many databases (or even database types!) can be involved in a single social media feature...

Michael J. Ryan • May 9 '23

Depends on your needs and the volume necessary

You can look at the likes of stack overflow as a website that scaled massively just with Microsoft SQL Server.

You could also use Redis with persistence if you don't mind a secondary data source or have different scaling needs.

Every 10x of users past the first 10k or so will mean differing approaches. At 1m or so you start needing something more like ScyllaDB.

Vinicius Dias • May 16 '23

BRABO!

James Ajayi • May 13 '23

Quite informative. Welldone Daniel.

Nevo David • May 10 '23

I am trying to understand the benefits of being atomic.

If multiple users run the update query, they will probably get a deadlock back.

Tzach Livyatan • May 10 '23

Deadlock is impossible.
Counters implementation is base on CRDT, which is explained here:
scylladb.com/2017/04/04/counters/

João Vitor • May 8 '23

This article is great, but Vacilinho is better!
Send me one please 🥺

Flavio • May 8 '23

Great article!

Angelo Avila Mesquita • May 8 '23

Excellent article!

Renan Vidal Rodrigues • May 9 '23

Great!

Rudi • May 9 '23

Pretty good content

Wanderson José • May 9 '23

Great article!

Bruno Vieira • May 9 '23

Excellent article!

Jean Carlos Izepon • May 8 '23

Nice Article, congrats primo!

Samuel Souza Cabral • May 8 '23

Good article!

Rafael Duarte Amaral • May 8 '23

Nice Article!

Mario Souto • May 8 '23

Awesome article!!!

Henri Borges • May 8 '23

Good job!

Daniel Reis • May 8 '23

Thanks dude <3

Lucas Medeiros • May 8 '23

Nice Primo!

thalesmengue • May 8 '23

Amazing article as always, thank you for sharing your knowledge Daniel!

Victor Ribeiro Boechat • May 8 '23

Nice article! Just learned a lot. Keep up the amazing work! :)

Panto! • May 8 '23

First time learning about Scylla, and looks quite powerful! Totally will be using for future projects.

SamucaDev • May 8 '23

Nice Article!

Xossy • May 8 '23 • Edited

Really good article. Congratulations Daniel.

DMayumi • May 9 '23

Nice article, congrats Dan!

Xossy • May 8 '23

great article!

Zima • May 10 '23

Interesting thank you for sharing. Your post was clear. Continue working on making them clearer. Enjoy !

Xossy • May 9 '23

Cara, sou um pouco iniciante quando se fala de modelagem de banco de dados e pude compreender completamente a explicação. Estou esperando ansiosamente pela próxima postagem.

Sther • May 8 '23

Excellent article! Congratulations primo! <3

nathanlucena • May 8 '23

good article, congratulations!

Humberto Lima de Alcantara Fonseca Filho • May 8 '23

Nice Article!

Lewis Cowles • Jul 1 '23

Both solutions are needed to avoid gaming likes.

You only need a single device to like and retract a like for a subjectId, so it could locally increment and decrement from a shared count.

Limits, separation of necessity and glamour and isolation of effect are key to engineering.

Anurag Vohra • Jun 9 '23

I made a small DB just to store counts. Its O(1) db, and it not an append only, and few lines of code. Tested only on linux.

github.com/anuragvohraec/CounterDB

Muzzammil Shaikh • Jun 5 '23

when you create socials.posts table add column post_analytics and its type JSON/TEXT based. after that let's suppose the user inserts data and you make
json structure like this
analytics=[{
"username": text
"likes": "like/dislike"
"likes_at": timeStamp()
}]
every time users like or dislike value update on it by the query.
when any user sees a first-time post and likes the post it counts as 1. If he/she dislikes the post then data will update in json related to the user