DEV Community


Database 101: How social media “likes” are stored in a database

Daniel Reis on May 08, 2023

Did you ever think about how Instagram, Twitter, Facebook or any social media platforms track who liked your posts? Let's figure it out in this pos...
Anthony Fung

Sounds like an interesting approach.

Did you have to limit the number of likes from users? (e.g. each user is only allowed to 'like' once.)

Also, am I right in thinking that only having Increment and Decrement commands safeguards the data against concurrency issues (where one update might be 'lost' due to another overwriting it)?

Daniel Reis

Totally right.

Since it's just +/-, the current number doesn't matter at all. That's the reason the only type set on the table needs to be counter.
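The "lost update" the question above describes can be sketched in a few lines of plain Python — a toy illustration of the concept, not Scylla code:

```python
# Why "set an absolute value" loses updates while "increment" does not.

count = 10

# Two clients both read the current value, then write back value + 1.
read_a = count
read_b = count
count = read_a + 1   # client A writes 11
count = read_b + 1   # client B overwrites with 11 -> A's like is lost
assert count == 11   # one like vanished

# With increment operations, each delta is applied on top of the
# latest state, so no update is lost regardless of ordering.
count = 10
count += 1           # client A
count += 1           # client B
assert count == 12
```

Because increments and decrements commute, the database can apply them in any order and still converge on the right total.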

Michael J. Ryan

Should be fine with concurrency... You'll still want a separate likes table for post-likes, and probably a user-likes table that references the posts liked by a user.

Scylla/Cassandra is not atomic across tables, and query performance isn't always great, so sometimes you'll want your own 1:n mappings in both directions.
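A minimal sketch of those two mapping tables, using SQLite as a stand-in (the table and column names here are assumptions, not from the article):

```python
import sqlite3

# Two denormalized mappings: one keyed by post, one keyed by user,
# so both lookup directions are cheap without joins.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE post_likes (post_id INTEGER, user_id INTEGER,
                         PRIMARY KEY (post_id, user_id));
CREATE TABLE user_likes (user_id INTEGER, post_id INTEGER,
                         PRIMARY KEY (user_id, post_id));
""")

def like(user_id, post_id):
    # Write both mappings. In Scylla/Cassandra these are two statements
    # against two tables, and they are NOT atomic across tables.
    db.execute("INSERT OR IGNORE INTO post_likes VALUES (?, ?)", (post_id, user_id))
    db.execute("INSERT OR IGNORE INTO user_likes VALUES (?, ?)", (user_id, post_id))

like(1, 42); like(2, 42); like(1, 7)

who_liked_42 = [r[0] for r in db.execute(
    "SELECT user_id FROM post_likes WHERE post_id = 42 ORDER BY user_id")]
liked_by_1 = [r[0] for r in db.execute(
    "SELECT post_id FROM user_likes WHERE user_id = 1 ORDER BY post_id")]
assert who_liked_42 == [1, 2]
assert liked_by_1 == [7, 42]
```

The composite primary keys also give the "one like per user per post" guarantee for free.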

Anthony Fung

Ah, great idea - with the user-like table, you'd also be able to see how many times a user has 'liked' a post (if that was a requirement).

Michael J. Ryan

It's also nice for finding the posts a user liked, from the user profile or activity feed.

Jonathan Gamble

I'm wondering why not just recount the total likes on each like update and update the count field accordingly. It's not as performant as increments, but it feels safer for accuracy in case of issues under the hood. No idea if it's done this way, but it makes sense to me.
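A sketch of that recount-on-every-update idea, again with SQLite standing in (hypothetical table names):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE post_likes (post_id INTEGER, user_id INTEGER,
                         PRIMARY KEY (post_id, user_id));
CREATE TABLE posts (id INTEGER PRIMARY KEY, like_count INTEGER DEFAULT 0);
INSERT INTO posts (id) VALUES (42);
""")

def toggle_like(user_id, post_id):
    # Add or retract the like in the source-of-truth table...
    cur = db.execute("DELETE FROM post_likes WHERE post_id = ? AND user_id = ?",
                     (post_id, user_id))
    if cur.rowcount == 0:
        db.execute("INSERT INTO post_likes VALUES (?, ?)", (post_id, user_id))
    # ...then recount instead of incrementing, so the cached count can
    # never drift away from the actual rows.
    db.execute("""UPDATE posts SET like_count =
                    (SELECT COUNT(*) FROM post_likes WHERE post_id = ?)
                  WHERE id = ?""", (post_id, post_id))

toggle_like(100, 42); toggle_like(101, 42)
toggle_like(100, 42)  # user 100 retracts

count = db.execute("SELECT like_count FROM posts WHERE id = 42").fetchone()[0]
assert count == 1
```

The trade-off is that the COUNT(*) grows with the number of likes, which is exactly the performance concern raised in the replies.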

Michael J. Ryan

This is discussed in TFA. You wind up creating a new record for each update to a counts field in a posts table.

With the separate analytics table, you can have atomic counters that effectively do the same. Yes, it's an n+1 request, but it scales horizontally and is often faster than joins for a single view item.

Jonathan Gamble

I think you're misunderstanding. You wouldn't use joins in my case. It would still save the value; it would just recalculate it instead of incrementing. After dealing with Firestore, I don't trust increments to be accurate.

Bobby Iliev • Edited

Great write up!

I think this is a good example of where streaming databases like Materialize can really help. You can essentially create a live materialized view, and the result will be kept up to date as the data comes in.

In traditional DBs, data is stored in tables matching the structure of the writes (inserts, updates) and all the computation work happens on read queries (selects). A streaming database moves the work to the write side: Instead of just storing data on writes, a streaming database asks for the queries upfront (in the form of Materialized Views) and incrementally updates results as input data arrives.
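A toy Python sketch of that write-side idea — the "view" is registered up front and updated incrementally per like event. This is just the concept, not the Materialize API:

```python
from collections import Counter

# The continuously maintained "materialized view": likes per post.
# It is declared before any data arrives, and every write updates it
# incrementally, so reads become O(1) lookups instead of aggregations.
like_counts = Counter()

def on_write(event):
    post, delta = event
    like_counts[post] += delta   # work happens on the write path

for ev in [("post-1", +1), ("post-2", +1), ("post-1", +1), ("post-1", -1)]:
    on_write(ev)

assert like_counts["post-1"] == 1
assert like_counts["post-2"] == 1
```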

As Manjaka Josvah

If the database we use doesn't have a counter type, is it possible to just use an integer type instead? I see from your explanation that the counter type is just an alias for the integer type, right? Does it affect the performance of our application if we use an integer type instead?

Daniel Reis

The idea behind the counter is to always use increment and decrement operations, e.g. item + 1 or item - 1.

Since it doesn't depend on the number the last row stopped at, you can easily deal with concurrent connections without losing any data.

Michael J. Ryan

If you're using a classic RDBMS like Postgres/MySQL/MSSQL, then yes, there should be an atomic update to increment/decrement. Of course, an index on the likes table should work for count(postid) as well.
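For the classic-RDBMS case, the atomic relative update looks like this (SQLite is used here as a stand-in; the schema is an assumption):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, likes INTEGER DEFAULT 0)")
db.execute("INSERT INTO posts (id) VALUES (1)")

# A relative UPDATE is applied atomically by the engine: the client never
# does a read-modify-write, so concurrent likes are not lost.
for _ in range(3):
    db.execute("UPDATE posts SET likes = likes + 1 WHERE id = 1")
db.execute("UPDATE posts SET likes = likes - 1 WHERE id = 1")

likes = db.execute("SELECT likes FROM posts WHERE id = 1").fetchone()[0]
assert likes == 2
```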

Cassandra and ScyllaDB have differing behaviors for the benefit of horizontal scaling.

Phil Goldenberg

Thanks for the fun read! It makes me interested in checking out Scylla.

I do have a question that may not be related to the point of the article, but is there a reason you included the created_at field in the primary key of the posts table? I would think that the id field by itself would be enough for the PK.

Daniel Reis

At ScyllaDB we have a Partition Key (like a Primary Key) and Clustering Keys (ordering keys).

You can only run a query using the PK or CK to guarantee that your query will be fast, and both need to be present in every row.

Also, I forgot to put user_id as a clustering key for this example :p

Ahsan Mangal 👨🏻‍💻

I'm puzzled why we don't simply add up the total amount of likes with each update and then edit the count column as needed. Though not as efficient as incrementing, this appears to be a more dependable means of ensuring accuracy, especially if there are unanticipated complications. I'm not sure if this strategy is already in use, but it sounds reasonable to me.

quin

I wonder if it would be even better to only do that every... 30 or so likes. So basically, make a condition where if the count is divisible by 30, recalculate and overwrite the count. Best of both worlds? Or is the condition too expensive itself? At least it doesn't involve reading another table.
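That every-30-likes reconciliation could be sketched like this (a toy in-memory version; the names are invented):

```python
count = 0      # the fast, increment-maintained counter
truth = set()  # source-of-truth likes table (user ids)

def like(user_id):
    global count
    truth.add(user_id)
    count += 1
    if count % 30 == 0:      # cheap condition on every 30th like...
        count = len(truth)   # ...trigger a full recount to correct drift

for u in range(35):
    like(u)
assert count == 35
```

The modulo check itself is essentially free; the cost is the occasional recount, which bounds how far an inaccurate counter can drift.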

Pierre Vahlberg

Your last model (posts, post_analytics, post_likes and possibly user_likes) reminds me of a presenter/aggregator architecture style from a presentation on event-driven systems. At scale, I would probably even lean that way.

That would mean one service, a so-called frontend or an API, would publish a "userLikedPost" event when someone clicks like.

A userAggregator would calculate user stats and store which posts this user liked, while a postAggregator would do the same. Different UIs would then query these two aggregators through an API gateway or so.

This decoupling ties you less into finding one DB that fits all needs, at the cost of just that: decoupling, and maintaining more services and tech stack choices.

But it's interesting to think about 👍
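A toy in-process sketch of that publish/aggregate flow ("userLikedPost", userAggregator and postAggregator are the hypothetical names from the comment, with a plain function list standing in for a message bus):

```python
from collections import defaultdict

subscribers = []

def publish(event):
    # Stand-in for a message broker fanning out "userLikedPost" events.
    for handler in subscribers:
        handler(event)

post_stats = defaultdict(int)   # maintained by the "postAggregator"
user_liked = defaultdict(set)   # maintained by the "userAggregator"

def post_aggregator(event):
    post_stats[event["post"]] += 1

def user_aggregator(event):
    user_liked[event["user"]].add(event["post"])

subscribers += [post_aggregator, user_aggregator]

publish({"user": "alice", "post": "p1"})
publish({"user": "bob", "post": "p1"})

assert post_stats["p1"] == 2
assert user_liked["alice"] == {"p1"}
```

Each aggregator can then pick the storage engine that fits its own query pattern, which is the decoupling benefit described above.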

Kirill Birger

Neat. I have never heard of this DB system before. I wonder if anyone has clever ideas for how to do this well in MySQL as it is not always practical to move to a different DB for a single use case, and MySQL doesn't have a good way to do this without table locking, I think 😱

Daniel Reis

If you have a huge system, you will probably work with multiple databases.

Scylla is a database that can handle monstrous amounts of data easily if you model it the right way.

We have a free cloud environment, give it a try!

guto

Excellent article!

Yasmin Felício

Water bottle filled. Good article, primo!

Daniel Reis

Thanks! Hope it helped you in some way <3

Cherry Ramatis

Nice to see the concept of signal applied to database modelling, thanks for the awesome knowledge

Thyago Vinicius Camilo Cunha • Edited

Awesome article!

I'm wondering how many databases (or even database types!) can be involved in a single social media feature...

Michael J. Ryan

Depends on your needs and the necessary volume.

You can look at the likes of stack overflow as a website that scaled massively just with Microsoft SQL Server.

You could also use Redis with persistence if you don't mind a secondary data source or have different scaling needs.

Every 10x of users past the first 10k or so will mean differing approaches. At 1M or so you start needing something more like ScyllaDB.

Vinicius Dias

AWESOME!

James Ajayi

Quite informative. Well done, Daniel.

Nevo David

I am trying to understand the benefits of being atomic.

If multiple users run the update query, they will probably get a deadlock back.

Tzach Livyatan

Deadlock is impossible.
The counters implementation is based on CRDTs, which is explained here:
scylladb.com/2017/04/04/counters/
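For intuition, here is a toy PN-counter, one classic counter CRDT. Scylla's actual implementation differs (see the linked post), but the convergence property is the same: replicas merge deterministically without locks, so there is nothing to deadlock on.

```python
# Toy PN-counter: each replica tracks its own increments (P) and
# decrements (N) separately; merging takes per-replica maxima, so any
# two replicas converge to the same value regardless of merge order.
class PNCounter:
    def __init__(self):
        self.p = {}
        self.n = {}
    def incr(self, replica):
        self.p[replica] = self.p.get(replica, 0) + 1
    def decr(self, replica):
        self.n[replica] = self.n.get(replica, 0) + 1
    def value(self):
        return sum(self.p.values()) - sum(self.n.values())
    def merge(self, other):
        for k, v in other.p.items():
            self.p[k] = max(self.p.get(k, 0), v)
        for k, v in other.n.items():
            self.n[k] = max(self.n.get(k, 0), v)

a, b = PNCounter(), PNCounter()
a.incr("A"); a.incr("A")        # replica A sees two likes
b.incr("B"); b.decr("B")        # replica B sees a like and a retraction
a.merge(b); b.merge(a)          # anti-entropy in either direction
assert a.value() == b.value() == 2
```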

João Vitor

This article is great, but Vacilinho is better!
Send me one please 🥺

Flavio

Great article!

Angelo Avila Mesquita

Excellent article!

Renan Vidal Rodrigues

Great!

Rudi

Pretty good content

Wanderson José

Great article!

Bruno Vieira

Excellent article!

Jean Carlos Izepon

Nice Article, congrats primo!

Samuel Souza Cabral

Good article!

Rafael Duarte Amaral

Nice Article!

Mario Souto

Awesome article!!!

Henri Borges

Good job!

Daniel Reis

Thanks dude <3

Lucas Medeiros

Nice Primo!

thalesmengue

Amazing article as always, thank you for sharing your knowledge Daniel!

Victor Ribeiro Boechat

Nice article! Just learned a lot. Keep up the amazing work! :)

Panto!

First time learning about Scylla, and it looks quite powerful! I'll totally be using it for future projects.

SamucaDev

Nice Article!

Xossy • Edited

Really good article. Congratulations Daniel.

DMayumi

Nice article, congrats Dan!

Xossy

great article!

Zima

Interesting, thank you for sharing. Your post was clear. Keep working on making them even clearer. Enjoy!

Xossy

Man, I'm a bit of a beginner when it comes to database modelling, and I could completely understand the explanation. I'm eagerly waiting for the next post.

Sther

Excellent article! Congratulations primo! <3

nathanlucena

good article, congratulations!

Humberto Lima de Alcantara Fonseca Filho

Nice Article!

Lewis Cowles

Both solutions are needed to avoid gaming likes.

You only need a single device to like and retract a like for a subjectId, so it could locally increment and decrement from a shared count.

Limits, separation of necessity and glamour and isolation of effect are key to engineering.

Anurag Vohra

I made a small DB just to store counts. It's an O(1) DB, it's not append-only, and it's just a few lines of code. Tested only on Linux.

github.com/anuragvohraec/CounterDB

Muzzammil Shaikh

When you create the socials.posts table, add a post_analytics column with a JSON/TEXT type. Then, whenever a user reacts, you build a JSON structure like this:

analytics = [{
  "username": text,
  "likes": "like/dislike",
  "likes_at": timeStamp()
}]

Every time a user likes or dislikes, the value is updated by a query. When a user sees a post for the first time and likes it, it counts as 1. If they then dislike the post, the data related to that user is updated in the JSON.
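A sketch of that JSON-column approach (SQLite as a stand-in, field names as in the comment). One caveat: updating the whole JSON is itself a read-modify-write, so it reintroduces the lost-update risk that the counter type avoids:

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, analytics TEXT DEFAULT '[]')")
db.execute("INSERT INTO posts (id) VALUES (1)")

def react(post_id, username, reaction):
    # Read-modify-write of the whole JSON document (NOT concurrency-safe).
    raw = db.execute("SELECT analytics FROM posts WHERE id = ?",
                     (post_id,)).fetchone()[0]
    entries = [e for e in json.loads(raw) if e["username"] != username]
    entries.append({"username": username, "likes": reaction})
    db.execute("UPDATE posts SET analytics = ? WHERE id = ?",
               (json.dumps(entries), post_id))

react(1, "ana", "like")
react(1, "bob", "like")
react(1, "ana", "dislike")   # ana's earlier entry is replaced

data = json.loads(db.execute("SELECT analytics FROM posts WHERE id = 1").fetchone()[0])
assert sum(1 for e in data if e["likes"] == "like") == 1   # only bob's like remains
```

Counting likes also means parsing the JSON on every read, which gets expensive as the array grows.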

Windya Madhushani

Great idea.

Alwaisy al-waisy

I am testing how likes are stored.

Renato Rebouças

That's a great article, brother. 🚀

Nicholas Drone

I wonder what this would look like in a graph database.
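As a toy sketch: in a graph model a like is just an edge, roughly (user)-[:LIKES]->(post), and both lookup directions fall out of the same edge set (illustrative Python, not a real graph DB):

```python
# Each tuple is a LIKES edge from a user node to a post node.
likes_edges = {("alice", "p1"), ("bob", "p1"), ("alice", "p2")}

def likers(post):
    # Traverse incoming LIKES edges of a post.
    return {u for (u, p) in likes_edges if p == post}

def liked_by(user):
    # Traverse outgoing LIKES edges of a user.
    return {p for (u, p) in likes_edges if u == user}

assert likers("p1") == {"alice", "bob"}
assert liked_by("alice") == {"p1", "p2"}
```

A real graph database would index these edges in both directions, so neither query needs a scan.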

N0153128

Possibly I'm not understanding the article, but how does this approach differ from traditional database normalization?

Stijn van der Pouw

I would say I am still a junior, hence I won't feel stupid for asking: why has no one mentioned database triggers and functions?
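In a classic RDBMS you could indeed keep the count with triggers (note that Scylla/Cassandra don't offer SQL-style triggers). A sketch in SQLite, with a hypothetical schema:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE posts (id INTEGER PRIMARY KEY, like_count INTEGER DEFAULT 0);
CREATE TABLE post_likes (post_id INTEGER, user_id INTEGER,
                         PRIMARY KEY (post_id, user_id));

-- The triggers keep like_count in sync with the likes table,
-- so application code never touches the counter directly.
CREATE TRIGGER like_added AFTER INSERT ON post_likes BEGIN
  UPDATE posts SET like_count = like_count + 1 WHERE id = NEW.post_id;
END;
CREATE TRIGGER like_removed AFTER DELETE ON post_likes BEGIN
  UPDATE posts SET like_count = like_count - 1 WHERE id = OLD.post_id;
END;

INSERT INTO posts (id) VALUES (1);
""")

db.execute("INSERT INTO post_likes VALUES (1, 100)")
db.execute("INSERT INTO post_likes VALUES (1, 101)")
db.execute("DELETE FROM post_likes WHERE post_id = 1 AND user_id = 100")

count = db.execute("SELECT like_count FROM posts WHERE id = 1").fetchone()[0]
assert count == 1
```

The trigger runs inside the same transaction as the insert/delete, which is exactly the cross-table atomicity that Scylla/Cassandra trade away for horizontal scaling.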