This post is about a video from Hussein Nasser that I saw on YouTube. It is a summarized version of the video along with my key takeaways.
The discussion covers:
- How to handle data in a billion-row table
- What kinds of approaches can be taken
- How to redesign the table to handle 2 billion rows
The discussion starts with how a Twitter follower table might be designed. A simple approach: make a table recording which person follows whom, with rows of 2-3 columns. But at Twitter's scale it becomes a huge table over time. So what can be done?
- **Brute force** the data without any optimization: run plain queries, then throw multi-threading, multi-processing, and lots of machines at scanning the table.
- Add an **index** to the table and find the data through the indexed column instead of a full scan.
- Now you have billions of rows, so the index itself becomes huge; to search it, use database **partitioning** on the same disk. Use pair partitioning.
- To optimize further, use **sharding** in the system (multiple hosts). But it adds more **complexity**: the client needs to know the shard info before querying, then find the proper **partition** for the query, and only then make the actual query. That is another layer of logic on top of the business logic.
- Another way is to **redesign** the system: in the profile table, add 2 more columns to hold follower data, such as follower count and followers (as JSON). That way a profile row holds all the follower information for that profile. The problem then becomes how to write/edit this data, but that is a different kind of system-design question (query patterns, CQRS, event-based solutions). It solves the current issue.
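The indexing step above can be sketched with SQLite. The table and column names here are my own illustration, not from the video; the point is how the query plan changes once an index exists:

```python
import sqlite3

# A minimal sketch of the indexing idea using SQLite.
# The follows(follower_id, followee_id) schema is a hypothetical example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE follows (follower_id INTEGER, followee_id INTEGER)")
conn.executemany(
    "INSERT INTO follows VALUES (?, ?)",
    [(u, u + f) for u in range(1000) for f in range(1, 5)],
)

query = "EXPLAIN QUERY PLAN SELECT followee_id FROM follows WHERE follower_id = 42"

# Without an index the planner has to scan every row.
plan_before = str(conn.execute(query).fetchall())

# With an index on the lookup column it can jump straight to the matching rows.
conn.execute("CREATE INDEX idx_follower ON follows (follower_id)")
plan_after = str(conn.execute(query).fetchall())

print(plan_before)  # the plan mentions a full table scan
print(plan_after)   # the plan mentions idx_follower
```

On a billion-row table this is the difference between touching every row and touching only the handful that match the key.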
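The partitioning idea can be sketched as splitting the one big table into many smaller tables by key range, so each partition (and its index) stays small. Everything here, from the table names to the range size, is a hypothetical illustration:

```python
import sqlite3

# Sketch of range partitioning within a single database: rows are routed to
# a physical table chosen by follower_id range. Names are hypothetical.
conn = sqlite3.connect(":memory:")
PARTITION_SIZE = 1000  # ids 0-999 -> follows_p0, 1000-1999 -> follows_p1, ...

def partition_table(follower_id: int) -> str:
    return f"follows_p{follower_id // PARTITION_SIZE}"

def insert_follow(follower_id: int, followee_id: int) -> None:
    table = partition_table(follower_id)
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {table} "
        "(follower_id INTEGER, followee_id INTEGER)"
    )
    conn.execute(f"INSERT INTO {table} VALUES (?, ?)", (follower_id, followee_id))

insert_follow(5, 77)
insert_follow(2500, 9)  # lands in a different partition, follows_p2

# A lookup only touches the one small partition its key maps to.
rows = conn.execute(
    f"SELECT followee_id FROM {partition_table(5)} WHERE follower_id = 5"
).fetchall()
```

Real databases (e.g. PostgreSQL declarative partitioning) do this routing for you, but the mechanism is the same: the key decides which small piece of the table a query has to read.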
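The extra client-side complexity that sharding introduces can be sketched as a routing function the application must call before any SQL runs. The shard host names below are made up for illustration:

```python
import hashlib

# Sketch of the client-side routing layer sharding forces on you: before any
# query can run, the client must map the key to a host. Hostnames are made up.
SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2", "db-shard-3"]

def shard_for(user_id: int) -> str:
    # Hash the partition key so users spread evenly across hosts.
    digest = hashlib.sha256(str(user_id).encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

# Every query now starts with this extra routing step,
# a whole layer of logic sitting beside the business logic.
host = shard_for(42)
```

This is the layer the video warns about: it must be correct, deterministic, and kept in sync everywhere queries are issued, which is why sharding is usually a last resort.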
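The redesign bullet can be sketched as a denormalized profile table where follower data lives in the profile row itself as a count plus a JSON list. The column names are my own illustration, not from the video:

```python
import json
import sqlite3

# Sketch of the redesigned profile table: follower data is denormalized into
# the profile row as a count plus a JSON array. Schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE profile ("
    " user_id INTEGER PRIMARY KEY,"
    " name TEXT,"
    " follower_count INTEGER,"
    " followers TEXT)"  # JSON array of follower ids
)
conn.execute(
    "INSERT INTO profile VALUES (?, ?, ?, ?)",
    (42, "alice", 3, json.dumps([7, 19, 101])),
)

# Reading a profile's followers is now a single-row primary-key lookup
# instead of a scan over a billion-row follows table.
count, followers_json = conn.execute(
    "SELECT follower_count, followers FROM profile WHERE user_id = 42"
).fetchone()
followers = json.loads(followers_json)
```

Reads become cheap, but as the post notes, writes get harder: updating one follow now means rewriting a JSON blob, which is where patterns like CQRS or event-based updates come in.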