Discussion on: 5 Things I Learned from The DynamoDB Book

For no. 5, DynamoDB is not the right solution for storing relationships. It's just far too cost-inefficient as you scale up, especially since you usually fetch ALL of someone's followers every time (whenever they post, retweet, etc.). In social networks, 90% of your users won't have many followers, but there are always a few who have thousands, tens of thousands, or even millions. That is as true for early-stage social networks as it is for Twitter. At the social network I worked at, we had only about 1M users, and at that point we already had users with over 50,000 followers.

If you have to use DynamoDB, you're better off putting the IDs of a user's followers into a list to maximize the utility of those read units. If you use KSUIDs as IDs (27 characters each), then a single 4 KB read unit can return ~150 followers with one get request. When you approach the 400 KB item size limit, split the list into multiple items and use the sort key (SK) to store some sort of hash range. But this only gets you so far. At some point, you just have to use a different data store for those power users with a huge number of followers to improve the cost efficiency of those read patterns.
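
To make the arithmetic and the splitting concrete, here is a rough Python sketch. It only builds plain dicts and makes no DynamoDB calls. The `USER#...` / `FOLLOWERS#...` key formats, the `ids` attribute, and the bucket count are all made up for illustration, and the per-read-unit estimate ignores attribute names and per-element overhead.

```python
import hashlib

KSUID_LEN = 27               # a KSUID's text form is 27 base62 characters
READ_UNIT_BYTES = 4 * 1024   # one strongly consistent read unit covers up to 4 KB
ITEM_LIMIT_BYTES = 400 * 1024  # DynamoDB's maximum item size


def ids_per_read_unit(id_len=KSUID_LEN):
    # Rough estimate: ignores attribute names and per-element overhead.
    return READ_UNIT_BYTES // id_len


def bucket_for(follower_id, num_buckets):
    # Stable hash, so the same follower always maps to the same item.
    digest = hashlib.sha256(follower_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def build_items(user_id, follower_ids, num_buckets):
    # One item per hash bucket; PK/SK names and formats are hypothetical.
    items = [
        {"PK": f"USER#{user_id}", "SK": f"FOLLOWERS#{b:04d}", "ids": []}
        for b in range(num_buckets)
    ]
    for fid in follower_ids:
        items[bucket_for(fid, num_buckets)]["ids"].append(fid)
    return items


print(ids_per_read_unit())  # 4096 // 27
```

With hashing, an unfollow only has to rewrite the one item that `bucket_for` points to. You would still need to pick `num_buckets` with enough headroom that no bucket gets near `ITEM_LIMIT_BYTES`, and changing the bucket count later means re-bucketing the existing IDs.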

swyx • Apr 12 '20 • Edited

Right, thank you! I recall that in Martin Kleppmann's Designing Data-Intensive Applications he discussed a hybrid approach for Twitter, where it is "pull on demand" for power users and "fan-out" for normal users. So it's a different access pattern.
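
As I understand the hybrid idea, it looks roughly like the toy in-memory sketch below. This is not Twitter's actual implementation; the `FANOUT_LIMIT` cutoff and all the function and variable names are invented for illustration.

```python
from collections import defaultdict

FANOUT_LIMIT = 10_000  # hypothetical cutoff between "normal" and "power" users

followers = defaultdict(set)    # author -> users who follow them
following = defaultdict(set)    # user -> authors they follow
posts = defaultdict(list)       # author -> [(ts, text)]
timelines = defaultdict(list)   # user -> precomputed [(ts, author, text)]


def follow(user, author):
    followers[author].add(user)
    following[user].add(author)


def publish(author, ts, text, limit=FANOUT_LIMIT):
    posts[author].append((ts, text))
    # Fan-out on write, but only for authors with a modest follower count.
    if len(followers[author]) <= limit:
        for follower in followers[author]:
            timelines[follower].append((ts, author, text))


def read_timeline(user, limit=FANOUT_LIMIT):
    # Start from the precomputed timeline, then pull power users' posts on demand.
    merged = set(timelines[user])
    for author in following[user]:
        if len(followers[author]) > limit:
            merged.update((ts, author, text) for ts, text in posts[author])
    return sorted(merged, reverse=True)  # newest first
```

The set in `read_timeline` also removes duplicates for an author who crossed the cutoff after some of their posts had already been fanned out.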

I don't know if that means abandoning DDB/NoSQL entirely, since if you tried to do this entirely in SQL, you'd also have a different set of issues! After all, wasn't Twitter partly responsible for the rise of NoSQL in the first place?