M Bellucci

Posted on Feb 25, 2020

Choosing a Storage Strategy

#systemdesign #storage

Intro

Technical decisions are all about trade-offs.
This post collects some guidelines for deciding which storage strategy fits each case best.

Storage Options

In-memory
File-based
Non-relational
- Redis
- MongoDB
Relational
- Postgres
- MySQL

Decision Factors

What data is being stored?
Does the data have to follow a complex structure?
How quickly does it need to be read and written?
How much data exists?
How does the data need to be queried?
How long and how reliably does the data need to be stored?
Is data accessed by many connections concurrently?

Tradeoffs

RDBMS

Pros:

Long-term storage
Support for complex structures
Powerful search

Cons:

Slower reads and writes than in-memory storage
Requires administration tasks
Usually accessed through an ORM or a query layer

In memory

Pros:

Good performance

Cons:

Less reliable: data is lost when the process stops

File-based

Pros:

No DBMS maintenance

Cons:

Concurrency issues when several processes read and write the same file
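
As an illustration of the concurrency concern, here is a minimal sketch of a file-based store in Python. The function names `save_atomically` and `load` and the JSON format are assumptions made for this example. Writing to a temporary file and renaming it over the target means a reader should never see a half-written file, but it does not stop two writers from overwriting each other's changes.

```python
import json
import os
import tempfile


def save_atomically(path, data):
    """Write data as JSON to a temp file, then rename it over the target."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(data, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())
        # The rename swaps the file in one step, so readers see either
        # the old content or the new content, never a partial write.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load(path, default=None):
    """Read the JSON file back, or return default if it does not exist."""
    try:
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        return default
```

Two processes that each load, modify, and save the file can still lose one of the updates, which is the kind of problem a DBMS handles for you.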

MySQL vs PG

MySQL is easier to set up
PG supports recursive queries (MySQL added them in version 8.0)
PG supports specialized data types
PG supports standard authentication methods (LDAP, GSSAPI)
PG supports asynchronous replication, with synchronous replication as an option
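
To give an idea of what a recursive query looks like, here is a small sketch using `WITH RECURSIVE`. It runs on SQLite only because SQLite ships with Python; the same construct exists in PG. The `category` table, its rows, and the `ancestors` helper are made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT, parent_id INTEGER)"
)
conn.executemany(
    "INSERT INTO category (id, name, parent_id) VALUES (?, ?, ?)",
    [(1, "company", None), (2, "engineering", 1), (3, "backend", 2), (4, "sales", 1)],
)


def ancestors(conn, category_id):
    """Return the names from the given category up to the root."""
    rows = conn.execute(
        """
        WITH RECURSIVE chain(id, name, parent_id, depth) AS (
            -- start from the requested row
            SELECT id, name, parent_id, 0 FROM category WHERE id = ?
            UNION ALL
            -- then keep following parent_id until there is no parent
            SELECT c.id, c.name, c.parent_id, chain.depth + 1
            FROM category AS c JOIN chain ON c.id = chain.parent_id
        )
        SELECT name FROM chain ORDER BY depth
        """,
        (category_id,),
    ).fetchall()
    return [name for (name,) in rows]
```

Calling `ancestors(conn, 3)` walks from "backend" to "engineering" to "company" in a single query, without one round trip per level.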

Not only SQL DBs

Pros:

Scalability
Simplicity

Redis

Pros:

In memory => Quick data manipulation
Supports data structures
Supports publish/subscribe

Cons:

Not designed for long-term storage
Space limited by the available memory

Supported data structures

Hash tables
Lists
Key/value pairs
Sets

Support publish/subscribe

Channels as data delivery mechanism

Channel --has_many--> Subscriber  
Channel --has_many--> Publisher  

Subscribers can Listen to Events on a Channel  
Publishers can Emit Events on a Channel

Note that the same client code can play both the Subscriber and the Publisher role.
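
The relationships above can be sketched with a tiny in-process broker. This is not the Redis API, only an illustration of the pattern; the `Broker` class, the `group:42` channel name, and `echo_bot` are hypothetical.

```python
from collections import defaultdict


class Broker:
    """A toy in-process broker: one channel has many subscribers."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, channel, callback):
        self._subscribers[channel].append(callback)

    def publish(self, channel, event):
        # Copy the list so subscribing during delivery is safe.
        callbacks = list(self._subscribers.get(channel, []))
        for callback in callbacks:
            callback(event)
        return len(callbacks)  # number of subscribers reached


broker = Broker()
received = []

# A plain subscriber: it only listens.
broker.subscribe("group:42", received.append)


# A client that is both subscriber and publisher on the same channel.
def echo_bot(event):
    if event.startswith("echo?"):
        broker.publish("group:42", "echo!")


broker.subscribe("group:42", echo_bot)

broker.publish("group:42", "hello")
broker.publish("group:42", "echo? anyone there")
```

In a real setup the broker would be a separate server and the callbacks would run in different processes, but the roles stay the same.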

MongoDB

It is general-purpose.

Used for the same sorts of applications that you'd use an RDBMS for.

Cons:

No schema constraints

You can use libraries on top of MongoDB to provide validations, schema and searching capabilities.

Collection --has_many--> Document  
analogous to  
Table --has_many--> Row

The Collection API provides insert, update, find and remove operations.
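
To make the Collection/Document idea concrete, here is a toy in-memory stand-in written in Python. It is not the MongoDB driver; the `Collection` class and its method signatures are assumptions that only mirror the four operations named above.

```python
import itertools


class Collection:
    """A toy collection: a bag of documents (dicts) with no fixed schema."""

    def __init__(self):
        self._documents = {}
        self._ids = itertools.count(1)

    def insert(self, document):
        doc_id = next(self._ids)
        self._documents[doc_id] = {**document, "_id": doc_id}
        return doc_id

    def find(self, query=None):
        # A document matches when every key in the query has an equal value.
        query = query or {}
        return [
            dict(doc)
            for doc in self._documents.values()
            if all(doc.get(key) == value for key, value in query.items())
        ]

    def update(self, query, changes):
        matched = self.find(query)
        for doc in matched:
            self._documents[doc["_id"]].update(changes)
        return len(matched)

    def remove(self, query):
        matched = self.find(query)
        for doc in matched:
            del self._documents[doc["_id"]]
        return len(matched)


messages = Collection()
messages.insert({"group": "family", "text": "hi"})
# No schema constraints: this document has a field the first one lacks.
messages.insert({"group": "work", "text": "standup at 10", "urgent": True})
```

Nothing stops the two documents from having different fields, which is the flexibility and the risk that "No schema constraints" refers to.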

Case Study

An exercise for the reader.

Let's suppose we are building an application that supports group conversations, such as WhatsApp.

What data is being stored?
Our server needs to save messages for at least one month.
The client application can save messages locally.

Does the data have to follow a complex structure?
Not too complex.

User <------ Message -----> Group
User <------ can_send ----> Group
Restriction: a user cannot send messages to a group they don't belong to.
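
Whatever storage is chosen, this restriction has to be enforced somewhere. A minimal sketch of the check in Python, where the `Group` class and the `send_message` function are hypothetical names for this example:

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    name: str
    members: set = field(default_factory=set)
    messages: list = field(default_factory=list)


def send_message(group, sender, text):
    """Store the message only if the sender belongs to the group."""
    if sender not in group.members:
        raise PermissionError(f"{sender} does not belong to {group.name}")
    group.messages.append((sender, text))


family = Group("family", members={"ana"})
send_message(family, "ana", "hi")  # accepted: ana is a member
```

Sending as a non-member, for example `send_message(family, "bob", "hello")`, raises `PermissionError` and stores nothing.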

How quickly does it need to be read and written?
We expect 3 million users sending messages at the same time.

Usability suffers if a message takes more than 2 seconds to reach the user.

How much data exists?
It must support millions of channels.

But we can assume that messages don't need to be saved forever.

We can delete the messages after a month.

Anyway, the participants will have their own copies of the messages locally.
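
The one-month retention rule can be sketched as a periodic clean-up. This is only an illustration: the `purge_expired` function, the `sent_at` field, and reading "a month" as 30 days are assumptions.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=30)  # assumption: "a month" taken as 30 days


def purge_expired(messages, now):
    """Return only the messages that are still inside the retention window."""
    cutoff = now - RETENTION
    return [message for message in messages if message["sent_at"] >= cutoff]


now = datetime(2020, 2, 25, tzinfo=timezone.utc)
stored = [
    {"text": "old", "sent_at": now - timedelta(days=31)},
    {"text": "recent", "sent_at": now - timedelta(days=1)},
]
kept = purge_expired(stored, now)
```

Some stores can expire entries on their own after a set time, in which case this clean-up would not need to be written by hand.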

How does the data need to be queried?
We don't need sophisticated queries, just fetching the messages for a channel that the user belongs to.

How long and how reliably does the data need to be stored?
Messages need to be stored for a month.

Is data accessed by many connections concurrently?
Messages can be concurrently read.

Concurrent writes to the same message don't make sense.

Given this scenario, which storage strategy would you choose for the client app and for the server app?

References

Node.js in Action
