DEV Community

M Bellucci

Choosing a Storage Strategy

Intro

Technical decisions are all about trade-offs.
This post collects some guidelines for deciding which storage strategy best fits a given case.

Storage Options

  • In-memory
  • File-based
  • Non-relational (NoSQL)
    • Redis
    • MongoDB
  • Relational
    • Postgres
    • MySQL

Decision Factors

  • What data is being stored?
  • Is data constrained to meet a complex structure?
  • How quickly does data need to be read/written?
  • How much data exists?
  • How does data need to be queried?
  • How long, and how reliably, does data need to be stored?
  • Is data accessed by many connections concurrently?

Trade-offs

RDBMS

Pros:

  • Long-term, durable storage
  • Support for complex structures (schemas, constraints, relations)
  • Powerful querying (joins, aggregations, indexes)

Cons:

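  • Slower than in-memory stores for simple reads and writes
  • Require administration tasks
  • Often need an ORM or mapping code between rows and application objects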
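To make "support for complex structures" and "powerful querying" concrete, here is a minimal sketch using Python's built-in `sqlite3` module as a stand-in for Postgres or MySQL (an assumption made only so the example runs without a database server). The `author` and `post` tables are hypothetical. The schema rejects inconsistent data, and a single declarative join answers a question that spans both tables.

```python
import sqlite3

# SQLite stands in for a server RDBMS so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
CREATE TABLE author (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE post (
    id        INTEGER PRIMARY KEY,
    author_id INTEGER NOT NULL REFERENCES author(id),
    title     TEXT NOT NULL
);
""")

conn.execute("INSERT INTO author (id, name) VALUES (1, 'alice')")
conn.execute("INSERT INTO post (author_id, title) VALUES (1, 'Choosing storage')")

# The schema itself rejects a post that points at a non-existent author.
try:
    conn.execute("INSERT INTO post (author_id, title) VALUES (99, 'orphan')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# A declarative join + aggregation over both tables.
rows = conn.execute("""
    SELECT a.name, COUNT(p.id)
    FROM author a LEFT JOIN post p ON p.author_id = a.id
    GROUP BY a.name
""").fetchall()
```

Here `rejected` ends up `True` and `rows` is `[('alice', 1)]`: the constraint and the query both live in the database rather than in application code.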

In-memory

Pros:

  • Good performance

Cons:

  • Less reliable: data is lost when the process stops unless it is also persisted elsewhere
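A minimal sketch of both sides of this trade-off, assuming a plain Python dict as the store (`InMemoryStore` is a hypothetical name). Reads and writes are dictionary operations, which is why in-memory storage is fast; "restarting" the store shows why it is less reliable.

```python
class InMemoryStore:
    """Hypothetical key/value store that lives only in the process's memory."""

    def __init__(self):
        self._data = {}  # a dict gives average O(1) reads and writes

    def set(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)


store = InMemoryStore()
store.set("session:42", {"user": "alice"})
before_restart = store.get("session:42")

# Simulate a process restart: the new instance starts empty,
# because nothing was ever written to durable storage.
store = InMemoryStore()
after_restart = store.get("session:42")
```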

File-based

Pros:

  • No DBMS maintenance

Cons:

  • Concurrency issues when several processes write the same file
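A minimal sketch of a file-based store, assuming a single JSON file holds all data (`JsonFileStore` is a hypothetical name). It needs no DBMS, and writing to a temporary file followed by `os.replace` (atomic on POSIX file systems) keeps readers from seeing a half-written file. It does not solve the concurrency problem: the read-modify-write in `set` can still lose updates if two processes run it at once without a lock.

```python
import json
import os
import tempfile


class JsonFileStore:
    """Hypothetical key/value store backed by a single JSON file."""

    def __init__(self, path):
        self.path = path

    def _load(self):
        try:
            with open(self.path, encoding="utf-8") as f:
                return json.load(f)
        except FileNotFoundError:
            return {}

    def set(self, key, value):
        # Read-modify-write: two concurrent writers can overwrite each
        # other's changes (a lost update) unless they coordinate with a lock.
        data = self._load()
        data[key] = value
        # Write a temp file in the same directory, then swap it in,
        # so readers never observe a partially written file.
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(self.path) or ".")
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f)
        os.replace(tmp, self.path)

    def get(self, key, default=None):
        return self._load().get(key, default)


workdir = tempfile.mkdtemp()
store = JsonFileStore(os.path.join(workdir, "data.json"))
store.set("greeting", "hello")
store.set("count", 1)

# A second instance (standing in for another process) reads the same file.
reloaded = JsonFileStore(store.path)
```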

MySQL vs PostgreSQL (PG)

  • MySQL: easier setup
  • PG supports recursive queries (MySQL only added them in version 8.0)
  • PG supports specialized data types
  • PG supports standard authentication methods (LDAP, GSSAPI)
  • PG supports both asynchronous and synchronous replication
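Recursive queries are written as `WITH RECURSIVE` common table expressions. This sketch runs one through Python's `sqlite3` module purely so it is self-contained (an assumption); the SQL itself is standard, and PostgreSQL accepts the same statement. The `employee` table is hypothetical. The query walks the reporting chain below one manager, a tree traversal that plain joins cannot express for arbitrary depth.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER);
INSERT INTO employee VALUES
    (1, 'ceo',  NULL),
    (2, 'cto',  1),
    (3, 'dev1', 2),
    (4, 'dev2', 2),
    (5, 'cfo',  1);
""")

# Start from the CTO, then repeatedly join employees whose manager
# is already in the result set, tracking how deep each one sits.
rows = conn.execute("""
WITH RECURSIVE reports(id, name, depth) AS (
    SELECT id, name, 0 FROM employee WHERE name = 'cto'
    UNION ALL
    SELECT e.id, e.name, r.depth + 1
    FROM employee e JOIN reports r ON e.manager_id = r.id
)
SELECT name, depth FROM reports ORDER BY depth, name
""").fetchall()
```

`rows` is `[('cto', 0), ('dev1', 1), ('dev2', 1)]`.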

NoSQL ("Not only SQL") DBs

Pros:

  • Horizontal scalability
  • Simpler data models

Redis

Pros:

  • In-memory, so data manipulation is fast
  • Supports rich data structures
  • Supports publish/subscribe

Cons:

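  • Not designed for long-term storage: persistence (RDB snapshots, AOF) is optional and, depending on configuration, recent writes can be lost
  • Space limited by available RAM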

Supported data structures

  • Hashes (hash tables)
  • Lists
  • Strings (plain key/value pairs)
  • Sets
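To show how these four types differ, here is a toy, in-process model of their core commands (`SET`/`GET`, `HSET`/`HGET`, `LPUSH`/`LRANGE`, `SADD`/`SMEMBERS`). `MiniRedis` is a hypothetical class written for illustration; it is not the redis-py client, and it talks to no server. It mimics two Redis behaviors worth knowing: `LPUSH` prepends each value in turn, and `LRANGE` treats its stop index as inclusive, with negative indices counting from the end.

```python
from collections import deque


class MiniRedis:
    """Toy in-process model of four Redis data types (not the real client API)."""

    def __init__(self):
        self.db = {}

    # Strings: plain key/value pairs (SET / GET)
    def set(self, key, value):
        self.db[key] = value

    def get(self, key):
        return self.db.get(key)

    # Hashes: a field/value map stored under one key (HSET / HGET)
    def hset(self, key, field, value):
        self.db.setdefault(key, {})[field] = value

    def hget(self, key, field):
        return self.db.get(key, {}).get(field)

    # Lists: ordered sequences; LPUSH prepends each value in turn
    def lpush(self, key, *values):
        lst = self.db.setdefault(key, deque())
        for v in values:
            lst.appendleft(v)
        return len(lst)

    def lrange(self, key, start, stop):
        items = list(self.db.get(key, ()))
        n = len(items)
        # Negative indices count from the end; stop is inclusive.
        if start < 0:
            start = max(n + start, 0)
        if stop < 0:
            stop = n + stop
        return items[start:stop + 1]

    # Sets: unordered, unique members (SADD / SMEMBERS)
    def sadd(self, key, *members):
        s = self.db.setdefault(key, set())
        before = len(s)
        s.update(members)
        return len(s) - before  # number of members actually added

    def smembers(self, key):
        return set(self.db.get(key, set()))


r = MiniRedis()
r.set("user:1:name", "alice")
r.hset("user:1", "email", "alice@example.com")
r.lpush("inbox:1", "m1", "m2", "m3")
added = r.sadd("group:7:members", "alice", "bob", "alice")
```

After these calls, `r.lrange("inbox:1", 0, -1)` returns `['m3', 'm2', 'm1']` and `added` is `2`, because the duplicate `"alice"` is ignored by the set.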

Publish/subscribe support

Channels act as the data delivery mechanism:

Channel --has_many--> Subscriber  
Channel --has_many--> Publisher  

Subscribers listen to events on a channel.  
Publishers emit events on a channel.

Note that the same client application can play both the subscriber and the publisher roles.
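A minimal in-process sketch of the channel model above, assuming plain Python callbacks in place of network subscribers. `Broker` is a hypothetical name, not the Redis client API. As with Redis Pub/Sub, a message is delivered only to whoever is subscribed at publish time and is not stored, and `publish` returns the number of receivers, similar to what Redis `PUBLISH` reports.

```python
from collections import defaultdict


class Broker:
    """Hypothetical in-process broker: each channel fans events out to its subscribers."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # channel -> list of callbacks

    def subscribe(self, channel, callback):
        self._subscribers[channel].append(callback)

    def publish(self, channel, message):
        # Fire-and-forget: deliver to current subscribers only; nothing is kept,
        # so a subscriber that joins later never sees this message.
        callbacks = self._subscribers.get(channel, [])
        for callback in callbacks:
            callback(channel, message)
        return len(callbacks)


broker = Broker()
received = []
broker.subscribe("group:7", lambda ch, msg: received.append(("alice", msg)))
broker.subscribe("group:7", lambda ch, msg: received.append(("bob", msg)))
delivered = broker.publish("group:7", "hello")

late = []
broker.subscribe("group:7", lambda ch, msg: late.append(msg))
```

`delivered` is `2` and `late` stays empty. The lack of storage is the design choice to keep in mind: pub/sub handles live delivery, and history needs a separate store.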

MongoDB

MongoDB is a general-purpose document database.

It is used for the same sorts of applications that you'd use an RDBMS for.

Cons:

  • No schema constraints enforced by default

You can use libraries on top of MongoDB to provide validation, schemas, and search capabilities.
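A minimal sketch of what such a validation layer does before a document is written, assuming documents are plain Python dicts. `MESSAGE_SCHEMA` and `validate` are hypothetical and do not reproduce the API of any particular library.

```python
# Hypothetical schema: field name -> required Python type
MESSAGE_SCHEMA = {"group": int, "user": str, "text": str}


def validate(doc, schema):
    """Return a list of problems; an empty list means the document is valid."""
    errors = []
    for field, expected in schema.items():
        if field not in doc:
            errors.append(f"missing field: {field}")
        elif not isinstance(doc[field], expected):
            errors.append(
                f"{field}: expected {expected.__name__}, got {type(doc[field]).__name__}"
            )
    return errors


ok = validate({"group": 7, "user": "alice", "text": "hi"}, MESSAGE_SCHEMA)
bad = validate({"group": "7", "user": "alice"}, MESSAGE_SCHEMA)
```

`ok` is `[]`, while `bad` reports the wrong type for `group` and the missing `text`. MongoDB can also enforce a `$jsonSchema` validator on a collection server-side; a library check like this one runs in the application instead.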

Collection --has_many--> Document  
analogous to  
Table --has_many--> Row  

A collection's API provides insert, update, find, and remove operations.
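A toy, in-memory stand-in for a collection, to show the four operations and the lack of a fixed schema. `Collection` and its methods are hypothetical and are not the pymongo API (real drivers expose methods such as `insert_one` and `find` with a richer query language). Queries here are simple equality matches on every field given.

```python
import copy
import itertools


class Collection:
    """Toy in-memory document collection (not the pymongo API)."""

    def __init__(self):
        self._docs = {}
        self._ids = itertools.count(1)

    @staticmethod
    def _matches(doc, query):
        # Equality match on every field in the query, e.g. {"group": 7}.
        return all(doc.get(k) == v for k, v in query.items())

    def insert(self, doc):
        doc = copy.deepcopy(doc)
        doc.setdefault("_id", next(self._ids))
        self._docs[doc["_id"]] = doc
        return doc["_id"]

    def find(self, query=None):
        query = query or {}
        return [copy.deepcopy(d) for d in self._docs.values() if self._matches(d, query)]

    def update(self, query, changes):
        # Merge `changes` into each matching document (similar in spirit to $set).
        matched = [d for d in self._docs.values() if self._matches(d, query)]
        for d in matched:
            d.update(changes)
        return len(matched)

    def remove(self, query):
        ids = [d["_id"] for d in self._docs.values() if self._matches(d, query)]
        for i in ids:
            del self._docs[i]
        return len(ids)


messages = Collection()
messages.insert({"group": 7, "user": "alice", "text": "hi"})
# Documents in one collection may have different shapes: no schema change needed.
messages.insert({"group": 7, "user": "bob", "text": "hey", "reply_to": 1})
messages.insert({"group": 8, "user": "carol", "image": "cat.png"})
```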

Case Study

An exercise for the reader.

Let's suppose that we are going to build an application that supports group conversations, like WhatsApp.

What data is being stored?
Our server needs to save messages for at least one month.
The client application can save messages locally.

Is data constrained to meet a complex structure?
Not too complex

User <------ Message -----> Group
User <------ can_send ----> Group
Restriction: a user cannot send messages to groups they don't belong to.
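If the server used a relational store, the restriction could live in the schema itself. In this sketch (SQLite via `sqlite3`, chosen only so it runs without a server; all table names are hypothetical) a message carries a composite foreign key to `membership(user_id, group_id)`, so inserting a message from a non-member fails. `chat_group` and `app_user` avoid the reserved words `group` and `user`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for SQLite to enforce foreign keys
conn.executescript("""
CREATE TABLE app_user   (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE chat_group (id INTEGER PRIMARY KEY, name TEXT NOT NULL);

-- can_send: a user's membership in a group
CREATE TABLE membership (
    user_id  INTEGER NOT NULL REFERENCES app_user(id),
    group_id INTEGER NOT NULL REFERENCES chat_group(id),
    PRIMARY KEY (user_id, group_id)
);

-- A message must point at an existing (user, group) membership,
-- so "only members can send" is enforced by the database.
CREATE TABLE message (
    id         INTEGER PRIMARY KEY,
    user_id    INTEGER NOT NULL,
    group_id   INTEGER NOT NULL,
    body       TEXT NOT NULL,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (user_id, group_id) REFERENCES membership(user_id, group_id)
);

INSERT INTO app_user VALUES (1, 'alice'), (2, 'bob');
INSERT INTO chat_group VALUES (10, 'family');
INSERT INTO membership VALUES (1, 10);
""")


def send(user_id, group_id, body):
    """Hypothetical helper: returns True if stored, False if the schema rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO message (user_id, group_id, body) VALUES (?, ?, ?)",
                (user_id, group_id, body),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

`send(1, 10, "hi")` succeeds because alice is a member; `send(2, 10, "let me in")` returns `False` because bob is not.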

How quickly does data need to be read/written?
We expect 3 million users sending messages at the same time.

Usability suffers if a message takes more than 2 seconds to reach its recipients.

How much data exists?
It must support millions of channels.

But we can assume that messages don't need to be saved forever.

We can delete the messages after a month.

Anyway, the participants will have their own copies of the messages locally.
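Deleting messages after a month can be a periodic purge job. This sketch assumes "a month" means 30 days, timestamps stored as UTC ISO-8601 strings (which sort correctly as text when they share one format), and SQLite via `sqlite3` as a stand-in store; `RETENTION` and `purge_expired` are hypothetical names. An index on `created_at` keeps the delete from scanning the whole table.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=30)  # assumption: "a month" taken as 30 days

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE message (id INTEGER PRIMARY KEY, group_id INTEGER, body TEXT, created_at TEXT)"
)
conn.execute("CREATE INDEX message_created_at ON message(created_at)")


def purge_expired(conn, now):
    """Delete messages older than the retention window; return how many were removed."""
    cutoff = (now - RETENTION).isoformat()
    with conn:
        cur = conn.execute("DELETE FROM message WHERE created_at < ?", (cutoff,))
    return cur.rowcount


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
rows = [
    (1, 7, "old", (now - timedelta(days=45)).isoformat()),
    (2, 7, "recent", (now - timedelta(days=3)).isoformat()),
]
with conn:
    conn.executemany("INSERT INTO message VALUES (?, ?, ?, ?)", rows)

deleted = purge_expired(conn, now)
remaining = [r[0] for r in conn.execute("SELECT body FROM message")]
```

Instead of a purge job, Redis keys can be given an expiry with `EXPIRE`, and MongoDB supports TTL indexes that remove documents automatically; which fits better depends on the store chosen.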

How does data need to be queried?
We don't need sophisticated queries, just fetching the messages of a channel the user belongs to.
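That single access pattern can be served by one index and one query. This sketch (SQLite via `sqlite3` as a stand-in; table and function names are hypothetical) checks membership first, then returns a page of messages newest first using keyset pagination: the client passes the smallest message id it has already seen to load older messages, which avoids the growing cost of large `OFFSET` values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE membership (user_id INTEGER, group_id INTEGER, PRIMARY KEY (user_id, group_id));
CREATE TABLE message (id INTEGER PRIMARY KEY, group_id INTEGER, user_id INTEGER, body TEXT);
-- One index covers the only access pattern: "messages of group G, newest first".
CREATE INDEX message_by_group ON message(group_id, id);

INSERT INTO membership VALUES (1, 10), (2, 10), (3, 20);
INSERT INTO message (group_id, user_id, body) VALUES
    (10, 1, 'hi'), (10, 2, 'hello'), (20, 3, 'other group'), (10, 1, 'how are you?');
""")

MAX_ID = 2**63 - 1  # largest SQLite integer, used when no cursor is given


def fetch_messages(user_id, group_id, before_id=None, limit=50):
    """Return up to `limit` (id, user_id, body) rows of a group, newest first, for members only."""
    is_member = conn.execute(
        "SELECT 1 FROM membership WHERE user_id = ? AND group_id = ?",
        (user_id, group_id),
    ).fetchone()
    if is_member is None:
        raise PermissionError("user is not a member of this group")
    cursor_id = before_id if before_id is not None else MAX_ID
    return conn.execute(
        "SELECT id, user_id, body FROM message "
        "WHERE group_id = ? AND id < ? ORDER BY id DESC LIMIT ?",
        (group_id, cursor_id, limit),
    ).fetchall()
```

For example, `fetch_messages(1, 10, limit=2)` returns messages 4 and 2; passing `before_id=2` then returns message 1.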

How long, and how reliably, does the data need to be stored?
Messages need to be stored for a month.

Is data accessed by many connections concurrently?
Messages can be concurrently read.

Concurrent writes to the same message don't make sense.

Given the previous scenario, which storage strategy would you choose for the client app and for the server app?
