Intro
Technical decisions are all about trade-offs.
This post collects some guidelines you can use when deciding which storage strategy best fits each case.
Storage Options
- In-memory
- File-based
- Non-relational
  - Redis
  - MongoDB
- Relational
  - PostgreSQL
  - MySQL
Decision Factors
- What data is being stored?
- Does the data have to conform to a complex structure?
- How quickly does it need to be read and written?
- How much data exists?
- How does the data need to be queried?
- How long and how reliably does the data need to be stored?
- Is the data accessed by many connections concurrently?
Trade-offs
RDBMS
Pros:
- Long-term, durable storage
- Support for complex data structures
- Powerful querying
Cons:
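- Slower than in-memory options for simple reads and writes
- Requires administration tasks (backups, upgrades, tuning)
- Usually needs an ORM or hand-written SQL to map rows to objects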
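To make the RDBMS pros concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for a full RDBMS. The `authors` and `posts` tables are hypothetical and exist only for this illustration.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for a full RDBMS (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

# Hypothetical schema: the structure is declared up front and enforced by the database.
conn.execute("CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE posts ("
    " id INTEGER PRIMARY KEY,"
    " author_id INTEGER NOT NULL REFERENCES authors(id),"
    " title TEXT NOT NULL)"
)

conn.execute("INSERT INTO authors (id, name) VALUES (1, 'Ana')")
conn.execute("INSERT INTO posts (author_id, title) VALUES (1, 'Storage trade-offs')")

# Structure support: a post that points to a missing author is rejected.
try:
    conn.execute("INSERT INTO posts (author_id, title) VALUES (99, 'Orphan')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Powerful querying: a join with aggregation in a single declarative query.
rows = conn.execute(
    "SELECT a.name, COUNT(p.id) FROM authors a"
    " LEFT JOIN posts p ON p.author_id = a.id GROUP BY a.id"
).fetchall()
```

The database, not the application, rejects the orphan post, and the join is expressed without any hand-written looping code.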
In-memory
Pros:
- Good performance
Cons:
- Less reliable: data is lost when the process stops
File-based
Pros:
- No DBMS maintenance
Cons:
- Concurrency issues
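The in-memory and file-based trade-offs can be sketched side by side. The `MemoryStore` and `FileStore` classes below are hypothetical helpers written for this post, not part of any library, and the JSON file format is just one possible choice.

```python
import json
import os
import tempfile


class MemoryStore:
    """Keeps data in a dict: fast, but gone when the process exits."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


class FileStore:
    """Keeps data in a JSON file: survives restarts, no DBMS to maintain."""

    def __init__(self, path):
        self._path = path

    def _load(self):
        if not os.path.exists(self._path):
            return {}
        with open(self._path, encoding="utf-8") as f:
            return json.load(f)

    def put(self, key, value):
        data = self._load()
        data[key] = value
        # Read-modify-write without locking: two concurrent writers can
        # overwrite each other's changes (the concurrency issue listed above).
        with open(self._path, "w", encoding="utf-8") as f:
            json.dump(data, f)

    def get(self, key):
        return self._load().get(key)


path = os.path.join(tempfile.mkdtemp(), "store.json")
FileStore(path).put("greeting", "hello")
# A brand-new FileStore object (simulating a restart) still sees the data.
survived = FileStore(path).get("greeting")

mem = MemoryStore()
mem.put("greeting", "hello")
# A brand-new MemoryStore starts empty.
lost = MemoryStore().get("greeting")
```

Every `FileStore` call goes through the disk, which is the price paid for surviving a restart.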
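MySQL vs PostgreSQL (PG)
- MySQL is easier to set up
- PG supports recursive queries (MySQL only added them in version 8.0)
- PG supports specialized data types (e.g. arrays, JSONB, geometric types)
- PG supports standard authentication methods (LDAP, GSSAPI)
- PG supports both asynchronous and synchronous replication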
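For readers who have not met recursive queries, here is a small sketch of one. It runs on SQLite through Python's sqlite3 module so that no server is needed, which is an assumption made for convenience; PostgreSQL accepts the same `WITH RECURSIVE` syntax. The `categories` table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: each category points to its parent (NULL for the root).
conn.execute(
    "CREATE TABLE categories (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT)"
)
conn.executemany(
    "INSERT INTO categories VALUES (?, ?, ?)",
    [(1, None, "root"), (2, 1, "databases"), (3, 2, "relational"), (4, 1, "caches")],
)

# Recursive common table expression: start at one node, then keep joining children.
query = """
WITH RECURSIVE subtree(id, name, depth) AS (
    SELECT id, name, 0 FROM categories WHERE id = ?
    UNION ALL
    SELECT c.id, c.name, s.depth + 1
    FROM categories c JOIN subtree s ON c.parent_id = s.id
)
SELECT name, depth FROM subtree ORDER BY depth, name
"""
descendants = conn.execute(query, (2,)).fetchall()
whole_tree = conn.execute(query, (1,)).fetchall()
```

Without recursion, walking a tree of unknown depth would take one query per level from application code.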
NoSQL (Not only SQL) DBs
Pros:
- Scalability
- Simplicity
Redis
Pros:
- In-memory => quick data manipulation
- Supports data structures
- Supports publish/subscribe
Cons:
- Not designed for long-term storage
- Space limited by available memory
Supported data structures
- Hashes (hash tables)
- Lists
- Key/value pairs
- Sets
Publish/subscribe support
Channels are the data delivery mechanism.
Channel --has_many--> Subscriber
Channel --has_many--> Publisher
Subscribers can listen to events on a channel.
Publishers can emit events on a channel.
Note that the same application code can play both the Subscriber and the Publisher role.
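The channel model described above can be sketched in a few lines. The `Broker` class is a hypothetical in-process stand-in written for this post; it is not the Redis client API, and Redis plays this role over the network.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[str], None]


class Broker:
    """Hypothetical in-process broker that routes events by channel name."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, channel: str, handler: Handler) -> None:
        self._subscribers[channel].append(handler)

    def publish(self, channel: str, event: str) -> int:
        # Deliver the event to every current subscriber of the channel
        # and return how many handlers received it.
        handlers = list(self._subscribers[channel])
        for handler in handlers:
            handler(event)
        return len(handlers)


broker = Broker()
inbox_a: List[str] = []
inbox_b: List[str] = []
archive: List[str] = []

# One channel, many subscribers.
broker.subscribe("news", inbox_a.append)
broker.subscribe("news", inbox_b.append)

# The same code plays both roles: it listens on "news" and emits on "archive".
broker.subscribe("news", lambda event: broker.publish("archive", event.upper()))
broker.subscribe("archive", archive.append)

delivered = broker.publish("news", "hello")
```

In this sketch events are handed to whoever is subscribed at publish time and are not stored, so a subscriber that joins later never sees earlier events.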
MongoDB
It is a general-purpose database, used for the same sorts of applications that you'd use an RDBMS for.
Cons:
- No schema constraints
You can use libraries on top of MongoDB to provide validation, schemas and search capabilities.
Collection --has_many--> Document
analogous to
Table --has_many--> Row
A collection's API provides insert, update, find and remove operations.
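To illustrate the collection/document model and its four operations, here is a toy in-memory sketch. The `Collection` class is hypothetical and is not the MongoDB driver API; real drivers have richer query and update syntax.

```python
from typing import Any, Dict, List

Document = Dict[str, Any]


class Collection:
    """Hypothetical in-memory collection of schemaless documents."""

    def __init__(self) -> None:
        self._docs: List[Document] = []
        self._next_id = 1

    def insert(self, doc: Document) -> int:
        # Store a copy of the document with a generated identifier.
        stored = dict(doc, _id=self._next_id)
        self._next_id += 1
        self._docs.append(stored)
        return stored["_id"]

    def find(self, query: Document) -> List[Document]:
        # A document matches when it has every key/value pair in the query.
        return [
            d for d in self._docs
            if all(k in d and d[k] == v for k, v in query.items())
        ]

    def update(self, query: Document, changes: Document) -> int:
        matched = self.find(query)
        for d in matched:
            d.update(changes)
        return len(matched)

    def remove(self, query: Document) -> int:
        ids = {d["_id"] for d in self.find(query)}
        self._docs = [d for d in self._docs if d["_id"] not in ids]
        return len(ids)


users = Collection()
users.insert({"name": "Ana", "age": 30})
# A document with a different shape is accepted: no schema constraints.
users.insert({"name": "Luis", "tags": ["admin"]})
users.update({"name": "Ana"}, {"age": 31})
ana = users.find({"name": "Ana"})
removed = users.remove({"name": "Luis"})
remaining = users.find({})
```

Nothing stops the two documents from having different fields, which is both the flexibility and the risk listed under Cons.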
Case Study
An exercise for the reader.
Let's suppose that we are going to build an application that supports group conversations, as WhatsApp does.
What data is being stored?
Our server needs to save messages for at least one month.
The client application can save messages locally.
Does the data have to conform to a complex structure?
Not too complex
User <------ Message -----> Group
User <------ can_send ----> Group
Restriction: a user cannot send messages to groups they don't belong to.
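The structure and its restriction can be sketched independently of any storage engine. The `ChatModel` and `Message` names below are hypothetical, and keeping everything in Python collections is only for illustration, not a storage recommendation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Message:
    sender: str
    group: str
    text: str


@dataclass
class ChatModel:
    # Group name -> set of member user names (the can_send relation).
    members: Dict[str, Set[str]] = field(default_factory=dict)
    messages: List[Message] = field(default_factory=list)

    def join(self, user: str, group: str) -> None:
        self.members.setdefault(group, set()).add(user)

    def send(self, user: str, group: str, text: str) -> Message:
        # Restriction from the case study: only members may post to a group.
        if user not in self.members.get(group, set()):
            raise PermissionError(f"{user} is not a member of {group}")
        message = Message(user, group, text)
        self.messages.append(message)
        return message


chat = ChatModel()
chat.join("ana", "family")
sent = chat.send("ana", "family", "hi all")

try:
    chat.send("bob", "family", "let me in")
    blocked = False
except PermissionError:
    blocked = True
```

Whichever store is chosen has to be able to answer the membership question quickly, because it runs on every send.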
How quickly does it need to be read and written?
We expect 3 million users sending messages at the same time.
Usability suffers if a user receives a message more than 2 seconds after it was sent.
How much data exists?
It must support millions of channels.
But we can assume that messages don't need to be saved forever.
We can delete the messages after a month.
Anyway, the participants will have their own copies of the messages locally.
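The one-month retention rule amounts to a cutoff date. A minimal sketch, assuming "a month" is approximated as 30 days and that each stored message carries its send time; `purge_expired` and `StoredMessage` are hypothetical names:

```python
from datetime import datetime, timedelta, timezone
from typing import List, Tuple

# Assumption: "a month" is approximated as 30 days.
RETENTION = timedelta(days=30)

StoredMessage = Tuple[datetime, str]  # (sent_at, text)


def purge_expired(messages: List[StoredMessage], now: datetime) -> List[StoredMessage]:
    """Return only the messages still inside the retention window."""
    cutoff = now - RETENTION
    return [m for m in messages if m[0] >= cutoff]


now = datetime(2024, 3, 31, tzinfo=timezone.utc)
messages = [
    (now - timedelta(days=45), "old message"),
    (now - timedelta(days=2), "recent message"),
]
kept = purge_expired(messages, now)
kept_texts = [text for _, text in kept]
```

Some stores can expire data on their own (Redis supports per-key expiration, for instance), while others need a scheduled job along these lines.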
How does the data need to be queried?
We don't need sophisticated queries. We only need to fetch the messages for a channel that the user belongs to.
How long and how reliably does the data need to be stored?
Messages need to be stored for a month.
Is the data accessed by many connections concurrently?
Messages can be concurrently read.
Concurrent writes to the same message don't make sense.
Given the scenario above, which storage strategy would you choose for the client app and for the server app?