Technical decisions are all about trade-offs.
This post collects some guidelines for deciding which storage strategy best fits each case.
- What data is being stored?
- Does the data have to conform to a complex structure?
- How quickly does it need to be read and written?
- How much data exists?
- How does the data need to be queried?
- How long and how reliably does the data need to be stored?
- Is the data accessed by many connections concurrently?
- Poor performance
- Requires administration tasks
- Requires an ORM
- Good performance
- Less reliable
- Concurrency issues
- MySQL is easier to set up
- PostgreSQL supports recursive queries (MySQL added them only in version 8.0)
- PostgreSQL supports specialized data types
- PostgreSQL supports standard authentication methods (LDAP, GSSAPI)
- PostgreSQL supports asynchronous replication
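To make the "recursive queries" item above concrete, here is a minimal sketch of a recursive common table expression. It runs against Python's built-in `sqlite3` module as a stand-in so that no database server is needed; the `WITH RECURSIVE` form used here is standard SQL that PostgreSQL also accepts. The `category` table and its rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for a real RDBMS.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE category (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT)"
)
conn.executemany(
    "INSERT INTO category VALUES (?, ?, ?)",
    [(1, None, "root"), (2, 1, "child"), (3, 2, "grandchild"), (4, None, "other")],
)

# Walk the tree downwards from category 1, tracking the depth of each node.
rows = conn.execute(
    """
    WITH RECURSIVE subtree(id, name, depth) AS (
        SELECT id, name, 0 FROM category WHERE id = 1
        UNION ALL
        SELECT c.id, c.name, s.depth + 1
        FROM category AS c
        JOIN subtree AS s ON c.parent_id = s.id
    )
    SELECT name, depth FROM subtree ORDER BY depth
    """
).fetchall()
conn.close()
```

Without recursive queries, the same traversal would typically need one round trip per tree level or application-side looping.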
- In memory => Quick data manipulation
- Supports data structures
- Supports publish/subscribe
- Not suited to long-term storage
- Space is limited
Supported data structures
- Key/value pairs
Channels as data delivery mechanism
- Channel --has_many--> Subscriber
- Channel --has_many--> Publisher
- Subscribers can listen to events on a channel
- Publishers can emit events on a channel
Note that the same piece of client code can play both the Subscriber and the Publisher role.
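The channel relationships described above can be sketched with a toy in-process broker. This is only an illustration of the publish/subscribe idea, not the API of any particular store; the `Broker` class, its methods and the channel name `room:1` are all hypothetical.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[str], None]


class Broker:
    """Toy broker: a channel has many subscribers and accepts many publishers."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, channel: str, handler: Handler) -> None:
        # Register a callback that will receive every event on the channel.
        self._subscribers[channel].append(handler)

    def publish(self, channel: str, event: str) -> int:
        # Deliver the event to every current subscriber; return how many got it.
        handlers = list(self._subscribers.get(channel, []))
        for handler in handlers:
            handler(event)
        return len(handlers)


broker = Broker()
inbox_alice: List[str] = []
inbox_bob: List[str] = []

# Two subscribers listen on the same channel.
broker.subscribe("room:1", inbox_alice.append)
broker.subscribe("room:1", inbox_bob.append)

# Any code holding the broker can act as a publisher.
delivered = broker.publish("room:1", "hello")
```

Events published to a channel with no subscribers are simply dropped in this sketch, which mirrors the fire-and-forget nature of a basic publish/subscribe mechanism.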
It is general-purpose.
It is used for the same sorts of applications that you would use an RDBMS for.
- No schema constraints
You can use libraries on top of MongoDB to provide validations, schema and searching capabilities.
Collection --has_many--> Document, analogous to Table --has_many--> Row in an RDBMS.
The Collection API provides four basic operations: insert, update, find and remove.
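To show what a schema-less collection with those four operations looks like, here is a small in-memory sketch. It is not MongoDB's driver API: the `Collection` class below and its exact-match query semantics are simplifying assumptions made for illustration.

```python
from typing import Any, Dict, List

Document = Dict[str, Any]


class Collection:
    """Toy document collection: a list of dicts with no schema constraints."""

    def __init__(self) -> None:
        self._docs: List[Document] = []

    @staticmethod
    def _matches(doc: Document, query: Document) -> bool:
        # A document matches when every queried field is present and equal.
        return all(k in doc and doc[k] == v for k, v in query.items())

    def insert(self, doc: Document) -> None:
        self._docs.append(dict(doc))

    def find(self, query: Document) -> List[Document]:
        return [dict(d) for d in self._docs if self._matches(d, query)]

    def update(self, query: Document, changes: Document) -> int:
        matched = [d for d in self._docs if self._matches(d, query)]
        for d in matched:
            d.update(changes)
        return len(matched)

    def remove(self, query: Document) -> int:
        before = len(self._docs)
        self._docs = [d for d in self._docs if not self._matches(d, query)]
        return before - len(self._docs)


users = Collection()
users.insert({"name": "ana", "age": 30})
users.insert({"name": "bob"})  # no schema: this document has no "age" field
users.update({"name": "bob"}, {"age": 25})
found = users.find({"age": 25})
removed = users.remove({"name": "ana"})
```

Because nothing enforces a shape on the documents, any validation has to live in the application or in a library layered on top.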
An exercise for the reader.
Suppose we are going to build an application that supports group conversations, such as WhatsApp.
What data is being stored?
Our server needs to save messages for at least one month.
The client application can save messages locally.
Does the data have to conform to a complex structure?
Not too complex
- User <------ Message -----> Group
- User <------ can_send ----> Group
- Restriction: a user cannot send messages to a group they do not belong to
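The restriction above can be expressed as a membership check before a message is stored. The following is a minimal sketch; the `Group` dataclass, the `send_message` function and the sample names are hypothetical and independent of any storage engine.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Group:
    name: str
    members: Set[str] = field(default_factory=set)
    messages: List[Dict[str, str]] = field(default_factory=list)


def send_message(group: Group, sender: str, text: str) -> None:
    # Enforce the rule: only members of the group may send messages to it.
    if sender not in group.members:
        raise PermissionError(f"{sender} does not belong to {group.name}")
    group.messages.append({"sender": sender, "text": text})


family = Group("family", members={"ana", "bob"})
send_message(family, "ana", "hi")

try:
    send_message(family, "eve", "let me in")
    rejected = False
except PermissionError:
    rejected = True
```

Where this check lives (application code, a database constraint, or both) is part of the storage decision the exercise asks about.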
How quickly does it need to be read and written?
We expect 3 million users sending messages at the same time.
Usability suffers if a user receives a message with a delay of more than 2 seconds.
How much data exists?
It must support millions of channels.
But we can assume that messages don't need to be saved forever.
We can delete the messages after a month.
Anyway, the participants will have their own copies of the messages locally.
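Deleting messages after a month amounts to a periodic purge against a cutoff date. Here is a minimal sketch, assuming "a month" is approximated as 30 days and that each message carries a UTC timestamp; `RETENTION` and `purge_expired` are hypothetical names.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Tuple

# Assumption: "a month" is approximated as 30 days.
RETENTION = timedelta(days=30)

Message = Tuple[datetime, str]  # (sent_at, text)


def purge_expired(messages: List[Message], now: datetime) -> List[Message]:
    """Return only the messages that are still inside the retention window."""
    cutoff = now - RETENTION
    return [m for m in messages if m[0] >= cutoff]


now = datetime(2024, 3, 31, tzinfo=timezone.utc)
messages = [
    (now - timedelta(days=45), "old"),
    (now - timedelta(days=3), "recent"),
]
kept = purge_expired(messages, now)
```

Some stores can expire entries on their own after a configured lifetime, which would make an explicit purge job like this unnecessary.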
How does the data need to be queried?
We don't need sophisticated queries, just fetching the messages for a channel that the user belongs to.
How long and how reliably does the data need to be stored?
Messages need to be stored for a month.
Is the data accessed by many connections concurrently?
Messages can be concurrently read.
Concurrent writes to the same message don't make sense.
Given this scenario, which storage strategy would you choose for the client app and for the server app?