Discussion on: Explain MongoDB Like I'm Five

Assuming you're a five-year-old who understands JSON, files, folders and RDBMSs:

Conceptualise the basics as storing JSON files in a folder. A single JSON file can contain anything you want. That is one "document" in Mongo-speak, or one "row" in RDBMS-speak. The main difference you see right away is that an RDBMS row is typically only a flat key-value association (though some databases support types like arrays and JSON for columns), while a JSON document is an arbitrarily complex object.

Such JSON documents are lumped into one "collection"; think of a folder. The collection can be entirely heterogeneous: there need not be any fixed schema that a document must adhere to. The analogue in an RDBMS is a table, though again the lack of a schema makes that comparison a poor fit, and a folder is the better metaphor. Collections are part of one database, and a single Mongo server can contain many databases.
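To make the folder metaphor concrete, here is a minimal Python sketch that really does store JSON files in a folder. It is only the metaphor, not how Mongo stores anything; the names `save_document`, `load_collection`, `mydb` and `users` are made up for illustration.

```python
import json
import tempfile
from pathlib import Path


def save_document(collection: Path, doc_id: str, document: dict) -> Path:
    """Write one document as one JSON file inside the collection folder."""
    collection.mkdir(parents=True, exist_ok=True)
    path = collection / f"{doc_id}.json"
    path.write_text(json.dumps(document), encoding="utf-8")
    return path


def load_collection(collection: Path) -> list[dict]:
    """Read every JSON file in the folder back into a list of documents."""
    files = sorted(collection.glob("*.json"))
    return [json.loads(f.read_text(encoding="utf-8")) for f in files]


def demo() -> list[dict]:
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical layout: <server root>/<database>/<collection>/<document>.json
        users = Path(root) / "mydb" / "users"
        # Two documents with completely different shapes in the same "collection".
        save_document(users, "1", {"name": "Ada", "tags": ["admin"], "address": {"city": "London"}})
        save_document(users, "2", {"nickname": "bob", "age": 5})
        return load_collection(users)


if __name__ == "__main__":
    for doc in demo():
        print(doc)
```

Nothing stops the second file from having entirely different keys than the first, which is the "no fixed schema" point in practice.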

The advantage Mongo brings over a simple folder full of JSON files is that it adds database capabilities on top. That includes a query language, an indexing system, richer types than plain JSON offers (dates and binary data, for example), replication across machines, etc. The query language is necessarily very different from SQL, since the structure of the data you're querying is very different. As you might imagine, it's also harder to cross-reference (join) data across heterogeneous collections of JSON documents than across fixed-schema relational data, so a lot of the SQL JOIN queries you might be used to need a different approach. Transactional operations are also… different. A write to a single document is atomic, but reads from replicas can lag behind the primary ("eventual consistency"), and multi-document transactions were missing entirely until MongoDB 4.0, so most designs avoid depending on them.
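To give a feel for why the query language looks so unlike SQL, here is a toy Python imitation of Mongo-style "query by example" filters over a list of dicts. The `$gt` and `$in` operator names are real Mongo operators, but `matches` and `find` are hypothetical helpers written for this sketch, and the matching rules are heavily simplified compared to the real engine (no indexes, no nested paths, no array semantics).

```python
from typing import Any


def _is_operator_block(condition: Any) -> bool:
    """True for conditions like {"$gt": 4}: a non-empty dict whose keys all start with "$"."""
    return (
        isinstance(condition, dict)
        and bool(condition)
        and all(isinstance(key, str) and key.startswith("$") for key in condition)
    )


def matches(document: dict, query: dict) -> bool:
    """Check one document against a filter; every field condition must hold."""
    for field, condition in query.items():
        value = document.get(field)
        if not _is_operator_block(condition):
            if value != condition:  # plain equality, e.g. {"name": "Ada"}
                return False
            continue
        for operator, operand in condition.items():
            if operator == "$gt":
                # Toy rule: only numbers are comparable; a missing field never matches.
                if not isinstance(value, (int, float)) or not value > operand:
                    return False
            elif operator == "$in":
                if value not in operand:
                    return False
            else:
                raise ValueError(f"operator not supported in this sketch: {operator}")
    return True


def find(collection: list[dict], query: dict) -> list[dict]:
    """Return every document in the collection that matches the filter."""
    return [doc for doc in collection if matches(doc, query)]


if __name__ == "__main__":
    people = [{"name": "Ada", "age": 36}, {"nickname": "bob", "age": 5}, {"name": "Cy"}]
    print(find(people, {"age": {"$gt": 30}}))
```

The filter is itself a JSON-like object, so a query has the same shape as the data it describes, and documents that lack the field are simply skipped instead of causing a schema error.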

Overall, you need to rethink a lot of concepts you might have learned vis-à-vis database normalisation. In RDBMSs you typically spread a logical record (e.g. a user record) across several tables with one-to-many or many-to-many relationships, which necessitates transactional updates to keep the data consistent. In Mongo you'd rather store all of that in a single complex document in a single collection. When working with such data, keep in mind that data across different collections may not be consistent at any one point in time unless you go out of your way to make it so, which is why there's typically not a lot of cross-referencing going on in a Mongo database. Mongo data storage is best suited for many small, self-contained data blobs, not so much for highly correlated data with a strong need for internal consistency.
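As a rough illustration of that shift, the Python sketch below stores the same user twice: once normalised across two SQLite tables (written in one transaction and read back with a JOIN), and once as a single embedded document of the kind you'd put in a Mongo collection. The table names, fields and sample data are invented for the example, and SQLite merely stands in for "some RDBMS".

```python
import sqlite3


def normalised_user(user_id: int) -> dict:
    """Store a user across two tables, then reassemble it with a JOIN."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE emails (user_id INTEGER REFERENCES users(id), address TEXT);
        """
    )
    # One transaction so the user row and its email rows appear together or not at all.
    with conn:
        conn.execute("INSERT INTO users VALUES (?, ?)", (1, "Ada"))
        conn.executemany(
            "INSERT INTO emails VALUES (?, ?)",
            [(1, "ada@example.com"), (1, "ada@work.example")],
        )
    rows = conn.execute(
        "SELECT u.name, e.address FROM users u "
        "JOIN emails e ON e.user_id = u.id "
        "WHERE u.id = ? ORDER BY e.address",
        (user_id,),
    ).fetchall()
    conn.close()
    return {"name": rows[0][0], "emails": [address for _, address in rows]}


# The document-style equivalent: the whole logical record lives in one object,
# so reading or replacing it touches exactly one document.
EMBEDDED_USER = {
    "_id": 1,
    "name": "Ada",
    "emails": ["ada@example.com", "ada@work.example"],
}

if __name__ == "__main__":
    print(normalised_user(1))
    print(EMBEDDED_USER)
```

Both versions carry the same information; the difference is where the work happens. The relational version needs a transaction on write and a JOIN on read, while the embedded version needs neither as long as everything you care about fits inside the one document.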