HoangNg

Posted on Apr 11, 2024 • Edited on Jul 17, 2024

Fundamentals of MongoDB (part 1) - architecture

#webdev #database #mongodb #nosql

What is MongoDB

MongoDB is a document-oriented NoSQL database that stores data in flexible JSON-like documents. This flexibility, along with its scalability and query capabilities, makes it a popular choice for modern applications dealing with diverse and rapidly changing data.
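
To make "flexible JSON-like documents" concrete, here is a minimal sketch in plain JavaScript (the language of the MongoDB shell). The collection name `users` and every field name are hypothetical, chosen only for illustration. The point is that two documents in the same collection do not have to share the same fields.

```javascript
// Two hypothetical documents that could live in the same "users" collection.
// Field names and values are made up for illustration.
const userA = {
  name: "Alice",
  email: "alice@example.com",
};

const userB = {
  name: "Bob",
  // Nested documents and arrays are allowed inside a document.
  address: { city: "Hanoi", country: "VN" },
  tags: ["admin", "beta"],
};

// Fields that userB has and userA lacks: no shared schema is enforced here.
const onlyInB = Object.keys(userB).filter((key) => !(key in userA));
console.log(onlyInB);
```

Assuming a running server and a connected shell, both objects could be stored together with `db.users.insertMany([userA, userB])`.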

Relational Database Management System vs MongoDB

If you have already worked with a relational database management system (RDBMS), much of that knowledge transfers to MongoDB. For example, a table corresponds to a collection, a row to a document, and a column to a field. The diagram below presents concepts often used in RDBMS and their counterparts in MongoDB.
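
As a rough illustration of the RDBMS-to-MongoDB mapping, the sketch below lists the commonly cited term pairs and converts one relational row into a document-shaped object. The column names, the row values, and the helper `rowToDocument` are hypothetical, and the term list is an assumption about what such a mapping usually covers.

```javascript
// Commonly cited RDBMS terms and their MongoDB counterparts.
const rdbmsToMongo = {
  database: "database",
  table: "collection",
  row: "document",
  column: "field",
  "primary key": "_id field",
};

// A hypothetical relational row: column names plus one row of values.
const columns = ["id", "name", "city"];
const row = [1, "Alice", "Hanoi"];

// Convert the row into a document-shaped object; the primary key becomes _id.
function rowToDocument(columns, row) {
  const doc = {};
  columns.forEach((column, i) => {
    const field = column === "id" ? "_id" : column;
    doc[field] = row[i];
  });
  return doc;
}

const doc = rowToDocument(columns, row);
console.log(doc);
```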

Deployment Architecture

There are three options for deploying MongoDB: standalone, replication (a replica set), and sharding (a sharded cluster).

1) Standalone architecture

The whole database runs on a single server (i.e., standalone), with no copies of the data on other machines.

Advantages:
Simplicity, lower resource requirements and faster startup time.

Disadvantages:
No high availability, limited scalability, and no fault tolerance. More specifically, if the server hosting the database malfunctions or experiences data corruption, you could lose all your data, because there is no second copy to fail over to or restore from unless you keep separate backups.

2) Replication architecture

Data is replicated from the primary server to multiple secondary servers, which together form a replica set. If the primary fails, the remaining members can automatically elect one of the secondaries as the new primary, provided a majority of voting members are still available, minimizing downtime and ensuring data remains accessible.

Advantages:
High availability, improved read scalability (i.e., data can be read from secondary servers, which reduces the load on the primary, although such reads may lag slightly behind the primary), and disaster recovery (i.e., data can be restored from a surviving secondary server, as mentioned above).

Disadvantages:
Complexity and higher hardware costs when compared to the standalone architecture.
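
The failover idea can be sketched with a toy model. This is not MongoDB's real election protocol. It only assumes two things: a new primary needs a majority of members to be healthy, and the most up-to-date healthy secondary is promoted. The host names, the `lastOpTime` numbers, and the `failover` function are all hypothetical.

```javascript
// Toy model of replica-set failover (illustration only, not the real protocol).
const members = [
  { host: "node1:27017", role: "primary", healthy: true, lastOpTime: 105 },
  { host: "node2:27017", role: "secondary", healthy: true, lastOpTime: 104 },
  { host: "node3:27017", role: "secondary", healthy: true, lastOpTime: 101 },
];

function failover(members) {
  const healthy = members.filter((m) => m.healthy);
  // Without a strict majority of healthy members, no primary can be elected.
  if (healthy.length * 2 <= members.length) return null;
  const current = healthy.find((m) => m.role === "primary");
  if (current) return current; // the primary is still up, nothing to do
  // Demote any failed primary so the set never has two primaries.
  members.forEach((m) => {
    if (!m.healthy) m.role = "secondary";
  });
  // Promote the healthy secondary with the most recent replicated operation.
  const candidate = healthy.reduce((best, m) =>
    m.lastOpTime > best.lastOpTime ? m : best
  );
  candidate.role = "primary";
  return candidate;
}

members[0].healthy = false; // simulate the primary failing
const newPrimary = failover(members);
console.log(newPrimary.host);
```

In this model, losing a second node leaves one healthy member out of three, so `failover` returns `null`. This is one reason replica sets are usually deployed with an odd number of members.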

3) Sharding architecture

Sharding partitions a collection's data across multiple shard servers according to a shard key, so the database can scale horizontally by adding more shards. This is ideal for handling massive datasets and high write/read throughput that a single server can't manage.

Advantages:
Horizontal scalability, improved performance for specific queries (i.e., when a query includes the shard key, only the relevant shard(s) need to be accessed), and flexibility (i.e., independently scale different parts of the database).

Disadvantages:
Increased complexity, potential performance overhead (e.g., queries that must be sent to every shard), and the risk of uneven data distribution and bottlenecks when the shard key is poorly chosen.
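
The trade-off between targeted and broadcast queries can be sketched with a toy model of hashed sharding. The hash function, the shard names, and the shard key `userId` are illustrative assumptions, not MongoDB's actual hashing or chunk-balancing logic.

```javascript
// Toy model of hashed sharding (illustration only).
const shards = ["shardA", "shardB", "shardC"];

// Simple deterministic string hash, kept within unsigned 32-bit range.
function toyHash(value) {
  let hash = 0;
  for (const ch of String(value)) {
    hash = (hash * 31 + ch.charCodeAt(0)) >>> 0;
  }
  return hash;
}

// Pick the shard that owns a given shard-key value.
function shardFor(shardKeyValue) {
  return shards[toyHash(shardKeyValue) % shards.length];
}

// A query that includes the shard key can target a single shard;
// a query without it has to be sent to every shard.
function shardsToQuery(query) {
  return "userId" in query ? [shardFor(query.userId)] : [...shards];
}

const targeted = shardsToQuery({ userId: "u-1001" });
const broadcast = shardsToQuery({ city: "Hanoi" });
console.log(targeted.length, broadcast.length); // 1 3
```

The same model hints at the uneven-distribution risk: if most documents shared one shard-key value, they would all land on the same shard.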

How I used to see MongoDB as a developer

I admit that, previously, I focused only on correctly configuring the data flowing to the database (i.e., getting the right data into the right collection, document, and field) without understanding the database architecture, as depicted below.

However, I believe a solid understanding of databases would be very beneficial, because it would let me optimize my queries. Therefore, I'm learning more about databases.

What happens when querying

We first need to know that query operations are served from the database memory wherever possible, so users (i.e., developers) mostly interact with data held in memory rather than directly with the files on disk. If we consider our database as a car, the database memory is the engine. The storage engine that manages this memory and the way data is laid out in physical storage both strongly influence our database's performance. WiredTiger is the default storage engine, and an In-Memory storage engine is available in MongoDB Enterprise.

To check which storage engine a server is using, run the following command in the MongoDB shell.

db.serverStatus().storageEngine

The following illustration presents a typical architecture of the WiredTiger storage engine and provides a fundamental understanding of how it works.
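
One concrete link between memory and performance is the size of WiredTiger's internal cache. According to the MongoDB documentation, the default is the larger of 50% of (RAM - 1 GB) and 256 MB. The sketch below only restates that rule; the function name `defaultWiredTigerCacheGB` is hypothetical, and a real deployment may override the default.

```javascript
// Default WiredTiger internal cache size, per the documented rule:
// the larger of 50% of (RAM - 1 GB) and 256 MB.
function defaultWiredTigerCacheGB(totalRamGB) {
  const halfOfRamMinusOne = 0.5 * (totalRamGB - 1);
  const minimumGB = 0.25; // 256 MB expressed in GB
  return Math.max(halfOfRamMinusOne, minimumGB);
}

console.log(defaultWiredTigerCacheGB(4));    // 1.5
console.log(defaultWiredTigerCacheGB(1.25)); // 0.25
```

The default can be changed with the `storage.wiredTiger.engineConfig.cacheSizeGB` configuration option.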

Thank you for reading this far
Hoang
P.S. In part 2, I will write an example of optimizing a query in MongoDB.
