RAM PANDEY

Posted on Sep 16, 2023

MongoDB Replication Why and How?

#mongodb #database #systemdesign #learning

Hello, fellow developers! I trust you're doing splendidly. In this week's blog, we're delving into the intriguing world of MongoDB replication—the 'why' and 'how' behind it. Let's kick things off with the fundamental question: why replication?

The Why

Imagine your database as a bustling library of books, each book representing your valuable data. In a single-library setup, you have one copy of each book stored on a single shelf. While this might seem fine, it leaves your collection vulnerable. If, heaven forbid, that one shelf were to collapse or a book went missing, you'd lose precious knowledge.

Now, consider database replication as a strategy akin to having multiple libraries, each housing an identical collection of your books. This redundancy ensures that even if one library faces an issue—like a shelf collapsing (hardware failure) or a curious book thief (data corruption)—your knowledge remains intact in other libraries. This way, your data stays resilient, accessible, and safe.

The pitfalls of not considering replication become evident when you rely on a single library. In case of hardware failure, your entire collection is at risk, leading to data loss and costly downtime. Without replication, scaling to meet increasing demand becomes challenging, limiting your application's potential.

However, with replication, you enjoy an array of benefits. Improved data availability means your application stays up and running even if one server fails. Failover mechanisms kick in when needed: if the primary goes down, a secondary takes over, keeping service interruptions short. Reads can also be spread across the secondaries to relieve the primary, although all writes still go to the primary, so replication scales read capacity rather than write capacity.

In essence, replication is your fail-safe mechanism, your data's guardian angel. It's not just a best practice; it's a lifeline for safeguarding your precious data, avoiding potential pitfalls, and reaping the many benefits it brings to the table. Now that we understand the vital importance of replication for your database, let's delve into the 'how': how to create a replica set in MongoDB.

How a Production level setup would look

Now that we've explored why replication is crucial, let's delve into how a production-level setup would appear.

In the illustration above, you can observe a primary node and two distinct secondary nodes. Additionally, there's an arbiter situated on its own server. The arbiter stores no data; it exists only to vote in elections. All members of the replica set, not just the arbiter, continuously send each other small heartbeats (every two seconds) to check that the other machines are still reachable. If the primary node goes down and stays unreachable for longer than the election timeout (10 seconds by default), the remaining members hold an election. The new primary is not picked arbitrarily: a secondary must win the votes of a majority of the voting members, and members do not vote for a candidate whose data is less up to date than their own. This mechanism ensures the continued operation of your database even in the face of primary node failures.

For this demonstration, we will create a primary and a secondary node. However, if you intend to implement this replication setup in a production environment, I highly recommend following the complete setup outlined above.
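To see why the extra members matter, it helps to remember that an election needs votes from a strict majority of the voting members. Here is a minimal sketch in plain JavaScript (it runs in Node.js or inside mongosh); the function names majorityNeeded and faultTolerance are made up for this illustration and are not part of MongoDB:

```javascript
// Votes required to elect a primary: a strict majority of the voting members.
function majorityNeeded(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// Number of voting members that can be unreachable while an election can still succeed.
function faultTolerance(votingMembers) {
  return votingMembers - majorityNeeded(votingMembers);
}

for (const n of [2, 3, 5]) {
  console.log(
    `${n} voting members: majority ${majorityNeeded(n)}, tolerates ${faultTolerance(n)} failure(s)`
  );
}
```

With only two members, as in our demo, losing either node leaves the survivor without a majority, so no primary can be elected. Three voting members is the smallest layout that survives a single failure.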

How

Note: To follow along as we configure a replica set, it's recommended to download the latest version of MongoDB Community Edition. At the time of writing, the latest version is 7.0. For this blog, we'll use macOS. While most commands are universal and work on both Linux and macOS, Windows users may need to find equivalent commands.

Before diving into creating a replica set configuration, let's first grasp the basics of spinning up a MongoDB database. After all, a replica set is made up of multiple mongod instances. Each instance reads its settings from a configuration file called mongod.conf, whose default location varies by operating system:

OS	Path
Linux	/etc/mongod.conf
macOS	/usr/local/etc/mongod.conf (on Intel processors), or /opt/homebrew/etc/mongod.conf (on Apple silicon)
Windows	<install_directory>\bin\mongod.cfg

To keep things organized and prevent unintentional changes to the default configuration, let's create two new files and name them mongod.primary.conf and mongod.secondary.conf. This approach ensures that even if we make errors during the setup process, the default database remains unaffected.

To create the configuration files, we will begin by making two copies of the default configuration. Run these commands from the directory that contains mongod.conf:



sudo cp mongod.conf mongod.primary.conf



sudo cp mongod.conf mongod.secondary.conf

Now, let's edit the configuration files to customize the settings. For each file, we will:

Set the destination for system logs.
Specify the data storage path (dbPath).
Define the port on which MongoDB should listen for requests.
Assign a name to the replica set.
Configure the MongoDB server to run as a background process.

mongod.primary.conf



systemLog:
  destination: file
  path: /usr/local/var/log/mongodb/mongo.primary.log
  logAppend: true
storage:
  dbPath: /usr/local/var/mongodb-primary
net:
  port: 27018
  bindIp: 127.0.0.1
replication:
  replSetName: demoRepl
processManagement:
  fork: true

mongod.secondary.conf



systemLog:
  destination: file
  path: /usr/local/var/log/mongodb/mongo.secondary.log
  logAppend: true
storage:
  dbPath: /usr/local/var/mongodb-secondary
net:
  port: 27019
  bindIp: 127.0.0.1
replication:
  replSetName: demoRepl
processManagement:
  fork: true

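We have added the processManagement.fork option. This setting instructs the mongod server to run as a background process, allowing you to regain control of the terminal for running other commands. Note that fork is not supported on Windows, so Windows users should leave it out and start each instance in its own terminal instead.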
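The two files differ only in the log path, the dbPath, and the port, so one way to avoid copy-paste slips is to render both from a single template. This is a minimal sketch in JavaScript (run with Node.js); the function name renderMongodConfig and its baseDir default are assumptions for illustration that mirror the macOS paths used above, not anything MongoDB provides:

```javascript
// Render a mongod config for one local replica set member.
// Only the member label and the port vary between the two files.
// baseDir is an assumption that matches the macOS (Intel) paths used in this post.
function renderMongodConfig(label, port, baseDir = "/usr/local/var") {
  return [
    "systemLog:",
    "  destination: file",
    `  path: ${baseDir}/log/mongodb/mongo.${label}.log`,
    "  logAppend: true",
    "storage:",
    `  dbPath: ${baseDir}/mongodb-${label}`,
    "net:",
    `  port: ${port}`,
    "  bindIp: 127.0.0.1",
    "replication:",
    "  replSetName: demoRepl",
    "processManagement:",
    "  fork: true",
  ].join("\n") + "\n";
}

// Print both configs; redirect the output into the .conf files if you want to use them.
console.log(renderMongodConfig("primary", 27018));
console.log(renderMongodConfig("secondary", 27019));
```

Both members must share the same replSetName, which is why it is fixed inside the template rather than passed in.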

You should also update the dbPath with the appropriate location for your operating system. Here are the default directories where MongoDB data is typically stored:

OS	Path
Linux	/var/lib/mongodb
macOS	/usr/local/var/mongodb (on Intel processors), or /opt/homebrew/var/mongodb (on Apple silicon)
Windows	C:\data\db

Whichever base location applies to your operating system, give each instance its own dbPath; two mongod processes cannot share a data directory.

The next crucial step is to create the directories where the databases will store their data. mongod does not create a missing dbPath for you and will fail to start without it. The directory used in systemLog.path (/usr/local/var/log/mongodb in the files above) must exist as well.

To create the data directories, you can use the mkdir command (for Linux and macOS). In this example, I'll create the directories in /usr/local/var/, but you should adjust the paths so that they match the dbPath values in your configuration files.



sudo mkdir /usr/local/var/mongodb-primary



sudo mkdir /usr/local/var/mongodb-secondary

With the directories in place, we're now ready to start up the databases.

To start the MongoDB server, you can use the mongod tool, which is included with the MongoDB installation. The command to start the server is as follows:



sudo mongod --config <config_file_name>

In this command, you specify the config file for the server to retrieve the necessary instance details.

Running this command will produce output similar to the following:

Notice that MongoDB forked as a child process. If you didn't specify the fork option, you would have lost control of the terminal. You can try running it both ways to see the difference in behavior.

Run the command once for each configuration file (mongod.primary.conf and mongod.secondary.conf). If your setup is correct, both instances will start without any issues.

Now let's connect to our databases using the mongosh tool, which was shipped with the MongoDB installation as well. We will run the commands in two different terminal tabs so that we can access both instances at the same time:



mongosh --port 27018



mongosh --port 27019

You should be able to get access and get an output similar to this

Now, if you try running any command in either of them, you will encounter an error like this:

The reason for this error is that each instance was started with a replica set name, but the replica set itself has not been initiated yet. Until it is, an instance is neither a primary nor a secondary, so it refuses to serve reads and writes.

To resolve this, we will start by using the primary database tab and running the following command:



rs.initiate()

The command will produce an output like this:

In this step, we haven't provided any specific configuration for the replica set, so MongoDB falls back to a default configuration that contains only the current member. That is why you'll see an info message.

Now, let's add the secondary database to our replica set using rs.add(). The general form is shown first, followed by the actual command for our setup:



rs.add("<servername>:<port>")



rs.add("127.0.0.1:27019")

Running this command should yield an output like this:

With this, our replica set is now complete!
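As an alternative on a fresh pair of instances, the rs.initiate() and rs.add() steps can be combined by passing a configuration document that lists every member up front. The sketch below assumes the same replica set name and ports as above; buildReplSetConfig is a hypothetical helper written for this post, while rs.initiate() accepting a config document is standard mongosh behaviour:

```javascript
// Build a replica set configuration document from a set name and a list of "host:port" strings.
function buildReplSetConfig(setName, hosts) {
  return {
    _id: setName, // must match replSetName in every member's config file
    members: hosts.map((host, index) => ({ _id: index, host })),
  };
}

const demoConfig = buildReplSetConfig("demoRepl", ["127.0.0.1:27018", "127.0.0.1:27019"]);

if (typeof rs !== "undefined") {
  // Inside mongosh: initiate the set. This only works on instances that have
  // not been initiated yet; on an existing set it returns an error.
  printjson(rs.initiate(demoConfig));
} else {
  // Outside mongosh (for example in Node.js): just show the document.
  console.log(JSON.stringify(demoConfig, null, 2));
}
```

Listing every member with the same style of address (all 127.0.0.1 here) keeps the configuration consistent, since rs.initiate() without arguments picks a host name for the first member on its own.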

To check the status or obtain details about your replica set, you can use the rs.status() command.
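The rs.status() output is fairly long, so it can help to reduce it to one line per member. This is a small sketch that relies on the name, stateStr, and health fields that rs.status() reports for each member; summarizeMembers is a hypothetical helper name:

```javascript
// Reduce an rs.status() result to the fields that matter for a quick health check.
function summarizeMembers(status) {
  return status.members.map((member) => ({
    name: member.name,            // "host:port" of the member
    state: member.stateStr,       // for example "PRIMARY" or "SECONDARY"
    healthy: member.health === 1, // health is 1 when the member is reachable
  }));
}

// `rs` only exists inside mongosh, so the call is skipped elsewhere.
if (typeof rs !== "undefined") {
  printjson(summarizeMembers(rs.status()));
}
```

After the steps above, you would expect one member in the PRIMARY state and one in the SECONDARY state.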

Let's create a new database in the primary and verify if the changes are replicating in the secondary node or not.

Primary Database View:

Secondary Database View:

As you can see, without any additional effort, the database changes from the primary have been successfully replicated in the secondary. With these steps, we have configured our very own replica set.
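If you would like to repeat this check from a script instead of typing commands by hand, here is a rough sketch that can be pasted into both mongosh tabs. The database name replDemo, the collection name books, and the two helper functions are assumptions made up for this example. Depending on how you connected, a secondary may reject reads until a read preference is set, so the read helper sets one explicitly:

```javascript
// Insert one document; intended for the tab connected to the primary (port 27018).
function writeSample(database) {
  return database.getCollection("books").insertOne({ title: "Replication 101" });
}

// Read all documents back; intended for the tab connected to the secondary (port 27019).
function readSample(database) {
  // Allow this connection to read from a secondary.
  database.getMongo().setReadPref("secondaryPreferred");
  return database.getCollection("books").find().toArray();
}

// `db` only exists inside mongosh, so nothing runs elsewhere.
if (typeof db !== "undefined") {
  const demo = db.getSiblingDB("replDemo");
  if (db.hello().isWritablePrimary) {
    printjson(writeSample(demo)); // this tab is the primary: write
  } else {
    printjson(readSample(demo)); // this tab is a secondary: read
  }
}
```

Run it in the primary tab first, then in the secondary tab; the document written on port 27018 should show up in the result on port 27019.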

I hope you found this tutorial informative and valuable. Please feel free to share any feedback or suggestions with me. Thank you for reading, and until next time!
