Hello, fellow developers! I trust you're doing splendidly. In this week's blog, we're delving into the intriguing world of MongoDB replication, covering both the 'why' and the 'how' behind it. Let's kick things off with the fundamental question: why replication?
The Why
Imagine your database as a bustling library of books, each book representing your valuable data. In a single-library setup, you have one copy of each book stored on a single shelf. While this might seem fine, it leaves your collection vulnerable. If, heaven forbid, that one shelf were to collapse or a book were to go missing, you'd lose precious knowledge.
Now, consider database replication as a strategy akin to having multiple libraries, each housing an identical collection of your books. This redundancy ensures that even if one library faces an issue, such as a shelf collapsing (hardware failure) or a curious book thief (a corrupted or lost copy on one server), your knowledge remains intact in the other libraries. This way, your data stays resilient, accessible, and safe.
The pitfalls of not considering replication become evident when you rely on a single library. In case of hardware failure, your entire collection is at risk, leading to data loss and costly downtime. Without replication, scaling to meet increasing demand becomes challenging, limiting your application's potential.
However, with replication, you enjoy an array of benefits. Improved data availability means your application stays up and running even in adverse conditions. Read traffic can be spread across the copies (using read preferences) for better resource utilization, and automatic failover kicks in when the primary goes down, keeping service uninterrupted. Plus, you can add members to scale read capacity as your workload grows.
In essence, replication is your fail-safe mechanism, your data's guardian angel. It's not just a best practice; it's a lifeline for safeguarding your precious data, avoiding potential pitfalls, and reaping the benefits described above. Now that we understand why replication matters, let's turn to the 'how': creating a replica set in MongoDB.
How a Production-Level Setup Would Look
Now that we've explored why replication is crucial, let's delve into how a production-level setup would appear.
In the illustration above, you can see a primary node and two secondary nodes, plus an arbiter running on its own server. The members of a replica set, the arbiter included, continuously exchange small heartbeat messages to check that every machine is still online. If the primary stops responding, the remaining members hold an election. An eligible secondary stands as a candidate and becomes the new primary once it collects votes from a majority of the voting members. Voters only support a candidate whose data is at least as up to date as their own, so the choice is not arbitrary. The arbiter stores no data and can never become primary itself, but it does cast a vote, which helps the set reach that majority. This mechanism keeps your database running even when the primary fails.
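For this demonstration, we will create a primary and a secondary node. A two-member set is fine for learning, but it cannot fail over on its own. If either member goes down, the survivor holds only one of two votes, which is not a majority, so the set is left without a primary. If you intend to use replication in a production environment, I highly recommend following the complete setup outlined above.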
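To see why the number of voting members matters, here is a deliberately simplified JavaScript model of the majority rule. You can run it in Node.js or paste it into mongosh. The function names are hypothetical. The model ignores member priorities, data freshness, and network partitions; it only checks whether enough voters remain reachable to form a majority.

```javascript
// Simplified majority rule for replica set elections: a candidate needs
// votes from more than half of the voting members in the configuration.
// Ignores priorities, data freshness, and network partitions.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// Can the members that are still reachable elect (or keep) a primary?
function canElectPrimary(votingMembers, unreachable) {
  return votingMembers - unreachable >= majority(votingMembers);
}

// Production layout above: primary + two secondaries + arbiter = 4 voters
console.log(canElectPrimary(4, 1)); // true: 3 voters left, majority is 3
// This tutorial's demo: primary + secondary = 2 voters
console.log(canElectPrimary(2, 1)); // false: 1 voter left, majority is 2
```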
How
Note: To follow along as we configure a replica set, download the latest version of MongoDB Community Edition (7.0 at the time of writing). For this blog, we'll use macOS. Most commands work the same on Linux and macOS, but Windows users may need to find equivalent commands.
Before creating a replica set, let's cover the basics of starting a single MongoDB server. After all, a replica set is simply several of these servers (mongod processes) working together. Each mongod reads its settings from a configuration file called mongod.conf. The default location of this file varies by operating system:
OS | Path |
---|---|
Linux | /etc/mongod.conf |
macOS (Homebrew) | /usr/local/etc/mongod.conf on Intel processors, /opt/homebrew/etc/mongod.conf on Apple silicon (M1 and later) |
Windows | <install_directory>\bin\mongod.cfg |
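If you script your setup, the table above can be expressed as a small lookup. This is a minimal Node.js sketch with a few assumptions. defaultConfigPath is a hypothetical helper. It assumes Homebrew on macOS and a package-manager install on Linux. The Windows install directory stays a placeholder because it depends on where you installed MongoDB.

```javascript
// Hypothetical helper mirroring the table above: default mongod config path
// per OS. platform/arch use Node's process.platform / process.arch values.
function defaultConfigPath(platform = process.platform, arch = process.arch) {
  switch (platform) {
    case "linux":
      return "/etc/mongod.conf"; // package-manager installs
    case "darwin":
      // Homebrew's prefix is /opt/homebrew on Apple silicon, /usr/local on Intel
      return arch === "arm64"
        ? "/opt/homebrew/etc/mongod.conf"
        : "/usr/local/etc/mongod.conf";
    case "win32":
      // Depends on where MongoDB was installed, so it stays a placeholder
      return "<install_directory>\\bin\\mongod.cfg";
    default:
      throw new Error(`No default config path known for platform: ${platform}`);
  }
}

console.log(defaultConfigPath("darwin", "arm64")); // /opt/homebrew/etc/mongod.conf
```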
To keep things organized and avoid unintentional changes to the default configuration, let's create two new files named mongod.primary.conf and mongod.secondary.conf. That way, even if we make mistakes during setup, the default database remains unaffected.
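To create the configuration files, start by copying the default configuration. Run these commands from the directory that contains mongod.conf (see the table above):
```shell
# e.g. cd /usr/local/etc first on an Intel Mac
sudo cp mongod.conf mongod.primary.conf
sudo cp mongod.conf mongod.secondary.conf
```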
Now, let's edit the configuration files to customize the settings. For each file, we will:
- Set the destination for system logs.
- Specify the data storage path (dbPath).
- Define the port on which MongoDB should listen for requests.
- Assign a name to the replica set (both files must use the same name).
- Configure the MongoDB server to run as a background process.
mongod.primary.conf
```yaml
systemLog:
  destination: file
  path: /usr/local/var/log/mongodb/mongo.primary.log
  logAppend: true
storage:
  dbPath: /usr/local/var/mongodb-primary
net:
  port: 27018
  bindIp: 127.0.0.1
replication:
  replSetName: demoRepl
processManagement:
  fork: true
```
mongod.secondary.conf
```yaml
systemLog:
  destination: file
  path: /usr/local/var/log/mongodb/mongo.secondary.log
  logAppend: true
storage:
  dbPath: /usr/local/var/mongodb-secondary
net:
  port: 27019
  bindIp: 127.0.0.1
replication:
  replSetName: demoRepl
processManagement:
  fork: true
```
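We have added the processManagement.fork option. This setting tells mongod to run as a background process, so you regain control of the terminal for running other commands. Note that fork requires a log destination, such as the file configured under systemLog, and that it is not supported on Windows.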
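The two files differ only in the member's port and in the paths derived from its name. The replSetName must be identical in both. If you plan to add more members later, a template can keep the files consistent. Below is a minimal Node.js sketch. renderMemberConfig and its parameters are hypothetical, and the defaults assume the /usr/local/var layout used above.

```javascript
// Hypothetical helper: render a mongod config for one replica set member.
// Only the label (used in paths) and the port differ between members; all
// members share one replSetName so they join the same replica set.
function renderMemberConfig({ label, port, replSetName = "demoRepl", baseDir = "/usr/local/var" }) {
  return [
    "systemLog:",
    "  destination: file",
    `  path: ${baseDir}/log/mongodb/mongo.${label}.log`,
    "  logAppend: true",
    "storage:",
    `  dbPath: ${baseDir}/mongodb-${label}`,
    "net:",
    `  port: ${port}`,
    "  bindIp: 127.0.0.1",
    "replication:",
    `  replSetName: ${replSetName}`,
    "processManagement:",
    "  fork: true",
    "",
  ].join("\n");
}

const primary = renderMemberConfig({ label: "primary", port: 27018 });
const secondary = renderMemberConfig({ label: "secondary", port: 27019 });
console.log(primary);
```

You could then save each string with fs.writeFileSync into your config directory. Writing there may require elevated permissions.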
You should also update dbPath (and systemLog.path) to locations that suit your operating system. Here are the default directories where MongoDB typically stores data:
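OS | Path |
---|---|
Linux | /var/lib/mongodb (/var/lib/mongo on RHEL-based distributions) |
macOS (Homebrew) | /usr/local/var/mongodb on Intel processors, /opt/homebrew/var/mongodb on Apple silicon |
Windows | C:\data\db |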
Whichever location you choose, give each member its own dbPath; two mongod processes cannot share a data directory.
The next crucial step is to create the directories that will hold the data. mongod does not create its dbPath for you. If the directory is missing, the server exits at startup with an error, so create these directories in advance.
To create the directories, use the mkdir command (on Linux and macOS). In this example, I'll create them in /usr/local/var/, but adjust the paths to match the dbPath values in your config files.
```shell
sudo mkdir /usr/local/var/mongodb-primary
sudo mkdir /usr/local/var/mongodb-secondary
```
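Because mongod will not start without its data directory, a quick preflight check can catch a missing or mistyped path before you launch the servers. Below is a minimal Node.js sketch. findMissingDirs is a hypothetical helper, and the paths it checks are the ones used in this tutorial.

```javascript
const fs = require("fs");

// Hypothetical preflight helper: return the paths from the list that are
// missing or are not directories, so they can be created before mongod starts.
function findMissingDirs(paths) {
  return paths.filter((p) => {
    try {
      return !fs.statSync(p).isDirectory(); // exists, but is it a directory?
    } catch (err) {
      // ENOENT: path does not exist; ENOTDIR: a parent component is a file
      if (err.code === "ENOENT" || err.code === "ENOTDIR") return true;
      throw err; // surface other problems, such as permission errors
    }
  });
}

// The dbPath values used in this tutorial
const missing = findMissingDirs([
  "/usr/local/var/mongodb-primary",
  "/usr/local/var/mongodb-secondary",
]);
console.log(missing.length > 0 ? `Create these first: ${missing.join(", ")}` : "All dbPath directories exist");
```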
With the directories in place, we're now ready to start up the databases.
To start a MongoDB server, use the mongod binary included with the MongoDB installation. Pass it the config file that holds the instance's settings, and run the command once for each config file so that both servers start:
```shell
# Replace <config_file_name> with each file in turn, e.g. /usr/local/etc/mongod.primary.conf
sudo mongod --config <config_file_name>
```
Running this command will produce output similar to the following:
Notice that mongod forked a child process and returned control to the terminal. Without the fork option, the server would keep running in the foreground and occupy the terminal until you stopped it. You can try running it both ways to see the difference in behavior.
If your setup is correct, both instances will start without any issues.
Now let's connect to our servers using mongosh, the MongoDB Shell. Homebrew installs it together with MongoDB; on other platforms you may need to install it separately. Run the commands below in two different terminal tabs so you can work with both servers at the same time:
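```shell
# Tab 1: the server that will become the primary
mongosh --port 27018
# Tab 2: the server that will become the secondary
mongosh --port 27019
```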
You should connect successfully and see output similar to this:
However, if you try to read or write data in either shell, you will encounter an error like this:
This happens because we set replSetName in both config files. Each mongod therefore starts up as a member of a replica set, but that replica set has not been initiated yet. Until it is, the members have no replica set configuration and neither is a primary or a secondary, so they refuse to serve data.
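To check which state a member is in, you can call db.hello() in mongosh. It returns a document whose isWritablePrimary and secondary fields say whether the server is currently a primary or a secondary. The helper below is a hypothetical sketch that turns those two fields into a readable message. Paste it into mongosh and call describeMember(db.hello()). Before rs.initiate() has run, you would expect both fields to be false.

```javascript
// Hypothetical helper: summarize the document returned by db.hello().
// isWritablePrimary and secondary are fields of that response.
function describeMember(hello) {
  if (hello.isWritablePrimary) {
    return "primary: accepts reads and writes";
  }
  if (hello.secondary) {
    return "secondary: replicates data from the primary";
  }
  return "neither primary nor secondary (for example, the replica set is not initiated yet)";
}

// In mongosh: describeMember(db.hello())
// Standalone example with a hand-written response object:
console.log(describeMember({ isWritablePrimary: false, secondary: false }));
```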
To resolve this, switch to the tab connected to port 27018 (the server that will become the primary) and run the following command:
```javascript
rs.initiate()
```
The command will produce an output like this:
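Because we didn't pass a configuration document to rs.initiate(), the output includes an info message saying that no configuration was specified and a default one is being used. The server you ran it on becomes the first member of the set and, shortly afterwards, its primary.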
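Instead of calling rs.initiate() with no arguments and then rs.add(), you can pass rs.initiate() a configuration document that lists every member up front. This sketch builds such a document in plain JavaScript. buildReplSetConfig is a hypothetical helper, and the set name and hosts assume the config files above. The document's _id must match replication.replSetName. On a fresh set, you would run rs.initiate(config) in place of the two separate steps, not in addition to them.

```javascript
// Hypothetical helper: build a replica set configuration document that lists
// every member explicitly. _id must equal replication.replSetName, and each
// member needs a unique numeric _id plus a "host:port" string.
function buildReplSetConfig(replSetName, hosts) {
  return {
    _id: replSetName,
    members: hosts.map((host, index) => ({ _id: index, host })),
  };
}

const config = buildReplSetConfig("demoRepl", ["127.0.0.1:27018", "127.0.0.1:27019"]);
console.log(JSON.stringify(config, null, 2));
// In mongosh, connected to the server on port 27018: rs.initiate(config)
```

Listing the members explicitly means the host strings stored in the replica set configuration are exactly the addresses you chose. Otherwise, the server derives its own defaults.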
Now, still in the primary's shell, add the secondary to our replica set with rs.add(), passing its host and port:
```javascript
// General form: rs.add("<servername>:<port>")
rs.add("127.0.0.1:27019")
```
Running this command should yield an output like this:
With this, our replica set is now complete!
To check the status of your replica set or obtain details about it, use the rs.status() command. It reports each member along with its current state (PRIMARY, SECONDARY, and so on).
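rs.status() returns a large document. Each entry in its members array includes fields such as name, stateStr, and health. A hypothetical helper like the one below can condense that document to one line per member. The sample input is made up to mirror this tutorial's two-member set; it is not captured output.

```javascript
// Hypothetical helper: condense rs.status() to one line per member.
// name, stateStr, and health are fields of each entry in status.members.
function summarizeMembers(status) {
  return status.members.map((m) => `${m.name}: ${m.stateStr} (health ${m.health})`);
}

// Made-up sample shaped like rs.status() output for this tutorial's set
const sample = {
  members: [
    { name: "127.0.0.1:27018", stateStr: "PRIMARY", health: 1 },
    { name: "127.0.0.1:27019", stateStr: "SECONDARY", health: 1 },
  ],
};
console.log(summarizeMembers(sample).join("\n"));

// In mongosh: summarizeMembers(rs.status()).forEach((line) => print(line))
```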
Let's create a new database on the primary and check whether the changes replicate to the secondary.
Primary Database View:
Secondary Database View:
As you can see, without any additional effort, the database changes from the primary have been successfully replicated to the secondary. With these steps, we have configured our very own replica set.
I hope you found this tutorial informative and valuable. Please feel free to share any feedback or suggestions with me. Thank you for reading, and until next time!