Database replication is the process of copying data from a database on one server to a database on another. This protects your data against disasters or hardware failures and improves its reliability and accessibility. Replication provides high availability and redundancy, both crucial when you manage a large amount of data.
In MongoDB, replication is done through a replica set, which, in simple terms, is a group of mongod processes that maintain the same data set across different servers. A replica set should have at least three members: one primary and the rest secondaries. A replica set can have up to 50 members, of which at most 7 can vote; adding more nodes might decrease performance. You probably won't need that many nodes unless you have a global deployment, but that is another story.
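To make the size limits concrete, here is a minimal Python sketch that checks a configuration document against them. The document is shaped like the one passed to `rs.initiate()`; the replica set name, the host names, and the `check_limits` helper are hypothetical and only serve as illustration.

```python
# Hypothetical replica set configuration; "rs0" and the host names are placeholders.
config = {
    "_id": "rs0",
    "members": [
        {"_id": 0, "host": "db1.example.net:27017"},
        {"_id": 1, "host": "db2.example.net:27017"},
        {"_id": 2, "host": "db3.example.net:27017"},
    ],
}

MAX_MEMBERS = 50         # limit mentioned in the text
MAX_VOTING_MEMBERS = 7   # limit mentioned in the text


def check_limits(cfg):
    """Return a list of problems found in cfg (empty if there are none)."""
    members = cfg["members"]
    # A member votes unless its "votes" field is explicitly set to 0.
    voting = [m for m in members if m.get("votes", 1) > 0]
    problems = []
    if len(members) > MAX_MEMBERS:
        problems.append(f"{len(members)} members exceeds {MAX_MEMBERS}")
    if len(voting) > MAX_VOTING_MEMBERS:
        problems.append(f"{len(voting)} voting members exceeds {MAX_VOTING_MEMBERS}")
    return problems


print(check_limits(config))  # prints []
```

A set with more than 7 members would need the extra ones configured with `"votes": 0` to pass this check.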
There are three kinds of MongoDB nodes:
-
Primary
The primary is the member of the replica set that receives all write operations and, by default, all read operations. Read operations can be directed to secondary nodes by setting a read preference when performing the query. A replica set can have at most one primary node.
-
Secondary
The secondary is a node that replicates the data to maintain a copy. A replica set can have one or more secondary nodes. Clients cannot write to secondaries; they can only read from them. In certain cases, a secondary node might become the primary, as described below. Secondaries replicate the primary's oplog (operations log) and apply the operations to their own data sets asynchronously. Because replication is asynchronous, the replica set can keep working if one or more members fail.
-
Arbiter
The arbiter does not hold a copy of the data set and cannot become primary. However, an arbiter votes in elections for primary.
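The asynchronous oplog replication described for secondaries can be pictured with a toy model. This is only a sketch of the idea, not how MongoDB implements it: the `Primary` and `Secondary` classes and their methods are invented for illustration, and the real oplog stores richer operation documents.

```python
class Primary:
    def __init__(self):
        self.data = {}
        self.oplog = []  # ordered list of (key, value) operations

    def write(self, key, value):
        # Apply the write locally and record it in the log.
        self.data[key] = value
        self.oplog.append((key, value))


class Secondary:
    def __init__(self):
        self.data = {}
        self.applied = 0  # number of oplog entries already applied

    def sync_from(self, primary):
        # Replay only the entries added since the last sync.
        for key, value in primary.oplog[self.applied:]:
            self.data[key] = value
        self.applied = len(primary.oplog)


primary = Primary()
secondary = Secondary()

primary.write("user:1", "Ada")
primary.write("user:2", "Linus")

# The writes have returned, but the secondary has not caught up yet.
lagging = dict(secondary.data)

secondary.sync_from(primary)
caught_up = secondary.data == primary.data

print(lagging, caught_up)  # prints {} True
```

The gap between the write returning and `sync_from` running is the replication lag: a read from the secondary during that window sees older data.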
If the primary goes down, the remaining members hold an election and one of the secondaries becomes the new primary. When the original primary comes back online, it rejoins the set as a secondary. Also, in a three-member set, if both secondaries are unavailable, the primary can no longer reach a majority of the members, so it steps down to secondary and the replica set cannot accept writes until a majority is available again. Keep in mind that the primary receives all write operations and, by default, all read operations. Therefore, the replica set needs an active primary.
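The majority rule behind this behavior is simple arithmetic. The sketch below uses two hypothetical helper functions and assumes every member counted is a voting member.

```python
def majority(voting_members):
    """Smallest number of votes that is more than half."""
    return voting_members // 2 + 1


def can_have_primary(voting_members, reachable):
    """A member can be elected, or stay primary, only while a majority is reachable."""
    return reachable >= majority(voting_members)


for n in (3, 5, 7):
    tolerated = n - majority(n)
    print(f"{n} voting members: majority {majority(n)}, tolerates {tolerated} failures")

# Three members with both secondaries down: only the primary itself is reachable.
print(can_have_primary(3, 1))  # prints False
```

This is also why an even number of voting members buys nothing: four members need three votes and still tolerate only one failure.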
Member election
An election is used to determine which member will become primary. An election is triggered in response to a variety of events, such as:
- Initiating a replica set
- Adding a new node to the replica set
- Performing maintenance tasks
- The secondaries losing connectivity to the primary for longer than the configured timeout (10 seconds by default)
The median time before a cluster elects a new primary should not typically exceed 12 seconds with the default configuration. The election algorithm makes a best effort to have the available secondary with the highest priority call the election. Members with a priority value of 0, on the other hand, cannot become primary and do not seek election.
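The role of priority can be sketched as follows. This is a deliberate simplification: a real election also requires a majority of votes and takes into account how up to date each candidate is. The `Member` class and `preferred_candidate` function are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass
class Member:
    host: str
    priority: float = 1
    arbiter: bool = False
    reachable: bool = True


def preferred_candidate(members):
    """Return the reachable, electable member with the highest priority, or None."""
    eligible = [
        m for m in members
        if m.reachable and not m.arbiter and m.priority > 0
    ]
    if not eligible:
        return None
    return max(eligible, key=lambda m: m.priority)


members = [
    Member("db1.example.net:27017", priority=2, reachable=False),  # failed primary
    Member("db2.example.net:27017", priority=1),
    Member("db3.example.net:27017", priority=0),   # can never become primary
    Member("arb.example.net:27017", priority=0, arbiter=True),
]

winner = preferred_candidate(members)
print(winner.host)  # prints db2.example.net:27017
```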
For durability, write operations can wait until the data has been replicated to a certain number of nodes before reporting back to the client. This mechanism is called "write concern": the number of data-bearing members that must acknowledge a write before the operation returns as successful. With a write concern of 1, only the primary must acknowledge the write. That was the default before MongoDB 5.0; since 5.0 the default is "majority" in most configurations. You can require as many acknowledging members as you want, but a higher number increases latency because the client must wait for all of those acknowledgments. You can also set a write concern of "majority", which requires acknowledgment from more than half of the members.
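One common way to request a write concern is through the connection string, using the `w` and `wtimeoutMS` options. The sketch below only builds the URI with the standard library; the host names and the replica set name `rs0` are placeholders, and the 5 second timeout is an arbitrary choice.

```python
from urllib.parse import urlencode

# Placeholder hosts for a three-member replica set.
hosts = [
    "db1.example.net:27017",
    "db2.example.net:27017",
    "db3.example.net:27017",
]

options = {
    "replicaSet": "rs0",   # placeholder replica set name
    "w": "majority",       # wait for a majority of members to acknowledge
    "wtimeoutMS": 5000,    # stop waiting for acknowledgment after 5 seconds
}

uri = "mongodb://" + ",".join(hosts) + "/?" + urlencode(options)
print(uri)
```

Setting a timeout is worth considering with any write concern above 1, because otherwise a write can block while the required members are unreachable.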
What is write concern?
Write concern describes the level of acknowledgment requested from MongoDB for write operations to a replica set. The specified number must not exceed the number of data-bearing members, otherwise the write concern can never be satisfied. You can also set the write concern to “majority”, which requests acknowledgment that the write has propagated to the calculated majority of the members. For example, in a cluster of three data-bearing nodes, the acknowledgment is sent to the client after two of them have the data: the primary and one of the secondaries.
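The three-node example can be worked through in a few lines. This is a toy model that assumes every member is data-bearing and voting (no arbiters); `required_acks` and `is_acknowledged` are hypothetical helpers, not driver functions.

```python
def required_acks(w, data_bearing_members):
    """Translate a write concern into a number of acknowledgments."""
    if w == "majority":
        return data_bearing_members // 2 + 1
    if not 0 <= w <= data_bearing_members:
        raise ValueError("w cannot exceed the number of data-bearing members")
    return w


def is_acknowledged(w, acks_received, data_bearing_members):
    """True once enough members have confirmed the write."""
    return acks_received >= required_acks(w, data_bearing_members)


# Three data-bearing nodes, write concern "majority".
print(required_acks("majority", 3))        # prints 2
print(is_acknowledged("majority", 1, 3))   # prints False (only the primary has it)
print(is_acknowledged("majority", 2, 3))   # prints True (primary plus one secondary)
```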
What is read preference?
For read operations, you can specify a read preference that describes how MongoDB clients route queries to members of the replica set. By default, the primary receives read operations, but clients can specify a read preference to send them to secondary nodes. The following are the options for the read preference:
-
primary
All read operations come from the primary node
-
primaryPreferred
Read operations normally come from the primary node, but if it is unavailable they come from the secondary nodes
-
secondary
All read operations come from the secondary nodes
-
secondaryPreferred
Read operations normally come from the secondary nodes, but if none of them is available they come from the primary node
-
nearest
Read operations may come from any member of the replica set, primary or secondary; the member is chosen by lowest network latency rather than by role.
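The five modes can be compared side by side with a small routing sketch. It is a simplified model: real drivers pick randomly among members within a latency window, while this version always takes the lowest latency so the result is predictable. The `pick_member` function, the member dictionaries, and the latency figures are all made up for illustration.

```python
def pick_member(mode, members):
    """Return the host a read would go to, or None if no member qualifies.

    members: list of dicts with "host", "role" ("primary" or "secondary")
    and "latency_ms"; only available members should be listed.
    """
    primaries = [m for m in members if m["role"] == "primary"]
    secondaries = [m for m in members if m["role"] == "secondary"]
    candidates_by_mode = {
        "primary": primaries,
        "primaryPreferred": primaries or secondaries,
        "secondary": secondaries,
        "secondaryPreferred": secondaries or primaries,
        "nearest": members,
    }
    candidates = candidates_by_mode[mode]
    if not candidates:
        return None
    return min(candidates, key=lambda m: m["latency_ms"])["host"]


members = [
    {"host": "db1", "role": "primary", "latency_ms": 40},
    {"host": "db2", "role": "secondary", "latency_ms": 5},
    {"host": "db3", "role": "secondary", "latency_ms": 12},
]

for mode in ("primary", "primaryPreferred", "secondary", "secondaryPreferred", "nearest"):
    print(mode, "->", pick_member(mode, members))
```

With the sample data, `primary` and `primaryPreferred` route to db1, and the other three modes route to db2. Removing db1 from the list makes `primary` return `None` while `primaryPreferred` falls back to db2.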
To sum up, replicating data across different servers is a good option for high availability and durability. In simple terms, replication creates redundant copies of data to safeguard it. MongoDB builds this concept into its core, so it already provides the tools to use it easily.
Thanks for reading!
Top comments (1)
Will reading only from secondaries help performance?
And for data availability, MongoDB continuously needs to check the primary for new data and then replicate it to the secondaries, which I think will impact performance?