DEV Community

DavidCockerill for HarperDB


What is Geo Redundancy?

Let's talk geo-redundancy! Ever heard of it? Redundancy has many benefits, and it is important for IT teams and organizations in general to implement redundancy methodologies as a safety net. Having these systems in place improves reliability and availability while reducing downtime. This matters because businesses rely on things that are not guaranteed, such as internet connectivity, hardware, power, and data storage, and each of them may need a backup plan. A failure in many of these, such as a regional power outage, can affect an entire area at once. Such events may not happen often, but one power failure that takes your application offline is one too many. Geo-redundancy can reduce or remove these risks and better prepare organizations for disaster recovery.

As the name implies, geo-redundancy is the practice of providing redundancy (extra or duplicate resources) by physically separating infrastructure across multiple geographic locations. Because I work for a database company, I will discuss it in relation to databases.

Geo-redundancy is a powerful (and somewhat magical) force that helps ensure high availability and disaster recovery. It replicates your data and stores copies in other databases located in separate physical locations. That way, if one location fails or simply needs to be taken offline, your other locations, which also store your data, are not affected.
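To make the failover idea concrete, here is a minimal Python sketch of a geo-replicated store. It is a toy model, not HarperDB code: the `Region` and `GeoReplicatedStore` names and the region labels are hypothetical, and replication here is synchronous and in-memory, whereas a real system replicates over the network.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    """One physical location holding a full copy of the data (hypothetical model)."""
    name: str
    online: bool = True
    data: dict = field(default_factory=dict)


class GeoReplicatedStore:
    """Toy geo-redundant store: every write is copied to every online region."""

    def __init__(self, regions):
        self.regions = regions

    def write(self, key, value):
        # Replicate the write to each region that is currently reachable.
        for region in self.regions:
            if region.online:
                region.data[key] = value

    def read(self, key):
        # Serve the read from the first healthy region that has the key,
        # so an outage in one location does not make the data unavailable.
        for region in self.regions:
            if region.online and key in region.data:
                return region.name, region.data[key]
        raise KeyError(key)


us = Region("us-east")
eu = Region("eu-west")
store = GeoReplicatedStore([us, eu])
store.write("order:42", {"status": "shipped"})

us.online = False  # simulate a power failure in one location
print(store.read("order:42"))  # ('eu-west', {'status': 'shipped'})
```

In this sketch, a region that is offline during a write simply misses it; a production replication engine would need to catch that region up once it comes back.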

Geo-redundancy is super easy to implement in HarperDB through its clustering engine, which replicates data between instances of HarperDB using a highly performant, bi-directional pub/sub model. The first step is to install HarperDB in multiple geographic locations. A single instance (installation) of HarperDB constitutes a node. Once node subscriptions are configured via the HarperDB API, the nodes establish a WebSocket connection with each other to replicate data. When two or more nodes are subscribed to each other, you have a cluster. Depending on how the nodes are subscribed to each other, a transaction on one node can automatically be published to another. And there you have it: the start of geo-redundancy!
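The node-and-subscription idea can be sketched in a few lines of Python. This is a hypothetical in-memory model, not HarperDB's API: `Node`, `connect`, `commit`, and `apply_replicated` are illustrative names, and the direct method call stands in for the WebSocket connection a real cluster would use.

```python
class Node:
    """Hypothetical in-memory stand-in for one database instance (a "node").

    This only models the idea that a transaction committed on one node is
    pushed to the peers it publishes to; it is not HarperDB's API.
    """

    def __init__(self, name):
        self.name = name
        self.records = {}
        self.publishes_to = []  # peers that receive this node's changes

    def connect(self, peer, bidirectional=True):
        # Stand-in for configuring a subscription between two nodes.
        self.publishes_to.append(peer)
        if bidirectional:
            peer.publishes_to.append(self)

    def commit(self, key, value):
        # Apply the transaction locally, then publish it to each peer.
        self.records[key] = value
        for peer in self.publishes_to:
            peer.apply_replicated(key, value)

    def apply_replicated(self, key, value):
        # Replicated changes are applied but not re-published, so two nodes
        # that publish to each other do not bounce the same change forever.
        self.records[key] = value


tokyo = Node("tokyo")
frankfurt = Node("frankfurt")
tokyo.connect(frankfurt)  # two nodes subscribed to each other form a cluster

tokyo.commit("dog:1", "Penny")
frankfurt.commit("dog:2", "Harper")
print(tokyo.records == frankfurt.records)  # True
```

Not re-publishing replicated changes keeps this two-node example loop-free; with more nodes, each node in this sketch would have to publish directly to every peer that needs its changes.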


In this case, HarperDB provides a “backup plan” to ensure that your organization is prepared for even the most unpredictable outcomes. The term redundancy means “too much” or “more than is needed.” Of course, that is not necessary or helpful in every situation in life or business, but when it comes to data, IT, and engineering, you can bet your bottom dollar that you will regret not having a redundancy plan in place. If you fear that implementing redundancy is a waste of time, just think: what would happen if you lost your data? HarperDB enables you to implement geo-redundancy in a simple and cost-effective manner, so you can sleep easy knowing your data is safely stored across the globe.

Top comments (5)

DavidCockerill

Hi Pavel, sorry about that. We've updated the link to point to a more insightful document:

Pub/sub means a node can publish and/or subscribe to another node. If node A publishes to node B, any changes on node A will be published to node B. If changes happen on node B, they will not be reflected on node A. However, if node A is subscribed to node B, changes on node B will be reflected on node A.
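The publish/subscribe rule in this reply can be written down as a tiny function. The sketch below is a hypothetical model rather than HarperDB's configuration format: `Subscription` and `replication_flows` are illustrative names, and each result tuple means "changes flow from the first node to the second".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subscription:
    """How node `local` is linked to node `remote` (hypothetical model).

    publish:   changes made on `local` are sent to `remote`.
    subscribe: changes made on `remote` are received by `local`.
    """
    local: str
    remote: str
    publish: bool
    subscribe: bool


def replication_flows(sub):
    """Return the set of (source, destination) directions in which changes flow."""
    flows = set()
    if sub.publish:
        flows.add((sub.local, sub.remote))
    if sub.subscribe:
        flows.add((sub.remote, sub.local))
    return flows


# A publishes to B only: changes on A reach B, changes on B stay on B.
print(replication_flows(Subscription("A", "B", publish=True, subscribe=False)))
# {('A', 'B')}

# A also subscribes to B: changes on B now reach A as well.
print(sorted(replication_flows(Subscription("A", "B", publish=True, subscribe=True))))
# [('A', 'B'), ('B', 'A')]
```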

I hope that wasn’t too confusing of an explanation.

DavidCockerill

Pavel, in many cases you're right: two-way synchronization is ideal. This is easily done by enabling both publish and subscribe on each table, which results in complete two-way replication.

However, we think it's important to give users granular control of their replication for cases where one-way replication may be ideal. For example, if the sole purpose of node B was to provide redundancy for node A and you weren’t making any create/update operations on node B that needed to be reflected on node A, the communication can be one way (A -> B).
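The per-table granularity described here can be sketched as a pair of flags for each table. The table names and the `settings` layout below are hypothetical and only mirror the publish/subscribe options in this reply; they are not HarperDB's actual configuration format.

```python
def table_mode(publish, subscribe):
    """Describe replication for one table on node A's link to node B (hypothetical)."""
    if publish and subscribe:
        return "two-way (A <-> B)"
    if publish:
        return "one-way (A -> B)"
    if subscribe:
        return "one-way (B -> A)"
    return "not replicated"


# Hypothetical per-table settings on node A's subscription to node B.
settings = {
    "customers": {"publish": True, "subscribe": True},   # full two-way sync
    "orders": {"publish": True, "subscribe": False},     # B is a pure backup of A
    "scratch": {"publish": False, "subscribe": False},   # local-only table
}

for table, flags in settings.items():
    print(f"{table}: {table_mode(flags['publish'], flags['subscribe'])}")
# customers: two-way (A <-> B)
# orders: one-way (A -> B)
# scratch: not replicated
```

Keeping the flags per table, rather than per node pair, is what lets a single link carry two-way sync for some data and backup-only replication for the rest.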

Another case where this may be valuable is sensor data collection. You may have many nodes collecting sensor data, each of which only cares about its own sensor, while the data is replicated up to a primary reporting node that holds values from all sensors. In this case, one-way replication is important, and queries on each sensor node return different data sets.
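A minimal Python sketch of this fan-in pattern follows. `SensorNode`, `PrimaryNode`, and the sensor IDs are hypothetical, and a direct method call stands in for the one-way replication link; the point is only that each edge node holds its own readings while the primary sees all of them.

```python
from collections import defaultdict


class PrimaryNode:
    """Reporting node that receives every sensor's readings but publishes nothing back."""

    def __init__(self):
        self.readings = defaultdict(list)

    def receive(self, sensor_id, value):
        self.readings[sensor_id].append(value)


class SensorNode:
    """Edge node that stores readings from its own sensor only (hypothetical)."""

    def __init__(self, sensor_id, primary):
        self.sensor_id = sensor_id
        self.readings = []
        self.primary = primary  # one-way link: this node publishes to the primary

    def record(self, value):
        # Store locally, then replicate the reading up to the primary node.
        self.readings.append(value)
        self.primary.receive(self.sensor_id, value)


primary = PrimaryNode()
north = SensorNode("north", primary)
south = SensorNode("south", primary)

north.record(21.5)
south.record(19.0)
north.record(22.1)

print(north.readings)          # [21.5, 22.1]
print(south.readings)          # [19.0]
print(dict(primary.readings))  # {'north': [21.5, 22.1], 'south': [19.0]}
```

Querying `north` or `south` returns only that sensor's data, while querying the primary returns every sensor's data, which is the asymmetry one-way replication is meant to produce here.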