This is a spicy topic. It's spicy because I might say something that contradicts your cherished beliefs. Should you go "Multi-Region"?
At times, you may not have a choice. Your boss or some non-technical team may have made the decision to go this way on your behalf. Personally, I have never seen an application where every single piece needed to be multi-region.
Before I go on, let me talk about a situation that I was once in.
Amazon experienced a blip in its availability, and a specific service in our region was affected. The impact on one of our core services was quite significant. While, in the grand scheme of things, such outages are to be expected, it provoked substantial panic among stakeholders at the time.
In the post-mortem that followed, many ideas were thrown around to "harden" our services. One of them was Multi-Region. Examples were scouted from various places to point to services that were indeed multi-region and that managed to survive the apocalypse.
Architects have long documented various strategies for keeping services in multiple regions depending on the reliability levels desired by your system. You can have stand-by applications in different regions, ready for a cutover in case of outages. Alternatively, you can have a hot-hot setup for higher reliability.
In all of these cases, the heuristic seems to be that since you have distinct instances running in isolated locations, you have spread out the risk. And in some cases, this logic is sound. However, it misses that, in most cases, AWS has already diversified this risk for us. Each AWS availability zone consists of one or more data centers that are physically isolated from those of other zones and are generally fed by different electric grids. In the past, AWS has published lengthy procedures that it has put in place to ensure that an outage in one availability zone does not affect another. As a result, complete region outages are extremely rare.
It is easy to return to individual outages and point fingers at single-region strategies. However, choosing regions should be a carefully thought-out trade-off, and a knee-jerk move into multiple regions may cause more harm than good.
To begin with, most (not all) AWS services are deployed to a specific region, including many of the services designated as "High Availability." Consider the example of an ECS service. Individual containers may run in particular availability zones within the region, with a load balancer at a higher level. Such a setup already provides redundancy. Given the isolation of availability zones, such a redundant setup is hardened to absorb most outages. But I concede that some applications desire even higher levels of diversification, where spreading over availability zones isn't enough.
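To put a rough number on that redundancy, here is a back-of-the-envelope sketch. It assumes each availability zone fails independently and that a single healthy zone is enough to serve traffic; both are simplifications, and the 99% per-zone figure is a made-up illustration, not an AWS number.

```python
def parallel_availability(per_zone: float, zones: int) -> float:
    """Chance that at least one of `zones` independent copies is up.

    Assumes zone failures are independent, which real outages only
    approximate.
    """
    if not 0.0 <= per_zone <= 1.0:
        raise ValueError("per_zone must be between 0 and 1")
    if zones < 1:
        raise ValueError("zones must be at least 1")
    # The service is down only when every copy is down at the same time.
    return 1.0 - (1.0 - per_zone) ** zones


if __name__ == "__main__":
    # Hypothetical per-zone availability, chosen only for illustration.
    for zones in (1, 2, 3):
        print(f"{zones} zone(s): {parallel_availability(0.99, zones):.6f}")
```

Under these assumptions, each extra zone removes most of the remaining downtime, which is why a multi-AZ setup inside one region already absorbs the bulk of the risk.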
Many AWS services are designed to be contained in a region. There are many reasons for this, including reliability and regulatory reasons. For example, your local government may not allow data to be shipped out of its legal jurisdiction, and AWS is designed to respect such laws. Pushing back against this design choice of AWS comes at a cost in the form of complexity. You may need multiple VPCs instead of a single one, which you now have to peer. You may also need multiple instances of your code ready to run in multiple regions and thus need to add logic to keep these multiple regions up to date with the latest code.
Most importantly, you may need to add more security structures to protect your application in two or more regions instead of focusing on a single one. As counter-intuitive as it may sound, increasing the surface area of your application may expose you to more risk of human error, thus reducing your actual reliability. You also add new systems that have their own reliability limits. This may include new firewalls, load balancers, cutover logic, VPC peering connections for linking networks across regions, etc.
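The effect of those extra moving parts can be sketched with the same kind of arithmetic. The numbers below are invented for illustration, and the model assumes independent failures and that the cutover logic and cross-region networking sit on the critical path of every request, which may not hold for your design.

```python
from math import prod


def in_series(*availabilities: float) -> float:
    """Every part must work, so the availabilities multiply."""
    return prod(availabilities)


def in_parallel(*availabilities: float) -> float:
    """At least one part must work."""
    return 1.0 - prod(1.0 - a for a in availabilities)


# Hypothetical figures, not measurements.
REGIONAL_STACK = 0.9995        # one well-run, multi-AZ regional deployment
CUTOVER_LOGIC = 0.999          # new failover machinery you now own
CROSS_REGION_NETWORK = 0.999   # peering, replication and similar glue

single_region = REGIONAL_STACK
multi_region = in_series(
    in_parallel(REGIONAL_STACK, REGIONAL_STACK),
    CUTOVER_LOGIC,
    CROSS_REGION_NETWORK,
)

if __name__ == "__main__":
    print(f"single region: {single_region:.6f}")
    print(f"multi region:  {multi_region:.6f}")
```

With these made-up inputs the multi-region figure comes out lower than the single-region one, because the new components cost more availability than the second region buys back. Different inputs can flip the result, which is exactly why the trade-off deserves a calculation rather than a reflex.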
With that said, sometimes going multi-region may be unavoidable. In such situations, the first point an architect should consider is how much of the architecture needs to be multi-region. The decision must be made on a granular level. Some parts of the application may be more critical than others and warrant a multi-region setup, while the less critical parts may remain in a single region.
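One way to make that granular decision explicit is to write down, per component, how long the business can tolerate it being unavailable, and compare that against how long you are prepared to wait out a regional outage. The sketch below is only an illustration: the component names, the tolerances, and the 240-minute threshold are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Placement(Enum):
    SINGLE_REGION = "single-region"
    MULTI_REGION = "multi-region"


@dataclass(frozen=True)
class Component:
    name: str
    # Longest outage the business can tolerate for this part, in minutes.
    max_tolerable_outage_minutes: int


def choose_placement(
    component: Component, regional_outage_minutes: int = 240
) -> Placement:
    """Go multi-region only when a regional outage would exceed tolerance.

    `regional_outage_minutes` is an assumed planning figure, not an AWS
    guarantee.
    """
    if component.max_tolerable_outage_minutes < regional_outage_minutes:
        return Placement.MULTI_REGION
    return Placement.SINGLE_REGION


if __name__ == "__main__":
    # Hypothetical inventory of application parts.
    inventory = [
        Component("checkout-api", 15),
        Component("admin-dashboard", 480),
        Component("reporting-batch", 1440),
    ]
    for part in inventory:
        print(f"{part.name}: {choose_placement(part).value}")
```

Even a crude table like this tends to show that only a small slice of the system needs the extra regions.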
The second trick may be to use AWS multi-region products such as Amazon Aurora Global Database or Amazon DynamoDB global tables, which abstract away some of the complexity from the end user as part of the shared responsibility model.
In general, when asked to go multi-region, always start with the "why"?