axurcio

Posted on • Originally published at insight-services-apac.github.io on

A Guide to Azure Site Recovery - Part 1

What is Azure Site Recovery?

Azure Site Recovery (ASR) is a Disaster Recovery as a Service offering from Azure. It contributes to a Business Continuity and DR strategy by replicating IaaS workloads between Azure regions, and by replicating on-premises physical servers and virtual machines to Azure.

In this blog post, I will cover only the Azure to Azure Site Recovery option and discuss the under-the-hood components that make up an Azure Site Recovery setup. I will also explain the different phases of setting up Azure Site Recovery for Disaster Recovery.

Why Azure Site Recovery in a PaaS and Containers world?

In a world moving fast towards Containers and PaaS services, you may wonder why we would still need Azure Site Recovery. Here are some supporting facts. Yes, many customers and product teams are moving away from IaaS style workloads to PaaS services and Containers in the form of Azure Kubernetes. However, we still have a lot of customers and businesses who are only beginning their Cloud journey and have many applications and services running in virtual machines in data centres.

Cloud Enablement for these on-premises virtual machines quite often starts with migration, i.e. Lift and Shift. The percentage of these applications that get modernised is still low. Most organisations decide to migrate as their first phase and modernise or decommission the workloads in subsequent phases. Until this happens, it is important that these services are covered by a Business Continuity and DR strategy. There is also another scenario where businesses choose IaaS workloads over PaaS and Containers because of security concerns and other requirements. In my humble opinion, virtual machines are here to stay for at least another 10 years. They definitely need a DR strategy, and Azure Site Recovery is a Cloud native offering that can meet an organisation's DR needs.

Enabling Azure Site Recovery

Azure Site Recovery is offered through a Recovery Services vault. Depending on the governance model, the Recovery Services vault can be provisioned in a Hub subscription if the business wants to manage it centrally, or in a Spoke subscription for a more distributed style of management. Site Recovery still has to be enabled on the vault, and several components need to be configured to complete a Site Recovery infrastructure. Let's discuss these in the sections below.

Setting up the Foundations

As part of a broader Cloud Enablement and DR strategy, businesses usually choose their Primary and Secondary regions when building their Cloud Foundations. It is vital to define the Source and Target networks in these regions before enabling ASR. Along with the networking, a few other components are required, which are explained below.

Some of the key factors to consider:

  • Existing or Dedicated Target network/subnets for ASR workloads.
  • Reserving enough IP address space in the Target Virtual network/Subnets to accommodate failed over workloads.
  • Keeping networking consistent between regions: NSGs and Firewall rules should always cover both the Primary and Secondary networks, including any connectivity between tiers (e.g. Web to Data Subnet in both Primary and Secondary).
  • Firewall and NSG rules to enable access to the PaaS services that support ASR, such as Storage, Automation and Key Vault.
  • Resource Groups in Target Regions.
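The IP address reservation point above can be checked with a little arithmetic. The sketch below is a minimal illustration, not an ASR feature: the subnet range and workload counts are hypothetical, and the only platform fact it relies on is that Azure reserves 5 addresses in every subnet.

```python
import ipaddress

# Azure reserves 5 addresses in every subnet
# (network, broadcast, and three for platform use).
AZURE_RESERVED_PER_SUBNET = 5


def usable_addresses(subnet_cidr: str) -> int:
    """Return how many addresses in the subnet can be assigned to workloads."""
    network = ipaddress.ip_network(subnet_cidr, strict=True)
    return max(network.num_addresses - AZURE_RESERVED_PER_SUBNET, 0)


def can_accommodate_failover(subnet_cidr: str, already_used: int, failover_nics: int) -> bool:
    """True if the target subnet has room for every NIC that would fail over."""
    free = usable_addresses(subnet_cidr) - already_used
    return free >= failover_nics


# Hypothetical target subnet and counts, for illustration only.
print(usable_addresses("10.20.1.0/24"))  # 251
print(can_accommodate_failover("10.20.1.0/24", already_used=200, failover_nics=60))  # False
```

In this example the target subnet has 51 free addresses, so failing over 60 NICs would not fit and the subnet would need to be resized or a dedicated one created.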

A dedicated Cache Storage account also has to be created in the Primary region. Azure Site Recovery uses it during replication.

Define your Replication Policies

It is vital to understand and define the RTO and RPO of your existing workloads through a detailed assessment as part of your broader DR strategy.

What are RTO and RPO?

Recovery Time Objective (RTO) is the maximum amount of time within which a system or service must be restored after a disaster.

Recovery Point Objective (RPO) is the maximum amount of data loss, measured in time, that is acceptable during an outage of a system or service before it significantly impacts the business.

Understanding the existing RTO and RPO, and translating them into what ASR and Azure can offer, is important when framing the replication policies.
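To make the two definitions concrete, here is a small worked sketch. The timestamps and objectives are made-up values for illustration, and the function names are hypothetical helpers rather than anything ASR provides.

```python
from datetime import datetime, timedelta


def meets_rpo(last_recovery_point: datetime, outage_start: datetime, rpo: timedelta) -> bool:
    """Data written after the last recovery point is lost; that window must fit within the RPO."""
    return outage_start - last_recovery_point <= rpo


def meets_rto(outage_start: datetime, service_restored: datetime, rto: timedelta) -> bool:
    """The time the service was unavailable must fit within the RTO."""
    return service_restored - outage_start <= rto


# Hypothetical incident timeline.
outage_start = datetime(2024, 1, 10, 10, 0)
last_recovery_point = datetime(2024, 1, 10, 9, 56)
service_restored = datetime(2024, 1, 10, 12, 30)

# 4 minutes of data lost against a 15 minute RPO.
print(meets_rpo(last_recovery_point, outage_start, rpo=timedelta(minutes=15)))  # True
# 2.5 hours of downtime against a 2 hour RTO.
print(meets_rto(outage_start, service_restored, rto=timedelta(hours=2)))  # False
```

Here the recovery point is recent enough to satisfy the RPO, but the restore took too long to satisfy the RTO, which is the kind of gap a DR assessment should surface.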

Crash Consistency Vs App Consistency

Similar to other backup and recovery services, ASR offers two types of snapshots. Crash-consistent snapshots are taken every 5 minutes by default, and this frequency can't be modified. A crash-consistent snapshot captures the data that was on the disks at the time the snapshot was taken. Most applications are able to recover from these snapshots.

App-consistent snapshots offer stronger data consistency: in addition to what a crash-consistent snapshot captures, they also capture data in memory and any transactions in progress.

Snapshot Retention

Snapshot retention is another important factor, because retained snapshots consume storage and therefore add cost.

Tiered Replication Policy

Based on the above factors, you can now create the replication policies. Businesses often settle on a tiered model like the one below. The numbers are approximate and given only as an example.

Replication Policy    Snapshot Frequency (in hours)    Snapshot Retention (in days)
Gold                  1                                2
Silver                2                                3
Bronze                3                                4
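The tiers above can also be captured as data, which makes them easier to reuse in scripts or templates. This is only a sketch: the class and field names are hypothetical, the values are the example numbers from the table, and the conversion to minutes is an assumption about the unit a deployment template might expect.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicationTier:
    name: str
    snapshot_frequency_hours: int
    snapshot_retention_days: int

    @property
    def snapshot_frequency_minutes(self) -> int:
        return self.snapshot_frequency_hours * 60

    @property
    def snapshot_retention_minutes(self) -> int:
        return self.snapshot_retention_days * 24 * 60


# Example values taken from the table above.
TIERS = [
    ReplicationTier("Gold", snapshot_frequency_hours=1, snapshot_retention_days=2),
    ReplicationTier("Silver", snapshot_frequency_hours=2, snapshot_retention_days=3),
    ReplicationTier("Bronze", snapshot_frequency_hours=3, snapshot_retention_days=4),
]


def pick_tier(max_hours_between_snapshots: int) -> ReplicationTier:
    """Pick the least demanding tier that still snapshots often enough."""
    for tier in sorted(TIERS, key=lambda t: t.snapshot_frequency_hours, reverse=True):
        if tier.snapshot_frequency_hours <= max_hours_between_snapshots:
            return tier
    raise ValueError("No tier snapshots frequently enough for this requirement")


print(pick_tier(2).name)                          # Silver
print(TIERS[0].snapshot_retention_minutes)        # 2880
```

A workload that tolerates at most 2 hours between snapshots lands in Silver, while anything stricter than 1 hour raises an error, signalling that a new tier would have to be defined.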

This completes the Recovery Services vault configuration. The vault is now ready to onboard virtual machines for replication. In the upcoming blog posts in this series, I will explain onboarding virtual machines into ASR, failovers, and Infrastructure as Code practices and challenges for Azure Site Recovery.
