
Centralized S3 backup with AWS Backup in Terraform

Scenario

A day in the life of a Solutions Architect revolves around gathering requirements, both functional and non-functional. RTO and RPO are part of the non-functional requirements. In this post we will look at disaster recovery for S3 buckets with different RTOs and RPOs, using the AWS Backup service. The RTOs and RPOs are defined per use case and data classification, namely: standard_data, sensitive_data, curated_data, curated_sensitive_data, critical_data, critical_sensitive_data. The idea is to be able to recreate the whole setup from scratch using the data stored in AWS Backup.

Object lock is a good start for protecting objects in a bucket for a specific period of time. It comes in two modes: Compliance and Governance. Simply put, Governance allows an object to be overwritten or deleted if you have specific permissions to do so, while Compliance is stricter and protects objects from deletion even by the root user. Versioning is a prerequisite for object lock (WORM, write once read many). If you put an object into a bucket that already contains an existing protected object with the same object key name, Amazon S3 creates a new version of that object; the existing protected version remains locked according to its retention configuration.
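To make the difference between the two modes more tangible, here is a minimal boto3 sketch, with a placeholder bucket and object key: under Governance mode a locked version can still be permanently deleted by a caller that holds the s3:BypassGovernanceRetention permission and sets the bypass flag, while under Compliance mode the same request is rejected for everyone, including the root user.

import boto3

s3 = boto3.client("s3")

BUCKET = "project-standard-data-martin-n"  # bucket from the example further below
KEY = "reports/2024-01-01.csv"             # hypothetical object key

# Inspect the retention settings of a locked object version.
retention = s3.get_object_retention(Bucket=BUCKET, Key=KEY)
print(retention["Retention"])  # e.g. {'Mode': 'GOVERNANCE', 'RetainUntilDate': ...}

# Under GOVERNANCE mode, permanently deleting a locked version only works when the
# caller has s3:BypassGovernanceRetention and explicitly sets the bypass flag.
versions = s3.list_object_versions(Bucket=BUCKET, Prefix=KEY)
version_id = versions["Versions"][0]["VersionId"]

s3.delete_object(
    Bucket=BUCKET,
    Key=KEY,
    VersionId=version_id,
    BypassGovernanceRetention=True,  # a request like this is denied under COMPLIANCE mode
)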

Here are the defined scenarios and requirements:

Backup level            | AWS Backup enabled | Object lock | Vault copy | Point in time recovery | Retention | Delete backups after (days)
standard_data           |                    | X           |            | X                      | 15 days   | –
sensitive_data          | X                  | X           |            | X                      | 15 days   | 15
curated_data            | X                  | X           |            | X                      | 35 days   | 365
curated_sensitive_data  | X                  | X           |            | X                      | 35 days   | 365
critical_data           | X                  | X           | X          | X                      | 35 days   | 1825
critical_sensitive_data | X                  | X           | X          | X                      | 35 days   | set by the business

Table 1: AWS Backup use cases

We will look at each of the scenarios, starting with the one that does not require AWS Backup:

Standard data

# DataBackupLevel = Standard Data, Write Once Read Many
module "standard_data" {
  source               = "terraform-aws-modules/s3-bucket/aws"
  bucket               = "project-standard-data-martin-n"
  attach_public_policy = false
  versioning = {
    status     = true
    mfa_delete = false
  }
  object_lock_enabled = true
  object_lock_configuration = {
    rule = {
      default_retention = {
        mode = "GOVERNANCE"
        days = 15
      }
    }
  }
}

This will protect our standard data from deletion for the duration of the object lock retention period.
All other use cases (sensitive_data, curated_data, curated_sensitive_data, critical_data, critical_sensitive_data) will use AWS Backup as the backup service.

Restoration of standard data objects

The restoration of objects is done via the AWS CLI or an AWS SDK and consists of the following steps:

  1. Use the ListObjectVersions API to get the version ID of the object version you want to restore.

  2. Call RestoreObject and pass the version ID as a parameter. You can also specify the number of days the restored object will be available.

  3. The restored object will then be available in S3, and you can download or access it like any other object.

With object lock this works differently, because you will put the same object key and thereby create a new version. To automate the process for many objects you can use S3 Batch Operations.
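Here is a minimal boto3 sketch of these steps, with placeholder bucket and key names. RestoreObject applies to object versions that sit in an archival storage class; for a version that is still in a regular storage class, copying it back over the same key is enough and, as described above, simply creates a new current version.

import boto3

s3 = boto3.client("s3")

BUCKET = "project-standard-data-martin-n"  # placeholder bucket
KEY = "reports/2024-01-01.csv"             # hypothetical object key

# 1. List the versions of the object and pick the one you want to restore.
versions = s3.list_object_versions(Bucket=BUCKET, Prefix=KEY)
version_id = versions["Versions"][0]["VersionId"]

# 2a. For a version in an archival storage class, start a restore and specify
#     for how many days the restored copy should remain available.
s3.restore_object(
    Bucket=BUCKET,
    Key=KEY,
    VersionId=version_id,
    RestoreRequest={"Days": 10},
)

# 2b. For a version in a regular storage class, copy it back over the same key;
#     S3 creates a new current version and the locked versions stay untouched.
s3.copy_object(
    Bucket=BUCKET,
    Key=KEY,
    CopySource={"Bucket": BUCKET, "Key": KEY, "VersionId": version_id},
)

# 3. The restored or copied object can now be read like any other object.
obj = s3.get_object(Bucket=BUCKET, Key=KEY)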

Assumptions

To quote Werner Vogels: "Everything fails, all the time."


There are multiple disaster recovery strategies to tackle data failures, ranging from simple backup and restore to pilot light, warm standby, and multi-site active/active setups.


Backup and restore is associated with the lowest cost, but also the highest RTO and RPO, which can be in the range of hours. In this post we will limit ourselves to this first strategy.

Imagine if we suddenly can't reach our account and need to start from scratch with our entire setup. It's crucial to have our data backed up, and we should also make sure those backups are stored in another account to ensure access in case of a disaster. That's where AWS Backup comes in handy, providing a straightforward solution to keep our data safe and accessible.

What is AWS Backup?

AWS Backup makes it easy to centrally configure backup policies and monitor backup activity for AWS resources, such as Amazon Elastic Compute Cloud (EC2) instances, Amazon Elastic Block Store (EBS) volumes, Amazon Relational Database Service (RDS) databases, Amazon DynamoDB tables, Amazon Elastic File System (EFS) file systems, Amazon FSx file systems, and AWS Storage Gateway volumes.

We will limit the use of the service to S3 backups only.

How can we store backups from one account in another account?

A simple solution is to copy backups from a vault in one account to a vault in another account. A vault is storage for snapshots and AMIs, or, in a more general context, storage for backups.


Central AWS account

A great way to begin is to define the vault in the central account. A KMS key will be used to encrypt the snapshots in the vault, and the workload account needs to be allowed to use that key so it can copy backups into it. We use KMS grants to allow this for the workload account:

resource "aws_kms_key" "vault_kms" {
  description = "Vault kms key for encryption"
  policy      = <<POLICY
{
    "Id": "vault-kms-policy",
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Enable IAM User Permissions",
            "Effect": "Allow access to view and describe key",
            "Principal": {
                "AWS": "arn:aws:iam::${data.aws_caller_identity.current.account_id}:root"
            },
            "Action": [
                "kms:ListResourceTags",
                "kms:GetKeyPolicy",
                "kms:Describe*"
            ],
            "Resource": "*"
        },
        {
            "Sid": "Allow access for Key Administrators",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/github_ci"
            },
            "Action": [
                "kms:Create*",
                "kms:Describe*",
                "kms:Enable*",
                "kms:List*",
                "kms:Put*",
                "kms:Update*",
                "kms:Revoke*",
                "kms:Disable*",
                "kms:Get*",
                "kms:Delete*",
                "kms:TagResource",
                "kms:UntagResource",
                "kms:ScheduleKeyDeletion",
                "kms:CancelKeyDeletion"
            ],
            "Resource": "*"
        },
        {
            "Sid": "Allow use of the key",
            "Effect": "Allow",
            "Principal": {
                "AWS": [
                    "arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/kms_usage",
                    "arn:aws:iam::${var.workload_account_id}:root"
                ]
            },
            "Action": [
                "kms:Encrypt",
                "kms:Decrypt",
                "kms:ReEncrypt*",
                "kms:GenerateDataKey*",
                "kms:DescribeKey"
            ],
            "Resource": "*"
        },
        {
            "Sid": "Allow attachment of persistent resources",
            "Effect": "Allow",
            "Principal": {
                "AWS": [
                    "arn:aws:iam::${data.aws_caller_identity.current.account_id}:role/vault_role",
                    "arn:aws:iam::${var.workload_account_id}:root"
                ]
            },
            "Action": [
                "kms:CreateGrant",
                "kms:ListGrants",
                "kms:RevokeGrant"
            ],
            "Resource": "*",
            "Condition": {
                "Bool": {
                    "kms:GrantIsForAWSResource": "true"
                }
            }
        }
    ]
}
POLICY
}

As a next step we define an alias for the key, so that it is more human friendly and easier to recognise in the AWS console.

resource "aws_kms_alias" "aws_kms_alias" {
  name          = "alias/aws-backup-kms"
  target_key_id = aws_kms_key.vault_kms.key_id
}

Next we define the central vault, used to aggregate snapshots from the workload account, together with a vault lock configuration that enforces minimum and maximum retention periods:

resource "aws_backup_vault" "central_vault" {
  name        = "central-aws-vault-S3"
  kms_key_arn = aws_kms_key.vault_kms.arn
}


resource "aws_backup_vault_lock_configuration" "locker" {
  backup_vault_name   = aws_backup_vault.central_vault.name
  changeable_for_days = 3
  max_retention_days  = 35
  min_retention_days  = 15
}


The last step is to create a vault policy which allows the workload account to copy snapshots into the central vault:

resource "aws_backup_vault_policy" "central_vault_allowance" {
  backup_vault_name = aws_backup_vault.central_vault.name
  policy = <<POLICY
{
  "Version": "2012-10-17",
  "Id": "default",
  "Statement": [
    {
      "Sid": "Allow Tool Prod Account to copy into iemtrialcluster_backup_vault",
      "Effect": "Allow",
      "Action": "backup:CopyIntoBackupVault",
      "Resource": "*",
      "Principal": {
        "AWS": "arn:aws:iam::${var.workload_account_id}:root"
      }
    }
  ]
}
POLICY
}

This is all that is needed to define our central-account backup repository for S3.

Workload account

Now we will look at each of the remaining use cases from Table 1.

Sensitive data

For monitoring, we will use SNS to send a notification to PagerDuty or Slack via a Lambda function in case of failed backups (a minimal sketch of such a Lambda handler is shown after the module below). For the backup selection we will use direct resource selection, although selection by tag could be more appropriate for most use cases. AWS Backup also supports point-in-time recovery by enabling continuous backup in the backup plan.


module "aws_backup_s3_sensitive_data" {
  source     = "lgallard/backup/aws"
  vault_name = "${local.service_name}-s3-backup-sensitive-data"
  plan_name  = "${local.service_name}-s3-backup-sensitive-data-plan"
  notifications = {
    sns_topic_arn       = data.aws_sns_topic.failed_backups.arn
    backup_vault_events = ["BACKUP_JOB_FAILED"]
  }

  rules = [
    {
      name              = "${local.service_name}-s3-backup-sensitive-data-rule"
      schedule          = "cron(5 2 * * ? *)"
      start_window             = 60
      completion_window        = 180
      enable_continuous_backup = true
      lifecycle = {
        cold_storage_after = null
        delete_after       = 15
      }
    }
  ]
  selections = [
    {
      name      = "${local.service_name}-s3--sensitive-data-selection"
      resources = ["arn:aws:s3:::prefix-dummy-${local.environment}-database-sensitive-data-bucket", "arn:aws:s3:::dummy-${local.environment}-sensitive-data-logs"]
    }
  ]
}

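As a minimal illustration of the notification path mentioned above, here is a sketch (in Python) of a Lambda handler that forwards the SNS message of a failed backup job to a Slack incoming webhook. The webhook URL environment variable and the message formatting are assumptions for this example; a PagerDuty integration would post to the PagerDuty Events API instead.

import json
import os
import urllib.request

# Assumption: the Slack incoming-webhook URL is provided via an environment variable.
SLACK_WEBHOOK_URL = os.environ["SLACK_WEBHOOK_URL"]


def handler(event, context):
    # AWS Backup publishes vault events to SNS; SNS invokes this Lambda with one
    # or more records, each carrying the notification text in Sns.Message.
    for record in event["Records"]:
        message = record["Sns"]["Message"]
        payload = {"text": f"AWS Backup notification: {message}"}
        request = urllib.request.Request(
            SLACK_WEBHOOK_URL,
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        urllib.request.urlopen(request)
    return {"statusCode": 200}

The SNS topic referenced as data.aws_sns_topic.failed_backups in the module above is the topic such a Lambda would be subscribed to.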

Curated data and curated sensitive data

Curated data refers to information that has been carefully selected, organized, and maintained to ensure accuracy, relevance, and quality. Think of it like a well-maintained library where librarians carefully choose and organize books to provide a reliable and valuable collection for visitors. In the context of data, curators, often experts in a specific field, carefully choose, validate, and organize data to create a trustworthy and useful dataset. This process helps ensure that the information is reliable, up-to-date, and suitable for specific purposes, making it easier for users to find and use the data they need.


module "aws_backup_s3_curated_data" {
  source     = "lgallard/backup/aws"
  vault_name = "${local.service_name}-s3-curated-data-backup"
  plan_name  = "${local.service_name}-s3-backup-curated-data-plan"
  notifications = {
    sns_topic_arn       = data.aws_sns_topic.failed_backups.arn
    backup_vault_events = ["BACKUP_JOB_FAILED"]
  }

  rules = [
    {
      name              = "${local.service_name}-s3-backup-curated-data-rule"
      schedule          = "cron(5 2 * * ? *)"
      start_window             = 60
      completion_window        = 180
      enable_continuous_backup = true
      lifecycle = {
        cold_storage_after = null
        delete_after       = 35
      }
    }
  ]
  selections = [
    {
      name      = "${local.service_name}-s3-selection"
      resources = ["arn:aws:s3:::prefix-dummy-${local.environment}-database-curated-data-bucket", "arn:aws:s3:::dummy-${local.environment}-curated-data-logs"]
    }
  ]
}


Critical data and critical sensitive data

This data brings the most value to the business and helps business executives make decisions using a data-driven approach. In addition to the local backups, the backups will be copied to the vault in the central account:


module "aws_backup_critical" {
  source     = "lgallard/backup/aws"
  vault_name = "${local.service_name}-s3-backup-critical-data"
  plan_name  = "${local.service_name}-s3-backup-critical-data-plan"
  notifications = {
    sns_topic_arn       = data.aws_sns_topic.failed_backups.arn
    backup_vault_events = ["BACKUP_JOB_FAILED"]
  }

  rules = [
    {
      name              = "${local.service_name}-s3-backup-critical-data-rule"
      schedule          = "cron(5 2 * * ? *)"
      copy_actions = [
        {
          lifecycle = {
            cold_storage_after = 90
            delete_after       = 1825
          },
          destination_vault_arn = var.central_s3_vault_arn
        },
      ]
      start_window             = 60
      completion_window        = 180
      enable_continuous_backup = true
      lifecycle = {
        cold_storage_after = null
        delete_after       = 35
      }
    }
  ]
  selections = [
    {
      name      = "${local.service_name}-s3-critical-data-selection"
      resources = ["arn:aws:s3:::prefix-dummy-${local.environment}-database-critical-data-bucket", "arn:aws:s3:::dummy-${local.environment}-critical-data-logs]
    }
  ]
}


Now that we have the backups, how can we restore objects? Let's have a look in the next section.

Data restoration

Go to AWS Backup -> Backup vaults, select the recovery point that you want, and click Actions -> Restore.

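The same can be scripted with an AWS SDK. The boto3 sketch below lists the recovery points in a vault and starts a restore job. The vault name and IAM role ARN are placeholders, and the Metadata keys for an S3 restore are assumptions in this example, so check the AWS Backup documentation for the exact keys required by your resource type.

import boto3

backup = boto3.client("backup")

# Pick a recovery point from the vault (the vault name is a placeholder).
points = backup.list_recovery_points_by_backup_vault(
    BackupVaultName="central-aws-vault-S3"
)
recovery_point_arn = points["RecoveryPoints"][0]["RecoveryPointArn"]

# Start the restore job. The Metadata keys below are illustrative assumptions for
# an S3 restore (destination bucket, whether to create a new bucket); the exact
# keys are resource-type specific.
job = backup.start_restore_job(
    RecoveryPointArn=recovery_point_arn,
    IamRoleArn="arn:aws:iam::123456789012:role/aws-backup-restore-role",  # placeholder
    ResourceType="S3",
    Metadata={
        "DestinationBucketName": "project-standard-data-restored",  # assumed key
        "NewBucket": "true",                                         # assumed key
    },
)
print(job["RestoreJobId"])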

Conclusion

Ensuring the integrity and accessibility of data is a critical aspect of a Solution Architect's role, and AWS Backup proves to be an indispensable tool in this endeavor. The outlined disaster recovery scenarios, use cases, and step-by-step configuration of the central and workload accounts underscore the significance of AWS Backup in safeguarding data across various classifications. Whether dealing with standard data or critical sensitive data, the combination of backups, object lock, and versioning ensures a robust data protection strategy. This approach not only addresses the need for backups but also covers the restoration procedures, making AWS Backup an essential component for maintaining data resilience in the ever-evolving landscape of cloud computing.
