DEV Community

Karan Pratap Singh


Preview Environments with AWS & Cloudflare

On-demand preview environments are a strategy for spinning up temporary, isolated infrastructure on the fly. They let us open a discussion with other teams, such as Product and QA, early in the release process and improve cross-team visibility. In this article, we'll see how to achieve this with AWS ECS and Cloudflare.

All the code is available in this repository.

Why do we need it?

Let's see how this benefits our release and team workflows. This example comes from my personal experience.

Usual Workflow

Currently, QA and product reviews are tightly coupled to releases, and it's often hard to roll back changes once they're in the release itself.

usual-workflow

New Workflow

This is a huge benefit for QA and the Product team, as they can do a soft review of the changes. The Product team no longer needs to wait until the changes reach the staging environment to review them, and the QA team can test changes right at the pull request level.

new-workflow

Challenges

Let's look at some challenges I faced while architecting this, and how offerings from Cloudflare helped.

SSL
One of the big challenges was setting up SSL. We cannot use a certificate generated by AWS ACM with our own custom Nginx proxy, because ACM certificates only work with AWS services like CloudFront, ALB, API Gateway, etc.

There are a few approaches I saw online while researching this:

  • One approach is to use Let's Encrypt to generate temporary SSL certs. Here's a good implementation for this. But this introduces another problem: managing all the certs we generate through Let's Encrypt.

  • Another approach is to add a new Route 53 record for each environment and forward it to an ALB. The problem isn't just that we'd need to provision these resources; we'd have to provision and destroy them quite frequently!

Cloudflare Argo Tunnel to the rescue! With a tunnel, we can close all inbound (ingress) traffic and instead expose our service through an outbound connection to Cloudflare. We can then create a proxied DNS record, and Cloudflare handles SSL for us!

Here's my previous article, where I cover Argo Tunnel in detail.

Security

Security is the most important part. What's stopping someone from exposing a backdoor into our AWS infrastructure to the internet, intentionally or by mistake, simply by including risky changes in a pull request? We need a secure way to expose these temporary environments.

One solution that initially came to mind was AWS VPN or something similar, so that only people on our VPN could access the environments. Sounds good? Ultimately, though, this would have required us to onboard every team member to set up and use the VPN.

Cloudflare Access is a game changer, and it's free for up to 50 users! It was just what I needed to provide fast, secure, zero-trust access to the temporary environments without a VPN.

We'll cover how we use Cloudflare in more detail in the Access section.

Architecture

Our architecture is simple and intuitive. On the left, we can see how we build our app and provision our infrastructure when a developer opens a new pull request and labels it; an interesting component here is the custom script we'll implement. In the middle, we go into some detail about our infrastructure setup with AWS ECS. On the right, we see how we leverage Cloudflare Argo Tunnel and Cloudflare Access to secure access to our temporary environment endpoints.

architecture


Implementation

I've divided the whole thing into three sections:

  • Setup
  • Infrastructure
  • Access

Note: Grep the repository for todo to find everything you need to provide (e.g. keys, tokens); placeholders look like <todo_your_domain>.

Setup

In this section, we'll look at the GitHub Action and the custom provisioning script it runs.

GitHub Action

We need to listen to the pull_request event with the following activity types: labeled, unlabeled, synchronize, and closed. Our GitHub Action has the following jobs:

  • Provision
    We create our preview environment once the pull request is labeled, and update it (synchronize) whenever new commits are pushed.

    • Create a Cloudflare Argo tunnel, access policies, and access application.
    • Store the credentials in config.yml
    • Copy the credentials to our docker image during the build, so we can create an outbound connection to Cloudflare at runtime.
    • Fill and register the task definition.
    • A script to process events from GitHub and provision temporary AWS and Cloudflare infrastructure.
  • Destroy
    We will destroy our preview environment once the pull request is closed, or unlabeled.

    • Destroy the temporary AWS and Cloudflare resources (i.e. the ECS task, DNS record, Argo tunnel, access policy, and access application).

The GitHub Action is already included in the repository at .github/workflows/preview-environment.yml. Here's a snippet:

name: Preview Environment
on:
  pull_request:
    types: [labeled, unlabeled, synchronize, closed]
    branches:
      - develop

env: ...

jobs:
  provision:
    name: Provision
    if: ${{ (github.event.action == 'labeled' && github.event.label.name == 'preview' && github.event.pull_request.state == 'open') || (github.event.action == 'synchronize' && contains(github.event.pull_request.labels.*.name, 'preview')) }}
    steps: ...

  destroy:
    name: Destroy
    if: ${{ (github.event.action == 'unlabeled' && github.event.label.name == 'preview') || github.event.action == 'closed' }}
    steps: ...
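The job conditions above are dense one-liners. As a reading aid, here is a minimal TypeScript sketch of the same decision. In GitHub Actions expressions, as in TypeScript, `&&` binds tighter than `||`, so the logic maps over directly. The `PullRequestEvent` type is a simplified, hypothetical subset of the `pull_request` webhook payload, not a type exported by `@actions/github`.

```typescript
// Hypothetical, simplified subset of the GitHub `pull_request` event payload.
interface PullRequestEvent {
  action: string; // e.g. 'labeled', 'unlabeled', 'synchronize', 'closed'
  label?: { name: string }; // only present on labeled/unlabeled events
  pull_request: {
    state: 'open' | 'closed';
    labels: { name: string }[];
  };
}

type Job = 'provision' | 'destroy' | 'skip';

const PREVIEW_LABEL = 'preview';

// Mirrors the `if:` conditions of the provision and destroy jobs.
function selectJob(event: PullRequestEvent): Job {
  const hasPreviewLabel = event.pull_request.labels.some(
    (label) => label.name === PREVIEW_LABEL,
  );

  // Provision: the preview label was just added to an open PR,
  // or new commits were pushed to a PR that already carries the label.
  if (
    (event.action === 'labeled' &&
      event.label?.name === PREVIEW_LABEL &&
      event.pull_request.state === 'open') ||
    (event.action === 'synchronize' && hasPreviewLabel)
  ) {
    return 'provision';
  }

  // Destroy: the preview label was removed, or the PR was closed.
  if (
    (event.action === 'unlabeled' && event.label?.name === PREVIEW_LABEL) ||
    event.action === 'closed'
  ) {
    return 'destroy';
  }

  return 'skip';
}
```

One consequence worth noticing: `closed` selects `destroy` even for pull requests that were never labeled, so the destroy step has to tolerate resources that don't exist.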

Provisioning Script

This script, located in scripts/preview, provisions or destroys our temporary infrastructure. Since we don't store any state about the provisioned infrastructure, we derive a slug from the branch name and use it as a unique ID throughout the process. The script is configurable via config.ts, as shown below.

import * as env from 'env-var';

const config = {
  // Domain for Cloudflare access policy
  domain: '<todo_your_domain>',
  aws: {
    region: 'us-east-1',
  },
  github: {
    // Token and Pull request no. will be available in Github Action
    token: env.get('GITHUB_TOKEN').required().asString(),
    pull_number: env.get('PULL_NUMBER').required().asInt(),
  },
  vpc: {
    securityGroups: {
      filter: '<todo_your_security_group_tag>',
    },
    subnets: {
      filter: '<todo_your_subnet_tag>',
    },
  },
  ecs: {
    cluster: '<todo_your_ecs_cluster_name>',
  },
  cloudflare: {
    path: './outputs/tunnel',
    auth_email: '<todo_your_cloudflare_email>',
    api_key: env.get('CLOUDFLARE_API_KEY').required().asString(),
    token: env.get('CLOUDFLARE_API_TOKEN').required().asString(),
    accountId: env.get('CLOUDFLARE_ACCOUNT_ID').required().asString(),
    zoneId: env.get('CLOUDFLARE_ZONE_ID').required().asString(),
    domain: '<todo_your_cloudflare_domain>',
  },
};

export default config;
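Because several values in config.ts are `<todo_...>` placeholders, it can help to fail fast when one was left unreplaced. Here's a minimal sketch of such a check; `findTodoPlaceholders` and the `ConfigValue`/`ConfigObject` types are hypothetical helpers, not part of the repository.

```typescript
// Any value that can appear in the config object (hypothetical typing).
type ConfigValue = string | number | boolean | null | undefined | ConfigObject;
interface ConfigObject {
  [key: string]: ConfigValue;
}

// Recursively collect the dotted paths of string values that still
// contain an unreplaced `<todo_...>` placeholder.
function findTodoPlaceholders(value: ConfigValue, path = 'config'): string[] {
  if (typeof value === 'string') {
    return value.includes('<todo_') ? [path] : [];
  }
  if (value !== null && typeof value === 'object') {
    const found: string[] = [];
    for (const [key, child] of Object.entries(value)) {
      found.push(...findTodoPlaceholders(child, `${path}.${key}`));
    }
    return found;
  }
  return []; // numbers, booleans, null, undefined
}
```

For example, calling `findTodoPlaceholders(config)` at startup and exiting when the result is non-empty would surface a missing domain or cluster name before any AWS or Cloudflare call is made.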

It all comes together in preview.ts:

import * as github from '@actions/github';
import slugify from 'slugify';
import CloudflareUtils from './utils/cloudflare';
import ECSUtils from './utils/ecs';
import * as GithubUtils from './utils/github';
import * as VPCUtils from './utils/vpc';
import log from './utils/log';

interface PreviewInterface {
  provision(taskDefArn: string): Promise<void>;
  destroy(): Promise<void>;
  tunnel(): Promise<void>;
}

class Preview implements PreviewInterface {
  private slug: string;

  constructor(branch: string) {
    const options = {
      lower: true,
    };
    const suffix = `${branch}-preview`;
    this.slug = slugify(suffix, options);
    log.info(`Using slug "${this.slug}" for branch "${branch}"`);
  }

  async provision(taskDefArn: string): Promise<void> {
    try {
      log.info(`Provisioning resources for task definition arn: ${taskDefArn}`);
      const subnets = await VPCUtils.getSubnets();
      const securityGroups = await VPCUtils.getSecurityGroups();
      const ecs = new ECSUtils(this.slug);
      const cloudflare = new CloudflareUtils(this.slug);
      await ecs.runTask(taskDefArn, subnets, securityGroups);
      const comment = `Your preview environment should be up at https://${cloudflare.domain} in a few moments! 🎉`;
      if (github.context.payload.action === 'labeled') {
        await GithubUtils.commentOnPR(comment);
      }
      log.success(comment);
    } catch (error) {
      log.error(error);
      log.warn('Performing rollback!');
      // Wait for the rollback to finish; otherwise process.exit() cuts it short
      await this.destroy();
      process.exit(1);
    }
  }

  async destroy(): Promise<void> {
    try {
      log.info(`Destroying resources`);
      const ecs = new ECSUtils(this.slug);
      const cloudflare = new CloudflareUtils(this.slug);
      await ecs.stopTask();
      await cloudflare.removeDNSRecord();
      await cloudflare.deleteTunnels();
      await cloudflare.removeAccess();
      log.success('Resources destroyed');
    } catch (error) {
      log.error(error);
      process.exit(1);
    }
  }

  async tunnel(): Promise<void> {
    try {
      const cloudflare = new CloudflareUtils(this.slug);
      const tunnelId = await cloudflare.createTunnel();
      cloudflare.createConfigFile(tunnelId);
      await cloudflare.addDNSRecord(tunnelId);
      await cloudflare.createAccess();
      log.success('Tunnel setup complete');
    } catch (error) {
      log.error(error);
      process.exit(1);
    }
  }
}

export default Preview;
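The constructor turns `${branch}-preview` into a slug with slugify, and that slug ends up as a subdomain label. A single DNS label is limited to 63 characters (RFC 1035), so a long branch name could produce a hostname that can't be created. Below is a hypothetical, dependency-free `toPreviewSlug` that enforces that limit; it is not a drop-in replacement for slugify, which handles special characters differently, and it only keeps `a-z`, `0-9`, and hyphens.

```typescript
const DNS_LABEL_MAX = 63; // RFC 1035 limit for a single DNS label
const SUFFIX = '-preview';

// Hypothetical alternative to `slugify(`${branch}-preview`, { lower: true })`
// that guarantees a valid, length-limited DNS label.
function toPreviewSlug(branch: string): string {
  const base = branch
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-') // collapse runs of other characters into one hyphen
    .replace(/^-+|-+$/g, ''); // trim leading and trailing hyphens

  // Leave room for the suffix, then re-trim in case truncation ends on a hyphen.
  const trimmed = base.slice(0, DNS_LABEL_MAX - SUFFIX.length).replace(/-+$/g, '');
  if (trimmed === '') {
    throw new Error(`Branch "${branch}" has no characters usable in a DNS label`);
  }
  return `${trimmed}${SUFFIX}`;
}
```

With this sketch, a branch named `feature/Login_Page` maps to `feature-login-page-preview`.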

Here's how we use it:

preview/commands/tunnel.ts:

import Preview from '../preview';
import * as GithubUtils from '../utils/github';

async function run(): Promise<void> {
  const branch = await GithubUtils.getCurrentBranch();

  const preview = new Preview(branch);
  await preview.tunnel();
}

run();

Usage:

$ yarn tunnel

This creates a Cloudflare tunnel config.yml, which points to the tunnel's credentials file, like the one below:

tunnel: <tunnel-id>
credentials-file: /root/.cloudflared/<tunnel-id>.json

ingress:
  - hostname: subdomain.domain.com
    service: http://localhost:4000
  - service: http_status:404
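For illustration, here's a hypothetical sketch of how a `createConfigFile`-style helper could render that file. It is not the repository's implementation; `buildTunnelConfig` and its options are assumptions. The final rule without a hostname matters: cloudflared requires the last ingress rule to be a catch-all.

```typescript
// Hypothetical options for rendering a cloudflared config.yml.
interface TunnelConfigOptions {
  tunnelId: string;
  hostname: string; // e.g. `${slug}.your-domain.com`
  port: number; // port the app listens on inside the container (4000 here)
}

function buildTunnelConfig({ tunnelId, hostname, port }: TunnelConfigOptions): string {
  return [
    `tunnel: ${tunnelId}`,
    `credentials-file: /root/.cloudflared/${tunnelId}.json`,
    '',
    'ingress:',
    `  - hostname: ${hostname}`,
    `    service: http://localhost:${port}`,
    // Catch-all rule: anything not matching the hostname above gets a 404
    '  - service: http_status:404',
    '', // trailing newline
  ].join('\n');
}
```

The result could then be written next to the credentials, for example with `writeFileSync` from `node:fs` into the `cloudflare.path` directory from config.ts, so it can be copied into the Docker image at build time.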

preview/commands/provision.ts:

import Preview from '../preview';
import { ArgumentParser } from 'argparse';
import * as GithubUtils from '../utils/github';

const parser = new ArgumentParser({
  description: 'Provision preview environment',
});

parser.add_argument('-td', '--task-def-arn', {
  required: true,
  help: 'Task definition arn',
});

async function run(): Promise<void> {
  const { task_def_arn } = parser.parse_args();
  const branch = await GithubUtils.getCurrentBranch();
  const preview = new Preview(branch);
  await preview.provision(task_def_arn);
}

run();

Usage:

$ yarn provision --task-def-arn $TASK_DEFINITION
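The `ecs.runTask(taskDefArn, subnets, securityGroups)` call isn't shown in the article. As a sketch under assumptions, here is the kind of input it could pass to `RunTaskCommand` from `@aws-sdk/client-ecs` (AWS SDK for JavaScript v3). `buildRunTaskInput` and `RunTaskInputSketch` are hypothetical and use plain objects so the sketch runs without the SDK. Setting `startedBy` to the slug fits the stateless design: the task can be found again later by that value.

```typescript
// Hypothetical subset of the RunTask input shape (plain object, no SDK import).
interface RunTaskInputSketch {
  cluster: string;
  taskDefinition: string;
  launchType: 'FARGATE';
  count: number;
  startedBy: string;
  networkConfiguration: {
    awsvpcConfiguration: {
      subnets: string[];
      securityGroups: string[];
      assignPublicIp: 'ENABLED' | 'DISABLED';
    };
  };
}

function buildRunTaskInput(
  cluster: string,
  taskDefArn: string,
  slug: string,
  subnets: string[],
  securityGroups: string[],
): RunTaskInputSketch {
  return {
    cluster,
    taskDefinition: taskDefArn,
    launchType: 'FARGATE',
    count: 1,
    // Label the task with the slug; ECS limits startedBy to 128 characters.
    startedBy: slug.slice(0, 128),
    networkConfiguration: {
      awsvpcConfiguration: {
        subnets,
        securityGroups,
        // cloudflared needs outbound internet access. A public IP provides it in
        // public subnets; private subnets would need a NAT gateway instead.
        assignPublicIp: 'ENABLED',
      },
    },
  };
}
```

With the SDK installed, this could be sent as `await new ECSClient({ region }).send(new RunTaskCommand(input))`. For teardown, `ListTasksCommand` accepts `cluster` and `startedBy` filters, and each returned ARN can be passed to `StopTaskCommand`.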

preview/commands/destroy.ts:

import Preview from '../preview';
import * as GithubUtils from '../utils/github';

async function run(): Promise<void> {
  const branch = await GithubUtils.getCurrentBranch();

  const preview = new Preview(branch);
  await preview.destroy();
}

run();

Usage:

$ yarn destroy
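`destroy()` runs its four cleanup calls inside a single try block, so the first failure skips everything after it. That matters because destroy also runs as a rollback after a failed provision, and on `closed` for pull requests that may never have been provisioned, so some resources may not exist. A best-effort variant is sketched below; `runCleanup` and `CleanupStep` are hypothetical names.

```typescript
// Hypothetical cleanup step: a label plus the async work to run.
interface CleanupStep {
  name: string;
  run: () => Promise<void>;
}

// Run every step in order, even if earlier ones fail,
// and return a description of each failure.
async function runCleanup(steps: CleanupStep[]): Promise<string[]> {
  const failures: string[] = [];
  for (const step of steps) {
    try {
      await step.run(); // sequential, preserving the original teardown order
    } catch (error) {
      const message = error instanceof Error ? error.message : String(error);
      failures.push(`${step.name}: ${message}`);
    }
  }
  return failures;
}
```

Inside `destroy()`, this could be called with steps such as `{ name: 'stopTask', run: () => ecs.stopTask() }`, followed by `removeDNSRecord`, `deleteTunnels`, and `removeAccess`, exiting with a non-zero code only after logging every failure.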

Infrastructure

Here's the infrastructure we need before we start running our temporary tasks. I've included a snippet here; for the full implementation, check the infrastructure folder in the repository. I'm using Terraform to provision it:

Note: If you're not familiar with Terraform, you can learn more about it here

# ECR repository
resource "aws_ecr_repository" "ecr_repository" {
  name                 = "app-repository"
  image_tag_mutability = "IMMUTABLE"
  image_scanning_configuration {
    scan_on_push = true
  }
}

# ECS task definition used by ECS service
resource "aws_ecs_task_definition" "task_definition" {
  family                   = "app-task-definition"
  network_mode             = "awsvpc"
  cpu                      = 4096
  memory                   = 8192
  requires_compatibilities = ["FARGATE"]
  container_definitions    = jsonencode([
    {
      "name": "app",
      "image": "nginx:latest",
      "essential": true,
      "portMappings": [
        {
          "containerPort": 4000,
          "hostPort": 4000
        }
      ]
    }
  ])
  task_role_arn            = aws_iam_role.task_execution_role.arn
  execution_role_arn       = aws_iam_role.task_execution_role.arn
}

# Security group
resource "aws_security_group" "security_group" {
  name   = "app-security-group"
  vpc_id = var.vpc_id

  # No ingress rules: cloudflared only opens outbound connections to
  # Cloudflare, so nothing on the internet needs to reach the task directly.

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

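# ECS cluster
resource "aws_ecs_cluster" "cluster" {
  name = "ecs-cluster"
}

# Newer AWS provider versions configure capacity providers in a separate
# resource instead of the inline `capacity_providers` argument
resource "aws_ecs_cluster_capacity_providers" "cluster" {
  cluster_name       = aws_ecs_cluster.cluster.name
  capacity_providers = ["FARGATE"]
}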

resource "aws_cloudwatch_log_group" "log_group" {
  name = "/ecs/app-log-group"
}

Access

Now that we have our application and infrastructure running, let's talk about access; more specifically, how we can take advantage of Cloudflare Access.

cloudflare-access

As we discussed earlier, after we create a tunnel, we create a proxied CNAME DNS record through the Cloudflare SDK, as shown below.
dns-cname
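For readers not using the SDK, here's a hypothetical sketch of the equivalent call against Cloudflare's v4 REST API. `buildTunnelDnsRequest` and `JsonRequest` are assumptions, not repository code. The record is a CNAME pointing at `<tunnel-id>.cfargotunnel.com`, which is how named tunnels are addressed, and `proxied: true` is what lets Cloudflare terminate TLS.

```typescript
// Plain description of an HTTP request (hypothetical helper type).
interface JsonRequest {
  url: string;
  method: 'POST';
  headers: Record<string, string>;
  body: string;
}

function buildTunnelDnsRequest(
  zoneId: string,
  apiToken: string,
  hostname: string, // e.g. `${slug}.your-domain.com`
  tunnelId: string,
): JsonRequest {
  return {
    url: `https://api.cloudflare.com/client/v4/zones/${zoneId}/dns_records`,
    method: 'POST',
    headers: {
      Authorization: `Bearer ${apiToken}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      type: 'CNAME',
      name: hostname,
      content: `${tunnelId}.cfargotunnel.com`, // route the hostname to the tunnel
      proxied: true, // orange-cloud the record so Cloudflare handles SSL
      ttl: 1, // 1 means "automatic" in the Cloudflare API
    }),
  };
}
```

On Node 18+, the request could be sent with the built-in fetch: `const { url, ...init } = buildTunnelDnsRequest(...); await fetch(url, init);`.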

Access Policy
Then, we can create an Access Policy to control who can access our secure endpoint. We can even enforce MFA!
access-policy
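As a rough sketch of what `createAccess()` might send, here are hypothetical builders for an Access application and an app-level allow policy. The field names follow my understanding of Cloudflare's Access API (`self_hosted` apps; `email_domain` and `auth_method` rules); treat them as assumptions and verify against the current API reference, since Cloudflare has been evolving how policies are attached to applications.

```typescript
// Assumed payload shapes for Cloudflare Access (verify against the API docs).
interface AccessAppBody {
  name: string;
  domain: string;
  type: 'self_hosted';
  session_duration: string;
}

type AccessRule =
  | { email_domain: { domain: string } }
  | { auth_method: { auth_method: string } };

interface AccessPolicyBody {
  name: string;
  decision: 'allow';
  include: AccessRule[];
  require: AccessRule[];
}

// Assumed endpoint: POST /accounts/{account_id}/access/apps
function buildAccessApp(slug: string, hostname: string): AccessAppBody {
  return { name: slug, domain: hostname, type: 'self_hosted', session_duration: '24h' };
}

// Assumed endpoint: POST /accounts/{account_id}/access/apps/{app_id}/policies
function buildAccessPolicy(slug: string, emailDomain: string, requireMfa: boolean): AccessPolicyBody {
  return {
    name: `${slug}-policy`,
    decision: 'allow',
    // Allow anyone who signs in with an address at the company domain...
    include: [{ email_domain: { domain: emailDomain } }],
    // ...and optionally require that the identity provider reported MFA.
    require: requireMfa ? [{ auth_method: { auth_method: 'mfa' } }] : [],
  };
}
```

Here, `emailDomain` would correspond to the `domain` field in config.ts, which the article describes as the domain for the Cloudflare access policy.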

Access Groups
This is more of a fine-tuning step, but with access groups we can create teams such as Engineering, Product, and QA, and use these groups when configuring our access policies and much more. I'll leave this up to you.

access-groups


Usage

Here's how we can use our preview environments:

Provision

  • Developer creates a pull request.
    label

  • Developer labels the pull request with the preview label. Once labeled, our GitHub action should build our application and provision the infrastructure.
    action-provision

  • When the GitHub action completes, it will leave a comment on the pull request like below and the environment will be available at https://branch-slug.your-domain.com.
    action-done

  • The Product or QA team uses the new environment to evaluate the pull request. Anyone allowed by the access policy (e.g. person@your_domain.com) can log in with the identity provider (in my case, Okta) and access the preview environment.

access-login

Destroy

  • To destroy the environment, either close the pull request or remove the preview label; this starts the destroy job.
    action-destroy

Improvements

As an improvement, one idea is to migrate the provisioning script to Go and turn it into a Terraform provider.

Cost Estimations

The cost comes down to AWS ECS (Fargate) pricing, since we're using Cloudflare's free tier.
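Fargate bills per vCPU-hour and per GB-hour of memory, so a back-of-the-envelope estimate is simple arithmetic. The task definition above requests 4096 CPU units (4 vCPU) and 8192 MiB (8 GB). The rates in this sketch are inputs on purpose: the example values below are round, hypothetical numbers, so look up the current Fargate prices for your region. The estimate also ignores ECR storage, CloudWatch Logs, and data transfer.

```typescript
// Hourly Fargate rates in USD (supply real values for your region).
interface FargateRates {
  perVcpuHour: number;
  perGbHour: number;
}

// Compute-only cost of one preview task running for `hours`.
function estimatePreviewCost(vcpu: number, memoryGb: number, hours: number, rates: FargateRates): number {
  const hourly = vcpu * rates.perVcpuHour + memoryGb * rates.perGbHour;
  return hourly * hours;
}
```

With the hypothetical rates `{ perVcpuHour: 0.04, perGbHour: 0.0045 }`, a 4 vCPU / 8 GB task costs 0.16 + 0.036 = 0.196 USD per hour, or about 1.57 USD for an 8-hour working day. Since the cost scales linearly with both size and uptime, right-sizing the task definition and destroying environments promptly are the main levers.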

Conclusion

I hope this article was helpful. As always, if you face any issues, feel free to reach out.
Hopefully, this will encourage collaboration with your Product, QA, and Solutions teams early in the release process at your organization.
