Derek Berger for Developers @ Asurion

Taking Your Releases Into Overdrive with GitHub Actions

Introduction

GitHub Actions’ seamless integration with version control simplifies creating and executing operations and infrastructure workflows. Two key features of Actions for building efficient workflows are:

  • Composite actions. Composite actions let you create combinations of steps that you can reuse across different kinds of workflows.
  • Job outputs. Outputs make values derived from one job's steps available to downstream jobs' steps (a minimal sketch follows this list).
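
As a minimal illustration of job outputs (the job names, step id, and version value here are made up for the example, not taken from our workflows):

jobs:
  build:
    runs-on: ubuntu-latest
    # Expose a value produced by a step as a job-level output.
    outputs:
      version: ${{ steps.get-version.outputs.version }}
    steps:
      - id: get-version
        run: echo "version=1.2.3" >> "$GITHUB_OUTPUT"
  deploy:
    runs-on: ubuntu-latest
    needs: build
    steps:
      # Downstream jobs read the value through the needs context.
      - run: echo "Deploying version ${{ needs.build.outputs.version }}"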

In this article, I’ll share how Actions’ integration with version control, composite actions, and job outputs helped my team advance our use of Actions to automate production deployments.

Mostly manual automation

The workflow that my team built for disaster recovery uses a composite action that cuts off DNS traffic to the impaired region. It lets us execute failover in just one step, and the same workflow can be used for any operation that requires rerouting production traffic for an extended time. We just manually trigger it exactly the same way we would when failing over, specifying which DNS to change and which region to cut off.
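
For context, a workflow like that exposes its manual options through workflow_dispatch inputs. A rough sketch of such a trigger is below; the input names and choice values are assumptions for illustration, not our actual configuration:

name: Regional failover
on:
  workflow_dispatch:
    inputs:
      # Hypothetical inputs; the real workflow's names and options differ.
      dns:
        description: DNS record to reroute
        required: true
        type: choice
        options:
          - dns1
          - dns2
      region:
        description: Region to cut off traffic to
        required: true
        type: choice
        options:
          - region-1
          - region-2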

While that can be helpful for major changes like infrastructure upgrades, major changes are not common. More often we simply deploy application or configuration updates via pull requests, following GitOps practices. To keep our uptime as high as possible and avoid disrupting customers, we tried using the failover workflow to reroute DNS during these everyday changes.

That at least rerouted traffic how we wanted, but it required multiple manual steps:

  1. Trigger failover workflow to change DNS.
  2. Merge pull request to deploy application change.
  3. Verify pods roll out.
  4. Trigger failover workflow to restore DNS.

Considering the workflow required figuring out which DNS needed to change, then manually selecting the right DNS and region options, the procedure was really more like five steps. And despite the automation for applying DNS changes and verifying pods, deploying anything became a chore.

Even worse, whenever we wanted to apply the change to multiple clusters, we’d have to repeat every step multiple times.

This was counterproductive and inefficient, and it discouraged us from deploying as frequently as we could have.

Eradicating the toil

What we needed was a workflow to deploy everyday changes in one step, not four or five, so we set out to build a new workflow that uses the same composite actions as the failover workflow, but makes better overall use of GitHub Actions' automation capabilities.

First, since all production changes are deployed via pull request, we replaced workflow_dispatch with pull request triggers based on branch, path, and event type.

name: Production deployment
on:
  pull_request:
    branches:
      - main
    paths:
      - "path/to/region1/releases/namespace1/**"
      - "path/to/region1/releases/namespace2/**"
      - "path/to/region2/releases/namespace1/**"
      - "path/to/region2/releases/namespace2/**"
    types:
      - closed

We broke the new workflow down into four jobs, which correspond to the old manual steps:

  1. Determine what changed.
  2. Make DNS change to stop traffic.
  3. Verify services’ pods roll out successfully.
  4. Make DNS change to restore traffic.

In the first job, we start by ensuring the workflow only proceeds upon merged pull requests, and not just any pull request closed event, by adding this if condition:

job1:
   if: (github.event_name == 'pull_request') && (github.event.pull_request.merged == true)

Unlike the failover workflow, the new workflow must automatically determine the right region and DNS to cut off. The dorny/paths-filter action helped us here.

We first use it to determine region based on change path:

     - uses: actions/checkout@v4
     - name: Determine region
       uses: dorny/paths-filter@v3
       id: region-filter
       with:
         filters: |
           region-1: 'path/to/cluster/account/region1/**'
           region-2: 'path/to/cluster/account/region2/**'

Next, we have a filter for DNS, which is important to get right because the DNS we want to reroute will vary depending on which services change:

     - uses: actions/checkout@v4
     - name: Determine DNS to change
       uses: dorny/paths-filter@v3
       id: dns-filter
       with:
         filters: |
           dns1:
             - 'path/to/services/1'
             - 'path/to/services/2'
           dns2:
             - 'path/to/services/3'
             - 'path/to/services/4'
           dns3:
             - 'path/to/services/1'
             - 'path/to/services/2'
           dns4:
             - 'path/to/services/1'
             - 'path/to/services/2'
             - 'path/to/services/5'
             - 'path/to/services/6'

The final filter determines which pods to validate, also based on which services have changed:

     - uses: actions/checkout@v4
     - name: Determine services
       uses: dorny/paths-filter@v3
       id: service-filter
       with:
         filters: |
           service-1:
             - 'path/to/services/1'
           service-2:
             - 'path/to/services/2'
           service-3:
             - 'path/to/services/3'
           service-4:
             - 'path/to/services/4'

Finally, job outputs are what make all these filtered values available to downstream jobs.

   outputs:
     dns: ${{ steps.dns-filter.outputs.changes }}
     services: ${{ steps.service-filter.outputs.changes }}
     region: ${{ steps.region-filter.outputs.region-1 == 'true' && 'region-1' || 'region-2' }}

The second job applies the DNS change, but includes a condition to only proceed if it finds values in the outputs for DNS and region.

 job2:
   needs: [ job1 ]
   if: ${{ needs.job1.outputs.dns != '[]' && needs.job1.outputs.dns != '' && needs.job1.outputs.region != '[]' && needs.job1.outputs.region != '' }}
   strategy:
     matrix:
       dns: ${{ fromJSON(needs.job1.outputs.dns) }}

It calls the same composite action as the failover procedure:

   steps:
     - uses: actions/checkout@v4
     - name: Stop traffic to ${{ needs.job1.outputs.region }} ${{ matrix.dns }}
       uses: './.github/actions/dns-change'
       with:
         dns: ${{ matrix.dns }}
         action: stop
         region: ${{ needs.job1.outputs.region }}
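
For reference, a composite action with those inputs could be defined roughly like this; the run step is a placeholder standing in for our actual DNS tooling, which this article doesn't cover:

# .github/actions/dns-change/action.yml (sketch only)
name: dns-change
description: Stop or restore traffic for a DNS record in a region
inputs:
  dns:
    description: DNS record to change
    required: true
  action:
    description: stop or start
    required: true
  region:
    description: Region whose traffic is affected
    required: true
runs:
  using: composite
  steps:
    - name: Apply DNS change
      shell: bash
      run: |
        # Placeholder: the real steps update DNS through our provider's API.
        echo "Would ${{ inputs.action }} traffic for ${{ inputs.dns }} in ${{ inputs.region }}"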

The third job verifies pods roll out, applying the values from the services output to a matrix:

job3:
   needs: [ job1, job2 ]
   if: ${{ needs.job1.outputs.services != '[]' && needs.job1.outputs.services != '' }}
   name: Validate pods
   strategy:
     matrix:
       service: ${{ fromJSON(needs.job1.outputs.services) }}

It then calls the composite action that executes the pod validation steps:

   steps:
     - uses: actions/checkout@v4
     - name: Validate deployments and pods
       uses: './.github/actions/pod-validation'
       with:
         deployment: ${{ matrix.service }}
         cluster: ${{ needs.job1.outputs.region == 'region-1' && '1' || '2' }}

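The pod-validation composite action could be sketched similarly; the kubectl rollout status check and the cluster context naming below are assumptions for illustration, and cluster credentials are assumed to be configured elsewhere:

# .github/actions/pod-validation/action.yml (sketch only)
name: pod-validation
description: Verify a deployment's pods roll out successfully
inputs:
  deployment:
    description: Deployment to validate
    required: true
  cluster:
    description: Cluster number to validate against
    required: true
runs:
  using: composite
  steps:
    - name: Wait for rollout
      shell: bash
      run: |
        # Placeholder check: fail the job if the rollout doesn't complete in time.
        kubectl --context "cluster-${{ inputs.cluster }}" rollout status \
          deployment/${{ inputs.deployment }} --timeout=5m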

If every pod rolls out successfully, the workflow proceeds to restore the traffic cut off in job 2.

 job4:
   needs: [ job1, job2, job3 ]
   name: Restore traffic to ${{ needs.job1.outputs.region }} in ${{ matrix.dns }}
   strategy:
     matrix:
       dns: ${{ fromJSON(needs.job1.outputs.dns) }}
   steps:
     - uses: actions/checkout@v4
     - name: Restore traffic to ${{ needs.job1.outputs.region }} ${{ matrix.dns }}
       uses: './.github/actions/dns-change'
       with:
         dns: ${{ matrix.dns }}
         action: start
         region: ${{ needs.job1.outputs.region }}

If any step fails, the workflow fails and traffic remains routed away from the failed cluster while the team investigates. If necessary, we can open a pull request to revert the change. Merging it will trigger the workflow again, effectively validating the rollback.

Success!

By combining the composite actions we created for the failover workflow with path filters and job outputs into a new workflow, deployments now take one manual step: merging a pull request.

The workflow takes over from there, automatically making the proper DNS changes, verifying impacted pods roll out, restoring DNS traffic, and notifying us of results.

Unburdened by multiple manual steps, our team deployed 32 changes to production in the first month of using the workflow. In the previous month, we deployed 17 changes. The results so far have been promising, and we'll continue looking for ways to make our release practices even better with Actions.
