CI/CD for ETL/ELT pipelines

One of Dataform’s key motivations has been to bring software engineering best practices to teams building ETL/ELT pipelines. To further that goal, we recently launched support for running Continuous Integration (CI) checks against your Dataform projects.

What is CI/CD?

CI/CD is a set of processes that aim to help teams ship software quickly and reliably.

Continuous integration (CI) checks automatically verify that all changes to your code work as expected, and typically run before the change is merged into your Git master branch. This ensures that the version of the code on the master branch always works correctly.

Continuous deployment (CD) tools automatically (and frequently) deploy the latest version of your code to production. This is intended to minimize the time it takes for new features or bug fixes to be available in production.

CI/CD for Dataform projects

Dataform already does most of the CD gruntwork for you. By default, all code committed to the master branch is automatically deployed. For more advanced use cases, you can use environments to configure exactly what is deployed and when.

CI checks, however, are typically configured as part of your Git repository (usually hosted on GitHub, though Dataform supports other Git hosting providers).

How to configure CI checks

Dataform distributes a Docker image which can be used to run the equivalent of Dataform CLI commands. For most CI tools, this Docker image is what you'll use to run your automated checks.

If you host your Dataform Git repository on GitHub, you can use GitHub Actions to run CI workflows. This post assumes you’re using GitHub Actions, but other CI tools are configured in a similar way.

Here’s a simple example of a GitHub Actions workflow for a Dataform project. Once you put this in a .github/workflows/<some filename>.yaml file, GitHub will run the workflow on each pull request and commit to your master branch.

name: CI

on:
  push:
    branches:
      - master
  pull_request:
    branches:
      - master

jobs:
  compile:  # job id; any name works here
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code into workspace directory
        uses: actions/checkout@v2
      - name: Install project dependencies
        uses: docker://dataformco/dataform:1.6.11
        with:
          args: install
      - name: Run dataform compile
        uses: docker://dataformco/dataform:1.6.11
        with:
          args: compile

This workflow runs dataform compile, so if the project fails to compile, the workflow fails and the failure is shown in the GitHub UI.

Note that it’s possible to run any dataform CLI command in a CI workflow. However, some commands do need credentials in order to run queries against your data warehouse. In these circumstances, you should encrypt those credentials and commit the encrypted file to your Git repository. Then, in your CI workflow, you decrypt the credentials so that the Dataform CLI can use them.
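To make the decrypt step concrete, here’s a rough sketch of what such a workflow might look like. Treat it as an illustration rather than the official setup: it assumes the credentials file was encrypted with GPG and committed as .df-credentials.json.gpg, that the passphrase is stored in a GitHub repository secret called CREDENTIALS_GPG_PASSPHRASE (a name we made up for this example), and that the command you want to run is dataform test.

```yaml
name: CI with warehouse credentials

on:
  pull_request:
    branches:
      - master

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code into workspace directory
        uses: actions/checkout@v2
      - name: Decrypt warehouse credentials
        # Assumption: the encrypted file is committed as .df-credentials.json.gpg
        # and the passphrase lives in a repository secret with this
        # (hypothetical) name.
        run: |
          gpg --quiet --batch --yes --decrypt \
            --passphrase="$CREDENTIALS_GPG_PASSPHRASE" \
            --output .df-credentials.json .df-credentials.json.gpg
        env:
          CREDENTIALS_GPG_PASSPHRASE: ${{ secrets.CREDENTIALS_GPG_PASSPHRASE }}
      - name: Install project dependencies
        uses: docker://dataformco/dataform:1.6.11
        with:
          args: install
      - name: Run dataform test
        uses: docker://dataformco/dataform:1.6.11
        with:
          args: test
```

Under these assumptions, you could produce the encrypted file locally with something like gpg --symmetric --cipher-algo AES256 .df-credentials.json, keeping the unencrypted file out of Git. Bear in mind that GitHub does not pass repository secrets to workflows triggered by pull requests from forks, so this step would fail for those.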

For further details on configuring CI/CD for your Dataform projects, please see our docs. As always, if you have any questions, or would like to get in touch with us, please send us a message on Slack!
