DEV Community

Oleksandr Borodavka for FreshBooks


Building CI/CD for Vertex AI pipelines: The production

Hi there! As you probably already know from the previous articles in this series, we tested some new ideas and tools with a proof of concept (POC) for one simple pipeline. Today we will review a new, generalized version of CI/CD for Vertex AI pipelines that we built based on that experience and some further investigation.

A bit of context

Let's recall a few key points to refresh the context:

  • Vertex AI is used to build training pipelines.

  • GitHub Actions is our CI/CD tool.

  • We built a declarative framework that allows us to standardize the format and operations for all our components and pipelines.

  • The specifications and implementations of all our components and pipelines are kept in one GitHub repository.

  • There are three environments: development (DEV), staging (STAGE), and production (PROD).

Stating the task

What do we want to achieve?
In simple words, we want to automate everything as much as possible. Ideally, when new changes appear in our code repository, we want them applied in production as quickly as possible without any manual effort. Moreover, this should be done in a stable, safe, reproducible, and effective way.

Jumping ahead a little, there are three absolutely awesome things in our CI/CD practice. In all deployment environments, our Continuous Integration system:

  1. automatically rebuilds the changed components and runs their unit and integration tests
  2. builds the pipelines changed in Pull Requests
  3. and rebuilds the pipelines that depend on the changed components.

The way of code

Okay, how can we achieve this?
Since developers work with the codebase through Pull Requests, we can use them as the starting points for our workflows. There are two moments that we have to automate.

The first one is when a pull request is opened (or updated). Here we want to run the more basic and faster checks on the incoming changes to give the developer quick feedback. These are generally unit tests and build jobs (just a build to check that the configurations are okay) on our DEV environment. Then, if everything is fine, we run integration tests for the components and run our pipelines on the STAGE environment to make sure the changes are ready to be merged.


The second moment is when the PR is approved and merged. At this point the changes have already been tested, reviewed, and merged into the main branch, so they are ready for delivery. All the processes run again, but this time on the PROD environment.
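The two moments above boil down to a mapping from the triggering Pull Request event to an ordered list of environments and checks. The Python sketch below only summarizes that flow as it appears in the workflows later in this article; the event keys, the `STAGES` table, and the `plan_for` helper are hypothetical illustrations, not part of our framework.

```python
# Hypothetical summary of the checks each Pull Request event triggers.
# Every plan is an ordered list of (environment, checks) pairs.
STAGES: dict[str, list[tuple[str, list[str]]]] = {
    "opened": [
        ("development", ["unit tests", "build components", "build pipelines"]),
        ("staging", ["build components", "integration tests",
                     "build pipelines", "run pipelines"]),
    ],
    "merged": [
        ("production", ["unit tests", "build components", "integration tests",
                        "build pipelines", "run pipelines"]),
    ],
}


def plan_for(event: str) -> list[tuple[str, list[str]]]:
    """Return the ordered (environment, checks) plan for a PR event."""
    # An updated PR is checked exactly like a newly opened one
    key = "opened" if event == "updated" else event
    if key not in STAGES:
        raise ValueError(f"unsupported event: {event}")
    return STAGES[key]


for environment, checks in plan_for("opened"):
    print(f"{environment}: {', '.join(checks)}")
```

Note that the fast, cheap checks always come first, so a broken unit test fails the run before anything touches the STAGE environment.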

The implementation

For both stages, we use GitHub Actions workflows and a few CLI commands from our Python framework, since code-analysis routines are easier to implement with framework-specific code. Reviewing all the code would take too long and would be too specific, so instead we will look at the general structure and at slightly simplified versions of the workflows that still have enough detail to present the idea.

Here is the structure of the source code:

pipelines/
  pipeline1.yaml
  ...
components/
  component1/
    src/
      ...
    tests/
      unit/
          ...
      integration/
          test.yaml
    config.yaml

There is a folder with pipeline specifications and a folder with components. Each component contains source files, tests, and a configuration.
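This layout is what makes automatic change analysis possible: the path of a changed file is enough to tell which component or pipeline it belongs to. Here is a minimal Python sketch of that idea, assuming that a component's name is its folder name under components/ and a pipeline's name is its YAML file name under pipelines/. The `classify_paths` function is hypothetical and is not the actual code behind our `make get_components` and `make get_pipelines` targets; it returns JSON arrays because the workflows feed the name lists to `fromJson`.

```python
import json
from pathlib import PurePosixPath


def classify_paths(changed: list[str]) -> tuple[list[str], list[str]]:
    """Split changed file paths into (component names, pipeline names).

    Assumed (hypothetical) naming rules:
      components/<component_name>/...  -> component <component_name>
      pipelines/<pipeline_name>.yaml   -> pipeline <pipeline_name>
    """
    components: set[str] = set()
    pipelines: set[str] = set()
    for raw in changed:
        parts = PurePosixPath(raw).parts
        if len(parts) >= 3 and parts[0] == "components":
            # Any file inside a component folder marks that component as changed
            components.add(parts[1])
        elif len(parts) >= 2 and parts[0] == "pipelines" and raw.endswith(".yaml"):
            # The pipeline name is the spec file name without its extension
            pipelines.add(PurePosixPath(raw).stem)
    return sorted(components), sorted(pipelines)


# The diff produced by the workflow is a single space-separated line
diff = ("components/component1/src/main.py components/component1/config.yaml "
        "pipelines/pipeline1.yaml README.md")
component_names, pipeline_names = classify_paths(diff.split())
print(json.dumps(component_names))  # ["component1"]
print(json.dumps(pipeline_names))   # ["pipeline1"]
```

Using sets means a component touched by several files in the same PR is still built and tested only once.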

And this is what the GitHub Actions directory looks like:

.github
  actions
    build_component
    build_pipeline
    run_pipeline
    test_component
    test_component_integration
  workflows
    pr_merged.yml
    pr_opened.yml

Here, actions is a directory of reusable composite actions that perform all the operations we need on components and pipelines (build, test, and run). The workflows directory contains two workflows that GitHub triggers automatically when a Pull Request is opened/updated or merged, respectively.

PR is opened

Let's take a look at the first workflow:

name: MLOps Pull Request - Opened

on:
  # The workflow will be run automatically when a PR is 
  # opened to the main branch or changes to the PR are pushed
  pull_request:
    types: [opened, synchronize, reopened]
    branches:
      - "main"
    paths:
      - "pipelines/**/*.yaml"
      - "components/**/*.yaml"
      - "components/**/*.py"

# Use concurrency to ensure that only a single workflow 
# using the same concurrency group will run at a time
concurrency: dev_stage_environment

jobs:
  git_diff:
    # Get a list of changed files in the PR for further analysis
    runs-on: ubuntu-latest
    outputs:
      diff: ${{ steps.getter.outputs.diff }}
    steps:
      - name: Checkout
        uses: actions/checkout@v3
        with:
          fetch-depth: 0
      - name: Get git diff
        id: getter
        run: |
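          # The unquoted $(...) inside echo joins the file names into one
          # space-separated line so the list fits in a single step output
          GIT_DIFF="$(echo $(git diff --name-only origin/main...origin/${GITHUB_HEAD_REF}))"
          echo "diff=$GIT_DIFF" >> "$GITHUB_OUTPUT"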
  get_component_list:
    # Get a list of names for the added/changed components
    needs: git_diff
    runs-on: ubuntu-latest
    outputs:
      names: ${{ steps.getter.outputs.names }}
    steps:
      - name: Checkout
        uses: actions/checkout@v3
      - name: Get list
        id: getter
        run: |
          NAMES="$(make get_components --paths='${{ needs.git_diff.outputs.diff }}')"
          echo "names=$NAMES" >> "$GITHUB_OUTPUT"
  test_and_build_components_on_dev:
    # Test and build changed components on the DEV environment
    needs: get_component_list
    runs-on: ubuntu-latest
    environment: development
    strategy:
      # Use matrix strategy to run the tasks in parallel
      matrix:
        name: ${{ fromJson(needs.get_component_list.outputs.names) }}
    steps:
      - name: Checkout
        uses: actions/checkout@v3
      - name: Test component
        uses: ./.github/actions/test_component
        with:
          component_name: ${{ matrix.name }}
      - name: Build component
        uses: ./.github/actions/build_component
        with:
          component_name: ${{ matrix.name }}

  get_pipeline_list:
    # Get a list of names for the added/changed pipelines
    needs: [ git_diff, test_and_build_components_on_dev ]
    runs-on: ubuntu-latest
    outputs:
      names: ${{ steps.getter.outputs.names }}
    steps:
      - name: Checkout
        uses: actions/checkout@v3
      - name: Get list
        id: getter
        run: |
          NAMES="$(make get_pipelines --paths='${{ needs.git_diff.outputs.diff }}')"
          echo "names=$NAMES" >> "$GITHUB_OUTPUT"
  build_pipelines_on_dev:
    # Build changed pipelines on the DEV environment
    needs: get_pipeline_list
    runs-on: ubuntu-latest
    environment: development
    strategy:
      # Use matrix strategy to run the tasks in parallel
      matrix:
        name: ${{ fromJson(needs.get_pipeline_list.outputs.names) }}
    steps:
      - name: Checkout
        uses: actions/checkout@v3
      - name: Build pipeline
        uses: ./.github/actions/build_pipeline
        with:
          pipeline_name: ${{ matrix.name }}

  get_indirect_pipeline_list:
    # Get a list of names for the indirectly changed pipelines
    # (when a related component was changed)
    needs: [ git_diff, build_pipelines_on_dev ]
    runs-on: ubuntu-latest
    outputs:
      names: ${{ steps.getter.outputs.names }}
    steps:
      - name: Checkout
        uses: actions/checkout@v3
      - name: Get list
        id: getter
        run: |
          NAMES="$(make get_indirect_pipelines --paths='${{ needs.git_diff.outputs.diff }}')"
          echo "names=$NAMES" >> "$GITHUB_OUTPUT"
  build_indirect_pipelines_on_dev:
    # Build indirectly changed pipelines on the DEV environment
    needs: get_indirect_pipeline_list
    runs-on: ubuntu-latest
    environment: development
    strategy:
      # Use matrix strategy to run the tasks in parallel
      matrix:
        name: ${{ fromJson(needs.get_indirect_pipeline_list.outputs.names) }}
    steps:
      - name: Checkout
        uses: actions/checkout@v3
      - name: Build pipeline
        uses: ./.github/actions/build_pipeline
        with:
          pipeline_name: ${{ matrix.name }}

  build_components_and_test_integration_on_stage:
    # Build changed components and run integration tests 
    # for them on the STAGE environment
    needs: [ build_pipelines_on_dev, build_indirect_pipelines_on_dev, get_component_list ]
    runs-on: ubuntu-latest
    environment: staging
    strategy:
      # Use matrix strategy to run the tasks in parallel
      matrix:
        name: ${{ fromJson(needs.get_component_list.outputs.names) }}
    steps:
      - name: Checkout
        uses: actions/checkout@v3
      - name: Build component
        uses: ./.github/actions/build_component
        with:
          component_name: ${{ matrix.name }}
      - name: Test component integration
        uses: ./.github/actions/test_component_integration
        with:
          component_name: ${{ matrix.name }}

  build_and_run_pipelines_on_stage:
    # Build changed pipelines and run them on the STAGE environment
    needs: [ build_components_and_test_integration_on_stage, get_pipeline_list ]
    runs-on: ubuntu-latest
    environment: staging
    strategy:
      # Use matrix strategy to run the tasks in parallel
      matrix:
        name: ${{ fromJson(needs.get_pipeline_list.outputs.names) }}
    steps:
      - name: Checkout
        uses: actions/checkout@v3
      - name: Build pipeline
        uses: ./.github/actions/build_pipeline
        with:
          pipeline_name: ${{ matrix.name }}
      - name: Run pipeline
        uses: ./.github/actions/run_pipeline
        with:
          pipeline_name: ${{ matrix.name }}

  build_and_run_indirect_pipelines_on_stage:
    # Build indirectly changed pipelines and run them on the STAGE environment
    needs: [ build_and_run_pipelines_on_stage, get_indirect_pipeline_list ]
    runs-on: ubuntu-latest
    environment: staging
    strategy:
      # Use matrix strategy to run the tasks in parallel
      matrix:
        name: ${{ fromJson(needs.get_indirect_pipeline_list.outputs.names) }}
    steps:
      - name: Checkout
        uses: actions/checkout@v3
      - name: Build pipeline
        uses: ./.github/actions/build_pipeline
        with:
          pipeline_name: ${{ matrix.name }}
      - name: Run pipeline
        uses: ./.github/actions/run_pipeline
        with:
          pipeline_name: ${{ matrix.name }}

It is triggered automatically when a pull request is opened or updated. Because of the concurrency setting, only one run of this workflow can be in progress at a time. Within a run, however, similar tasks, such as testing several components, run in parallel thanks to the matrix strategy.
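The concurrency setting deserves a closer look. According to the GitHub Actions documentation, if a run is queued while another run in the same concurrency group is in progress, the new run becomes pending, and any run that was already pending in that group is canceled. Without `cancel-in-progress` (as in the configuration above), the run in progress is never interrupted. So there is at most one running and one pending run at any moment, and because the group name is a fixed string, this holds across all open Pull Requests, not just within one. The toy Python model below illustrates that behavior; the `ConcurrencyGroup` class is purely illustrative and not part of any GitHub API.

```python
from typing import Optional


class ConcurrencyGroup:
    """Toy model of a GitHub Actions concurrency group without cancel-in-progress."""

    def __init__(self) -> None:
        self.running: Optional[str] = None
        self.pending: Optional[str] = None
        self.canceled: list[str] = []

    def queue(self, run: str) -> None:
        if self.running is None:
            # Nothing in progress: the new run starts immediately
            self.running = run
            return
        if self.pending is not None:
            # Only one run may wait; the older pending run is canceled
            self.canceled.append(self.pending)
        self.pending = run

    def finish(self) -> None:
        # The current run completes and the pending one (if any) starts
        self.running, self.pending = self.pending, None


group = ConcurrencyGroup()
for run in ["push-1", "push-2", "push-3"]:
    group.queue(run)
print(group.running, group.pending, group.canceled)  # push-1 push-3 ['push-2']
group.finish()
print(group.running, group.pending)  # push-3 None
```

In practice this means that if several pushes arrive while a run is busy on DEV/STAGE, only the newest one is eventually checked.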

The logic inside is as follows:

  1. Analyze the changes in a pull request and find out which components and/or pipelines were affected.

  2. Build them in the predefined order.

  3. Run automated tests for the components.

  4. Run the pipelines to retrain the models and deliver them for final use.
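Step 1 is also where the "indirect" pipelines come from: a pipeline whose spec did not change still has to be rebuilt if it uses a changed component. A minimal Python sketch of that rule follows. It assumes we already know which components each pipeline uses (for example, parsed from its spec); the `PIPELINE_COMPONENTS` mapping and the `indirect_pipelines` function are hypothetical. Directly changed pipelines are excluded so that they are not built twice.

```python
def indirect_pipelines(
    pipeline_components: dict[str, set[str]],
    changed_components: set[str],
    changed_pipelines: set[str],
) -> list[str]:
    """Return pipelines that use a changed component but were not changed themselves."""
    return sorted(
        name
        for name, used in pipeline_components.items()
        # Skip pipelines already handled as direct changes to avoid duplicates
        if name not in changed_pipelines and used & changed_components
    )


# Hypothetical dependency map: pipeline name -> components it uses
PIPELINE_COMPONENTS = {
    "pipeline1": {"component1", "component2"},
    "pipeline2": {"component2"},
    "pipeline3": {"component3"},
}

print(indirect_pipelines(PIPELINE_COMPONENTS, {"component2"}, {"pipeline1"}))
# ['pipeline2']
```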

It is generic and works with any components and pipelines as long as they follow the framework's conventions. It is also safe, avoids duplicate work, runs the jobs in the right order, and parallelizes them where possible.

This workflow operates on the development and staging environments.
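The "right order" and the parallelism both follow from the `needs` keys: GitHub Actions starts a job only after every job listed in its `needs` has finished successfully, and the entries of a job's matrix run in parallel. As a rough illustration, the Python sketch below copies the `needs` edges of the PR-opened workflow above and uses the standard library's `graphlib.TopologicalSorter` to group the jobs into waves whose dependencies are all satisfied. It models dependencies only and ignores runner availability; `NEEDS` and `waves` are illustrative names, not part of our framework.

```python
from graphlib import TopologicalSorter

# `needs` edges copied from the PR-opened workflow: job -> jobs it waits for
NEEDS = {
    "git_diff": [],
    "get_component_list": ["git_diff"],
    "test_and_build_components_on_dev": ["get_component_list"],
    "get_pipeline_list": ["git_diff", "test_and_build_components_on_dev"],
    "build_pipelines_on_dev": ["get_pipeline_list"],
    "get_indirect_pipeline_list": ["git_diff", "build_pipelines_on_dev"],
    "build_indirect_pipelines_on_dev": ["get_indirect_pipeline_list"],
    "build_components_and_test_integration_on_stage": [
        "build_pipelines_on_dev", "build_indirect_pipelines_on_dev", "get_component_list"],
    "build_and_run_pipelines_on_stage": [
        "build_components_and_test_integration_on_stage", "get_pipeline_list"],
    "build_and_run_indirect_pipelines_on_stage": [
        "build_and_run_pipelines_on_stage", "get_indirect_pipeline_list"],
}


def waves(needs: dict[str, list[str]]) -> list[list[str]]:
    """Group jobs into waves; every job in a wave has all its `needs` satisfied."""
    sorter = TopologicalSorter(needs)
    sorter.prepare()
    result = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        result.append(ready)
        # Mark the whole wave as finished so the next wave becomes ready
        sorter.done(*ready)
    return result


for number, wave in enumerate(waves(NEEDS), start=1):
    print(number, wave)
```

For this workflow every wave contains exactly one job, so the jobs themselves run one after another and the parallelism comes from the matrix strategy inside each job.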

PR is merged

The second workflow runs when a PR is merged.

name: MLOps Pull Request - Merged

on:
  # The workflow will be run automatically when a PR is closed
  pull_request:
    types:
      - closed
    branches:
      - "main"
    paths:
      - "pipelines/**/*.yaml"
      - "components/**/*.yaml"
      - "components/**/*.py"

# Use concurrency to ensure that only a single workflow 
# using the same concurrency group will run at a time
concurrency: prod_environment

jobs:
  if_merged:
    # The pull_request trigger has no "merged" activity type, only "closed",
    # so the first job checks whether the PR was actually merged
    if: github.event.pull_request.merged == true
    runs-on: ubuntu-latest
    steps:
      - run: echo The PR was merged
  git_diff:
    # Get a list of changed files in the PR for further analysis
    needs: if_merged
    runs-on: ubuntu-latest
    outputs:
      diff: ${{ steps.getter.outputs.diff }}
    steps:
      - name: Checkout
        uses: actions/checkout@v3
        with:
          fetch-depth: 0
      - name: Get git diff
        id: getter
        run: |
          GIT_DIFF="$(echo $(git diff --name-only ${GITHUB_SHA}^ ${GITHUB_SHA}))"
          echo "diff=$GIT_DIFF" >> "$GITHUB_OUTPUT"
  get_component_list:
    # Get a list of names for the added/changed components
    needs: git_diff
    runs-on: ubuntu-latest
    outputs:
      names: ${{ steps.getter.outputs.names }}
    steps:
      - name: Checkout
        uses: actions/checkout@v3
      - name: Get list
        id: getter
        run: |
          NAMES="$(make get_components --paths='${{ needs.git_diff.outputs.diff }}')"
          echo "names=$NAMES" >> "$GITHUB_OUTPUT"
  test_and_build_components_on_prod:
    # Test and build changed components on the PROD environment
    needs: get_component_list
    runs-on: ubuntu-latest
    environment: production
    strategy:
      # Use matrix strategy to run the tasks in parallel
      matrix:
        name: ${{ fromJson(needs.get_component_list.outputs.names) }}
    steps:
      - name: Checkout
        uses: actions/checkout@v3
      - name: Test component
        uses: ./.github/actions/test_component
        with:
          component_name: ${{ matrix.name }}
      - name: Build component
        uses: ./.github/actions/build_component
        with:
          component_name: ${{ matrix.name }}
      - name: Test component integration
        uses: ./.github/actions/test_component_integration
        with:
          component_name: ${{ matrix.name }}

  get_pipeline_list:
    # Get a list of names for the added/changed pipelines
    needs: [ git_diff, test_and_build_components_on_prod ]
    runs-on: ubuntu-latest
    outputs:
      names: ${{ steps.getter.outputs.names }}
    steps:
      - name: Checkout
        uses: actions/checkout@v3
      - name: Get list
        id: getter
        run: |
          NAMES="$(make get_pipelines --paths='${{ needs.git_diff.outputs.diff }}')"
          echo "names=$NAMES" >> "$GITHUB_OUTPUT"
  build_and_run_pipelines_on_prod:
    # Build and run changed pipelines on the PROD environment
    needs: [ test_and_build_components_on_prod, get_pipeline_list ]
    runs-on: ubuntu-latest
    environment: production
    strategy:
      # Use matrix strategy to run the tasks in parallel
      matrix:
        name: ${{ fromJson(needs.get_pipeline_list.outputs.names) }}
    steps:
      - name: Checkout
        uses: actions/checkout@v3
      - name: Build pipeline
        uses: ./.github/actions/build_pipeline
        with:
          pipeline_name: ${{ matrix.name }}
      - name: Run pipeline
        uses: ./.github/actions/run_pipeline
        with:
          pipeline_name: ${{ matrix.name }}

  get_indirect_pipeline_list:
    # Get a list of names for the indirectly changed pipelines
    # (when a related component was changed)
    needs: [ git_diff, build_and_run_pipelines_on_prod ]
    runs-on: ubuntu-latest
    outputs:
      names: ${{ steps.getter.outputs.names }}
    steps:
      - name: Checkout
        uses: actions/checkout@v3
      - name: Get list
        id: getter
        run: |
          NAMES="$(make get_indirect_pipelines --paths='${{ needs.git_diff.outputs.diff }}')"
          echo "names=$NAMES" >> "$GITHUB_OUTPUT"
  build_and_run_indirect_pipelines_on_prod:
    # Build indirectly changed pipelines and run them on the PROD environment
    needs: [ build_and_run_pipelines_on_prod, get_indirect_pipeline_list ]
    runs-on: ubuntu-latest
    environment: production
    strategy:
      # Use matrix strategy to run the tasks in parallel
      matrix:
        name: ${{ fromJson(needs.get_indirect_pipeline_list.outputs.names) }}
    steps:
      - name: Checkout
        uses: actions/checkout@v3
      - name: Build pipeline
        uses: ./.github/actions/build_pipeline
        with:
          pipeline_name: ${{ matrix.name }}
      - name: Run pipeline
        uses: ./.github/actions/run_pipeline
        with:
          pipeline_name: ${{ matrix.name }}

This workflow is very similar to the previous one, except that it first checks that the PR was actually merged and then runs all the jobs in the production environment.
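The `if_merged` job exists because the pull_request trigger only reports that a PR was closed. If the same check is ever needed in code, for example inside one of the framework's CLI commands, it can read the event payload that GitHub Actions writes to the file named by the `GITHUB_EVENT_PATH` environment variable. A minimal Python sketch, assuming only the standard `action` and `pull_request.merged` fields of the pull_request payload; `was_merged` and `load_event` are hypothetical helpers.

```python
import json
import os


def was_merged(event: dict) -> bool:
    """True only for a pull_request 'closed' event whose PR was actually merged."""
    pr = event.get("pull_request", {})
    return event.get("action") == "closed" and pr.get("merged") is True


def load_event() -> dict:
    # GitHub Actions writes the triggering event's JSON payload to this file
    with open(os.environ["GITHUB_EVENT_PATH"], encoding="utf-8") as f:
        return json.load(f)


if __name__ == "__main__" and "GITHUB_EVENT_PATH" in os.environ:
    print(was_merged(load_event()))
```

A PR that is closed without merging produces the same "closed" event, which is exactly the case this check filters out.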

Conclusion

That’s it! This solution has been working well for us for more than six months now. We have some ideas on how to make it even better, and we may share the results with the community in future articles.

I hope this will be useful in your MLOps journey and help you save some time while building your own CI/CD process for ML pipelines.

Please feel free to share any thoughts, questions, or suggestions in the comments.

Thank you and happy coding!
