vishal9629

MLOps on GitHub Actions with Cirun

MLOps is very useful in the ML world, and MLOps pipelines run much faster on GPUs. To get an idea of how much faster, this post compares MLOps on CPU and GPU using GitHub Actions and Cirun.

About MLOps and its uses

MLOps is a set of practices that establishes a smooth path from the creation of ML models to production, and continues through the upkeep and overall monitoring of machine learning systems. MLOps reduces friction with DevOps and IT, allows better cooperation with data teams, makes ML pipelines reproducible, and speeds up the whole process.

Why is MLOps faster on GPUs?

ML training is dominated by mathematical operations that apply the same code to many pieces of data in parallel, and GPUs are designed for exactly that pattern. Running ML operations on a GPU therefore gets the work done efficiently and sharply reduces compute time.

How can Cirun help with this?

Cirun enables us to create GPU-enabled virtual machines on our own cloud. It closes the gap between GitHub Actions and the automation of self-hosted runners, so we can run MLOps jobs on whatever GPUs we choose.
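
To make this concrete, here is a minimal sketch of a job that targets a Cirun-provisioned GPU runner. The workflow name and the nvidia-smi check are illustrative, and the cirun.gpu label must match the label defined in your Cirun configuration:

name: "GPU smoke test"      # illustrative name, not from this post's repo
on: "workflow_dispatch"
jobs:
  gpu-check:
    runs-on: "cirun.gpu"    # label assigned by the Cirun configuration
    steps:
      - run: "nvidia-smi"   # confirms the GPU is visible on the runner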

Setting up MLOps on CPU and GPU with GitHub Actions and Cirun for comparison

  • In the first step, we created a workflow that runs MLOps both on a GitHub Actions machine with a CPU and on a self-hosted runner with a GPU.
  • For MLOps we then need an ML-friendly environment. For this we used CML, an open-source CLI tool for implementing CI/CD with a focus on MLOps. CML provides custom Docker images that come with pre-installed libraries essential for MLOps, such as NodeJS, Python, and DVC (Data Version Control). (A consolidated container setup is sketched after this list.)
  • To set up the CML environment on CPU, use the Docker image below.
image: "docker://dvcorg/cml-py3:latest"
  • To set up the CML environment on GPU, use the Docker image below.
image: "ghcr.io/iterative/cml:0-dvc2-base1-gpu"
  • This is followed by the container option that exposes the GPUs to the container.
options: "--gpus all"
  • It is very important to pass the GITHUB_TOKEN so that authentication is done on behalf of GitHub Actions; without it the workflow will not work. You can have a look at GITHUB_TOKEN.
- name: "MLops"
        env:
          repo_token: "${{ secrets.GITHUB_TOKEN }}"
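
Putting these fragments together, a single GPU job might look like the sketch below; the job name is illustrative, and train.py stands in for your own training script:

jobs:
  train-on-gpu:                # illustrative job name
    runs-on: "cirun.gpu"
    container:
      image: "ghcr.io/iterative/cml:0-dvc2-base1-gpu"  # CML image with GPU support
      options: "--gpus all"                            # expose all GPUs to the container
    steps:
      - uses: "actions/checkout@v3"
      - name: "MLops"
        env:
          repo_token: "${{ secrets.GITHUB_TOKEN }}"    # lets CML authenticate as GitHub Actions
        run: python train.py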

Machines used for comparison

For MLOps on CPU, we used GitHub-hosted runners, which provide only CPUs to work with.

Hardware configuration of the GitHub Actions machine:

  • 2-core CPU (x86_64)
  • 7 GB of RAM
  • 14 GB of SSD space

For MLOps on GPU, we used a self-hosted runner with an NVIDIA T4 GPU.

As an example, we used an AWS g4dn.xlarge instance to demonstrate MLOps on GPU.

To configure GPUs using a self-hosted runner, see the Cirun Configuration.
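
As a rough sketch of that configuration (check the Cirun docs for the exact schema), a .cirun.yml for an on-demand GPU runner on AWS might look like the following; the runner name and machine image ID are placeholders:

# .cirun.yml -- sketch of an on-demand GPU runner on AWS
runners:
  - name: "gpu-runner"               # placeholder name
    cloud: "aws"
    instance_type: "g4dn.xlarge"     # NVIDIA T4 GPU, 4 vCPUs, 16 GB RAM
    machine_image: "ami-placeholder" # placeholder: use a GPU-ready image for your region
    preemptible: false
    labels:
      - "cirun.gpu"                  # must match runs-on in the workflow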

Hardware configuration of the g4dn.xlarge instance:

  • NVIDIA T4 GPU
  • 4-core CPU (x86_64)
  • 16 GB of RAM
  • 125 GB of SSD space

Workflow used for comparison

This workflow creates jobs for two runners: one self-hosted runner using a GPU and one GitHub Actions runner using a CPU. You can review the latest version of this workflow in this repository.

You can also directly copy the workflow for a single runner: MLOps using GPUs.

name: "GitHub Actions with your own GPUs"
on: 
  push:
    branches: [ "none" ] # Replace "none" with the branch that should trigger the workflow on push.
  workflow_dispatch: 
jobs:
  CPU_GPU_matrix: # Creating matrix for jobs.
    strategy:
      matrix:
        # A 2D matrix pairing each runner with its Docker container and options.
        os: ["ubuntu-latest", "cirun.gpu"]
        containers: ["docker://dvcorg/cml-py3:latest", "ghcr.io/iterative/cml:0-dvc2-base1-gpu"]
        container_arg: [" ", "--gpus all"]
        # Exclude the combinations that pair a runner with the wrong container or options.
        exclude:
          - os: "ubuntu-latest"
            containers: "ghcr.io/iterative/cml:0-dvc2-base1-gpu"
          - os: "ubuntu-latest"
            container_arg: "--gpus all"
          - os: "cirun.gpu"
            containers: "docker://dvcorg/cml-py3:latest"
          - os: "cirun.gpu"
            container_arg: " "
    # Run the job on each runner, passing os, containers, and container_arg dynamically from the matrix.
    runs-on: "${{ matrix.os }}" 
    container:
      image: "${{ matrix.containers }}"
      options: "${{ matrix.container_arg }}"
    # Steps to run on each runner.
    steps:
      - uses: "actions/checkout@v3"
      - name: "Dependency Install"
        run: "pip install -r requirements.txt"
      - name: "MLops"
        env:
          repo_token: "${{ secrets.GITHUB_TOKEN }}"
        run: |
          # Your ML workflow goes here; replace train.py with your own script.
          python train.py
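
As a side note on the matrix design: the same two jobs can also be expressed more compactly with include instead of a full cross-product plus exclude rules. A sketch:

strategy:
  matrix:
    # Pair each runner directly with its container and options.
    include:
      - os: "ubuntu-latest"
        containers: "docker://dvcorg/cml-py3:latest"
        container_arg: " "
      - os: "cirun.gpu"
        containers: "ghcr.io/iterative/cml:0-dvc2-base1-gpu"
        container_arg: "--gpus all"

Both forms produce the same two jobs; include simply avoids enumerating and excluding the unwanted combinations.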

Conclusion

Comparison of execution times:

The example repository uses a workflow to run MLOps on CPU and GPU using Cirun and reports the execution times.

  • You can see the difference in execution time between CPU and GPU here.

Execution times (image)

In this image we can clearly see that the GPU run is 5.53 times faster than the CPU run.

Flowchart (image)

How did Cirun help us do this?

Cirun provides the ability to create on-demand self-hosted GitHub Actions runners with any configuration on our own cloud. We also know that MLOps operates more efficiently on GPUs, so we automated the entire process on a GPU runner.
Using Cirun, we created a machine on AWS with an NVIDIA T4 GPU and performed our operations there. As a result, we can see the large difference in execution time between MLOps on CPU and on GPU.


If you have any questions or feedback about this blog, feel free to comment or reach out to Cirun.io.
