I've been working on the CML project for the last few months. The idea behind it is to automate machine learning projects using CI/CD practices:
- 📊 Visual reports in GitHub Pull Requests or GitLab Merge Requests.
- 💾 Transfer of datasets to your CI runners for ML training.
- ☁️ Auto-allocation of cloud CPU/GPU. AWS, Azure, GCP, and Ali are supported.
What is CML?
Continuous Machine Learning (CML) is an open-source library for implementing continuous integration & delivery (CI/CD) in machine learning projects. Use it to automate parts of your development workflow, including model training and evaluation, comparing ML experiments across your project history, and monitoring changing datasets.
On every pull request, CML helps you automatically train and evaluate models, then generates a visual report with results and metrics. Above, an example report for a neural style transfer model.
We built CML with these principles in mind:
- GitFlow for data science. Use GitLab or GitHub to manage ML experiments, track who trained ML models or modified data and when. Codify data and models with DVC instead of pushing to a Git repo.
- Auto reports for ML experiments. Auto-generate reports with metrics and plots in each Git Pull Request. Rigorous engineering practices help your team make informed, data-driven decisions.
- No additional…
Today CML supports two CI/CD systems: GitHub Actions and GitLab CI/CD.
Automated visual ML report
You can set up auto-generated reports in your GitHub Pull Requests (or GitLab Merge Requests):
The report is generated by CML commands (the ones with the cml- prefix) called from a GitHub Actions script (or a GitLab CI/CD script). GitHub Actions example:
# Create a file `.github/workflows/cml.yaml`
name: train-my-model
on: [push]
jobs:
  run:
    runs-on: [ubuntu-latest]
    container: docker://dvcorg/cml-py3:latest
    steps:
      - uses: actions/checkout@v2
      - name: cml_run
        env:
          repo_token: ${{ secrets.GITHUB_TOKEN }}
        run: |
          # Install dependencies and train the model
          pip3 install -r requirements.txt
          python train.py
          # Build the report: metrics first, then the plot
          cat metrics.txt >> report.md
          cml-publish confusion_matrix.png --md >> report.md
          # Post the report as a comment
          cml-send-comment report.md
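The workflow expects train.py to leave a metrics.txt file behind. Here is a minimal sketch of what the metrics part of such a script could look like. It is not the training script from the CML examples: the function names, the placeholder labels, and the file layout are all assumptions made for illustration, and the step that draws confusion_matrix.png is left out because it would need a plotting library.

```python
# train.py (hypothetical sketch; names and data are illustrative assumptions)
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Count (true, predicted) pairs into a len(labels) x len(labels) grid."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def write_metrics(path, y_true, y_pred, labels):
    """Write a plain-text summary that `cat metrics.txt` can append to report.md."""
    matrix = confusion_matrix(y_true, y_pred, labels)
    lines = [
        f"Accuracy: {accuracy(y_true, y_pred):.3f}",
        "",
        "Confusion matrix (rows = true, columns = predicted):",
    ]
    for label, row in zip(labels, matrix):
        lines.append(f"{label}: " + " ".join(str(n) for n in row))
    with open(path, "w") as fh:
        fh.write("\n".join(lines) + "\n")
    return matrix


if __name__ == "__main__":
    # Placeholder labels standing in for real model predictions.
    y_true = ["cat", "dog", "dog", "cat", "dog"]
    y_pred = ["cat", "dog", "cat", "cat", "dog"]
    write_metrics("metrics.txt", y_true, y_pred, labels=["cat", "dog"])
```

In a real project the two label lists would come from evaluating the trained model on a held-out set.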
After you push your code changes to GitHub, the workflow runs and posts the report as a comment in the Pull Request:
$ vi train.py
$ git add train.py
$ git commit -m 'Increase depth to 7'
$ git push
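To make the report itself less of a black box, here is a rough Python equivalent of what the two `>> report.md` lines in the workflow assemble. It assumes that cml-publish with the --md flag prints a Markdown image link to the uploaded file; the build_report function and the example.com URL are hypothetical and only show the shape of the resulting Markdown.

```python
def build_report(metrics_text, image_url, alt_text="confusion matrix"):
    """Return Markdown: the metrics text followed by an embedded image link."""
    image_md = f"![{alt_text}]({image_url})"
    return metrics_text.rstrip("\n") + "\n\n" + image_md + "\n"


if __name__ == "__main__":
    # Placeholder URL; in the workflow the link comes from the upload step.
    report = build_report(
        "Accuracy: 0.800\n",
        "https://example.com/confusion_matrix.png",
    )
    print(report)
```

The comment that shows up in the Pull Request is this Markdown rendered by GitHub, so anything that is valid Markdown can go into the report.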
Auto-allocate GPU and transfer datasets
You can find GPU and data transfer examples on the website: http://cml.dev/
Technical details
The code is written in JavaScript: https://www.npmjs.com/package/@dvcorg/cml
It is packaged into the Docker image that the workflow above uses: https://hub.docker.com/repository/docker/dvcorg/cml-py3
Conclusion
I'd love to hear your feedback and what you'd like to automate next in your ML projects.
Top comments (2)
Neat. More Docker packaging! Why can't we all share essential Docker images instead of command line tools hosted on npm?
Good point! We should do docker more :)