DEV Community

Keisuke Sato

Using MLflow on Google Colaboratory with GitHub to build a cosy environment: building

(Updated on 19, March 2022)
(Updated on 6, February 2022)
(Updated on 30, January 2022)

Introduction

I built my first cosy environment. The following is how I built it.

GitHub repository: template_with_mlflow

Preparation

From here on, I assume you have Google, ngrok, and GitHub accounts. If you don't, please create them before reading on.

You have to upload a YAML file, general_config.yaml, containing your GitHub and ngrok information to Google Drive. The code in step 3 expects to find it at MyDrive/config/general_config.yaml.

Its contents look like the following.

github:
  username: your_username
  email: your_email@gmail.com
  token: your_personal_access_token
ngrok:
  token: ngrok_authentication_token

If you don't have a personal access token yet, create one by following [Creating a personal access token]. The ngrok authentication token is shown on your ngrok top page after you log in.
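
A missing entry in general_config.yaml only shows up later, as a KeyError or a failed push, so it may help to check the loaded dictionary right after reading it. The following is a minimal sketch that is not part of the original notebook: REQUIRED_KEYS and find_missing_keys are hypothetical names, and the key layout is assumed to match the file shown above.

```python
# Hypothetical helper: the names below are illustrative, not from the original notebook.
REQUIRED_KEYS = {
    "github": ("username", "email", "token"),
    "ngrok": ("token",),
}


def find_missing_keys(config):
    """Return dotted names of required entries that are absent or empty."""
    config = config or {}  # yaml.safe_load returns None for an empty file
    missing = []
    for section, keys in REQUIRED_KEYS.items():
        values = config.get(section) or {}
        for key in keys:
            if not values.get(key):
                missing.append(f"{section}.{key}")
    return missing


# Example: a config whose ngrok section was forgotten
example = {
    "github": {
        "username": "your_username",
        "email": "your_email@gmail.com",
        "token": "your_personal_access_token",
    }
}
print(find_missing_keys(example))
```

In the notebook, this could be called on the dictionary returned by yaml.safe_load before any git or ngrok command runs.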

Process

I'll show how I built my cosy environment.

1: Create a new Google Colaboratory notebook.

2: Install mlflow and pyngrok, and import the modules used in the later steps, by running the following code in the Google Colaboratory notebook. pyngrok is used in step 5 to view your model information in a browser.

!pip install mlflow
!pip install pyngrok

import os
from pyngrok import ngrok
import yaml

3: Load your configuration, set up git, and clone the repository by running the following code.

# Mount my google drive
from google.colab import drive
drive_path = "/content/gdrive"
drive.mount(drive_path)

# Load the general config
config_path = os.path.join(drive_path, "MyDrive", "config", "general_config.yaml")
with open(config_path, 'r') as yml:
  config = yaml.safe_load(yml)

config_github = config["github"]
config_ngrok = config["ngrok"]

# Set git configs
!git config --global user.email {config_github["email"]}
!git config --global user.name {config_github["username"]}

# Clone the repository
repository_name = "template_with_mlflow"
git_repository = f"https://github.com/ksk0629/{repository_name}.git"
repository_path = "/content/" + repository_name
!git clone {git_repository}

# Change the current directory to the cloned directory
%cd {repository_name}

# Checkout branch
branch_name = "main"
!git checkout {branch_name}

# Pull
!git pull

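You can replace "template_with_mlflow" with the name of the repository you want to clone. If that repository is not under the ksk0629 account, change the account name in git_repository as well.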
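
Re-running the step 3 cell in the same session makes git clone fail, because the destination directory already exists. One way around this, sketched below under the assumption that a plain pull is what you want for an existing clone, is to pick the git command based on whether the directory is already a repository. git_sync_command is a hypothetical helper, not something from the original notebook.

```python
import os


def git_sync_command(repository_url, repository_path):
    """Return the git command to run: pull if already cloned, otherwise clone."""
    if os.path.isdir(os.path.join(repository_path, ".git")):
        # The directory is already a git repository, so update it in place
        return ["git", "-C", repository_path, "pull"]
    return ["git", "clone", repository_url, repository_path]
```

The returned list can be passed to subprocess.run(..., check=True) in place of the !git clone line.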

4: Train your model with a script that contains MLflow logging code, like the following.

experiment_name = "mnist with cnn"
run_name = "first run"
validation_size = 0.2
epochs = 1000
batch_size = 2048
n_features = 784
n_hidden = 100
learning_rate = 0.01
seed = 57

!python ./src/mlflow_example.py "{experiment_name}" "{run_name}" {seed} {validation_size} {n_hidden} {n_features} {epochs} {batch_size} {learning_rate}

A second run that changes only run_name and n_hidden:

experiment_name = "mnist with cnn"
run_name = "second run"
validation_size = 0.2
epochs = 1000
batch_size = 2048
n_features = 784
n_hidden = 300
learning_rate = 0.01
seed = 57

!python ./src/mlflow_example.py "{experiment_name}" "{run_name}" {seed} {validation_size} {n_hidden} {n_features} {epochs} {batch_size} {learning_rate}

You can train whatever models you want.
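
The contents of ./src/mlflow_example.py are not shown here, so the following is only a sketch of what a script accepting the command line above might look like, not the actual file. parse_args and log_run are hypothetical names; the positional order is taken from the notebook command, and the MLflow calls used (set_experiment, start_run, log_params, log_metric) are standard tracking functions.

```python
import argparse


def parse_args(argv=None):
    """Parse positional arguments in the order used by the notebook command."""
    parser = argparse.ArgumentParser(description="Hypothetical MLflow training script")
    parser.add_argument("experiment_name", type=str)
    parser.add_argument("run_name", type=str)
    parser.add_argument("seed", type=int)
    parser.add_argument("validation_size", type=float)
    parser.add_argument("n_hidden", type=int)
    parser.add_argument("n_features", type=int)
    parser.add_argument("epochs", type=int)
    parser.add_argument("batch_size", type=int)
    parser.add_argument("learning_rate", type=float)
    return parser.parse_args(argv)


def log_run(args, val_losses):
    """Record the hyperparameters and one validation loss per epoch with MLflow."""
    import mlflow  # imported lazily so parse_args works without MLflow installed

    mlflow.set_experiment(args.experiment_name)
    with mlflow.start_run(run_name=args.run_name):
        # Everything except the two names is logged as a parameter
        params = {
            key: value
            for key, value in vars(args).items()
            if key not in ("experiment_name", "run_name")
        }
        mlflow.log_params(params)
        for epoch, loss in enumerate(val_losses):
            mlflow.log_metric("val_loss", loss, step=epoch)
```

A real script would train the model between these two calls and pass the losses it measured to log_run.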

5: Run MLflow and see your models' information through ngrok.

# Run mlflow
get_ipython().system_raw("mlflow ui --port 5000 &") # run tracking UI in the background

# Terminate open tunnels if any exist
ngrok.kill()

# Setting the authtoken of ngrok
ngrok.set_auth_token(config_ngrok["token"])

# Open an HTTPS tunnel on port 5000 for http://localhost:5000
ngrok_tunnel = ngrok.connect(addr="5000", proto="http", bind_tls=True)
print("MLflow Tracking UI:", ngrok_tunnel.public_url)

The output cell shows a public URL:

MLflow Tracking UI: https://cexx-xx-xxx-xxx-xx.ngrok.io

Open that URL in your browser to see your models' information in the MLflow tracking UI.
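
Because mlflow ui is started in the background, the tunnel may be opened before the server is actually listening, in which case the public URL can show an error for a moment. A small wait loop between the two steps is one way to avoid that. The sketch below uses only the standard library; wait_for_port is a hypothetical helper, and it assumes the UI listens on 127.0.0.1, which is the default host for mlflow ui.

```python
import socket
import time


def wait_for_port(port, host="127.0.0.1", timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connection means something is listening on the port
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

In the notebook, wait_for_port(5000) could be called right after starting mlflow ui and before ngrok.connect.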

6: Commit and push your changes to the remote repository.

add_objects = os.path.join(repository_path, "mlruns", "*")
!git add {add_objects}

commit_msg = "Add new mlruns"
!git commit -m "{commit_msg}"

remote_url = f"https://{config_github['token']}@github.com/{config_github['username']}/{repository_name}.git"
!git remote set-url origin {remote_url}
!git push origin {branch_name}

Of course, you can choose which files to commit and change the commit message to whatever you want.
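
The remote URL in step 6 embeds the personal access token, so printing it in a notebook cell would expose the token to anyone who sees the output. As a sketch only, the URL construction can be wrapped in a helper that percent-encodes each part, with a second helper that masks the token before printing. build_push_url and mask_token are hypothetical names that do not appear in the original notebook.

```python
from urllib.parse import quote


def build_push_url(username, token, repository_name):
    """Build an HTTPS remote URL that embeds a personal access token."""
    return (
        f"https://{quote(token, safe='')}@github.com/"
        f"{quote(username, safe='')}/{quote(repository_name, safe='')}.git"
    )


def mask_token(url, token):
    """Replace the token so the URL can be printed or logged safely."""
    return url.replace(quote(token, safe=""), "***")
```

For example, build_push_url(config_github["username"], config_github["token"], repository_name) would give the value passed to git remote set-url, and mask_token would be applied only to what is printed.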

Conclusion

You now have your cosy environment! By the way, I said earlier:

I have to add the commit number information to MLflow information after pushing new source codes on a remote repository.

but apparently MLflow is smart enough. I didn't do anything, yet the git commit hash was already recorded for each run!
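
MLflow stores the commit hash as the run tag mlflow.source.git.commit when a run is started from a script inside a git repository. With the default local file store, each tag is written as a small file under the run's tags directory, so the hashes can also be read without opening the UI. The sketch below assumes that layout (mlruns/<experiment_id>/<run_id>/tags/...), which is an implementation detail of the file store rather than a public API. read_commit_tags is a hypothetical helper.

```python
from pathlib import Path


def read_commit_tags(mlruns_dir="mlruns"):
    """Map each run ID to the git commit hash stored in its tags directory."""
    commits = {}
    # Assumed layout: mlruns/<experiment_id>/<run_id>/tags/mlflow.source.git.commit
    for tag_file in Path(mlruns_dir).glob("*/*/tags/mlflow.source.git.commit"):
        run_id = tag_file.parent.parent.name
        commits[run_id] = tag_file.read_text().strip()
    return commits
```

Calling read_commit_tags() from the repository directory would return one entry per run that has the tag.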
