Akshay Milmile

Downloading an MLflow Model from Databricks and Serving It with Docker

If you are using Managed MLflow in a Databricks workspace to train and register your models but can't figure out how to download and serve them outside the Databricks environment using Docker, you are in luck!

In this article, I will cover the following points:

  • Downloading an MLflow model from the Databricks workspace model registry
  • Packaging the downloaded model and serving it in a container with Docker

Downloading an MLflow model from the Databricks workspace

Databricks provides a managed version of MLflow that lets us run experiments in notebooks and register models in the built-in MLflow model registry.

We'll use MLFlow's Python API to download a model.

To download a model from a Databricks workspace you need to do two things:

  1. Set the MLflow tracking URI to databricks using the Python API
  2. Set up Databricks authentication. I prefer authenticating by setting the following environment variables (you can also use the Databricks CLI to authenticate; see the sketch after this list):
DATABRICKS_HOST
DATABRICKS_TOKEN
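For example, here's a minimal sketch of setting these variables from Python before calling the MLflow API (the host URL and token are placeholders for your own workspace details):

import os

# Placeholder values: use your workspace URL and a personal access token
os.environ["DATABRICKS_HOST"] = "https://<your-workspace>.cloud.databricks.com"
os.environ["DATABRICKS_TOKEN"] = "<your-personal-access-token>"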

Here's a basic code snippet to download a model from the Databricks workspace model registry:

import os
import mlflow
from mlflow.store.artifact.models_artifact_repo import ModelsArtifactRepository

model_name = "example-model-name"
model_stage = "Staging"  # Should be either 'Staging' or 'Production'

# Point MLflow at the Databricks workspace; authentication is picked up from
# the DATABRICKS_HOST and DATABRICKS_TOKEN environment variables set earlier
mlflow.set_tracking_uri("databricks")

# Download the latest model version in the given stage into the local "model" directory
os.makedirs("model", exist_ok=True)
local_path = ModelsArtifactRepository(
    f'models:/{model_name}/{model_stage}').download_artifacts("", dst_path="model")

print(f'{model_stage} model {model_name} is downloaded at {local_path}')

Running the above Python script downloads the ML model into the model directory.
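
The model directory should now contain an MLmodel file describing the model's flavors along with its environment specification (typically conda.yaml). Before containerizing, you can sanity-check the download by loading the model locally; here's a minimal sketch assuming your model has a pyfunc flavor:

import mlflow.pyfunc

# Load the model from the local "model" directory and inspect its metadata
model = mlflow.pyfunc.load_model("model")
print(model.metadata)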

Containerizing MLflow model serving with Docker

The next step is to package the downloaded model in a Docker image and serve the model when you run the image.

Here's a basic Dockerfile to do the same:

FROM continuumio/miniconda3

ENV MLFLOW_HOME /opt/mlflow
ENV MLFLOW_VERSION 1.12.1
ENV PORT 5000

# Install a pinned MLflow version from conda-forge
RUN conda install -c conda-forge mlflow=${MLFLOW_VERSION}

COPY model/ ${MLFLOW_HOME}/model

WORKDIR ${MLFLOW_HOME}

# Build the model's conda environment at image build time so the
# container doesn't have to resolve dependencies on every start
RUN mlflow models prepare-env -m ${MLFLOW_HOME}/model

# Run the server as a non-root user
RUN useradd -d ${MLFLOW_HOME} mlflow
RUN chown mlflow: ${MLFLOW_HOME}
USER mlflow

CMD mlflow models serve -m ${MLFLOW_HOME}/model --host 0.0.0.0 --port ${PORT}

A few things to note from the Dockerfile:

  • We use the base image continuumio/miniconda3 because MLflow by default uses conda to install its dependencies while preparing the environment for model serving
  • We run the mlflow models serve process as a non-root mlflow user with limited permissions for better security
  • We specify the host as 0.0.0.0 because the default, 127.0.0.1, would not let us reach the web server started by MLflow from outside the container
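
Once you build and run the image (for example, docker build -t mlflow-model . followed by docker run -p 5000:5000 mlflow-model; the image name is arbitrary), the model is served on the /invocations endpoint. Here's a minimal sketch of scoring a request with Python's requests library; the column names and values are hypothetical and must match your model's input schema:

import requests

# Hypothetical two-feature input in pandas-split JSON format,
# which MLflow 1.x scoring servers accept
payload = {
    "columns": ["feature1", "feature2"],
    "data": [[1.0, 2.0]],
}

response = requests.post(
    "http://localhost:5000/invocations",
    json=payload,
    headers={"Content-Type": "application/json; format=pandas-split"},
)
print(response.json())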

Wrapping up

This is a simple, minimal example of how you can use Docker to serve an MLflow model trained in a Databricks workspace.
I hope this article saves you some time connecting the dots across the different sets of documentation.
So that is it, fellas. Thank you for reading the article. Wish you a great day. Peace out ✌.

References

  • https://www.mlflow.org/docs/latest/cli.html#mlflow-models-serve
  • https://databricks.com/blog/2019/10/17/managed-mlflow-now-available-on-databricks-community-edition.html
  • https://docs.databricks.com/dev-tools/cli/index.html#set-up-authentication
