How to Deploy ML Models Using Gravity AI and Meadowrun

#mlops #python #docker

Transform your containerized models-as-services into batch jobs

GravityAI is a marketplace for ML models where data scientists can publish their models as containers, and consumers can subscribe to access those models. In this article, we’ll talk about different options for deploying these containers and then walk through using them for batch jobs via Meadowrun, which is an open-source library for running your Python code and containers in the cloud.

A library inside of a container. Photo by Manuel Palmeira on Unsplash

Containers vs libraries

This section lays out some motivation for this post — if you’re interested in just getting things working, please go to the next section!

If you think of an ML model as a library, then it might seem more natural to publish it as a package, either on PyPI for use with pip or Anaconda.org for use with conda, rather than a container. Hugging Face’s transformers is a good example — you run pip install transformers, then your Python interpreter can do things like:

from transformers import pipeline
classifier = pipeline("sentiment-analysis")
classifier("We are very happy to show you the 🤗 Transformers library.")

There are a few disadvantages with this approach:

For pip specifically (not conda), there’s no way for packages to express non-Python dependencies. With the transformers library, you’ll usually depend on something like CUDA for GPU support, but users need to know to install that separately.
The transformers library also needs to download gigabytes of model weights in order to do anything useful. These get cached in a local directory so they don’t need to be downloaded every time you run, but for some contexts you might want to make these model weights a fixed part of the deployment.
Finally, when publishing a pip or conda package, users expect you to specify your dependencies relatively flexibly. E.g. the transformers package specifies “tensorflow>=2.3” which declares that the package works with any version of tensorflow 2.3 or higher. This means that if the tensorflow maintainers introduce a bug or break backwards compatibility, they can effectively cause your package to stop working (welcome to dependency hell). Flexible dependencies are useful for libraries, because it means more people can install your package into their environments. But for deployments, e.g. if you’re deploying a model within your company, you’re not taking advantage of that flexibility and you’d rather have the certainty it’s going to work every time, and reproducibility of the deployment.

One common way to solve these problems is to build a container. You can install CUDA into your container, if you include your model weights in the image Docker will make sure you only have a single copy of that data on each machine as long as you’re managing your Docker layers correctly, and you’ll be able to choose and test the specific version of tensorflow that goes into the container.

So we’ve solved a bunch of problems, but we’ve traded them for a new problem, which is that each container runs in its own little world and we have to do some work to expose our API to our user. With a library, the consumer can just call a Python function, pass in some inputs and get back some outputs. With a container, our model is more like an app or service, so there’s no built-in way for a consumer to “call” the container with some input data and get back some output data.

One option is to create a command line interface, but that requires explicitly binding input and output files to the container. This feels a bit unnatural, but we can see an example of it in this container image of ImageMagick by dpokidov. In the “Usage” section, the author recommends binding a local folder into the container in order to run it as a command line app.

The traditional answer to this question for Docker images is to expose an HTTP-based API, which is what Gravity AI’s containers do. But this means turning our function (e.g. classifier()) into a service, which means we need to figure out where to put this service. To give the traditional answer again, we could deploy it to Kubernetes with an autoscaler and a load balancer in front of it, which can work well if e.g. you have a constant stream of processes that need to call classifier(). But you might instead have a use case where some data shows up from a vendor every few hours which triggers a batch processing job. In that case, things can get a bit weird. You might be in a situation where the batch job is running, calls classifier(), then has to wait for the autoscaler/Kubernetes to find a free machine that can spin up the service while the machine running the batch job is sitting idle.

In other words, both of these options (library vs service) are reasonable, but they come with their own disadvantages.

As a bit of an aside, you could imagine a way to get the best of both worlds with an extension to Docker that would allow you to publish a container that exposes a Python API, so that someone could call sentiment = call_container_api(image="huggingface/transformers", "my input text") directly from their python code. This would effectively be a remote procedure call into a container that is not running as a service but instead spun up just for the purpose of executing a function on-demand. This feels like a really heavyweight approach to solving dependency hell, but if your libraries are using a cross-platform memory format (hello Apache Arrow!) under the covers, you could imagine doing some fun tricks like giving the container a read-only view into the caller’s memory space to reduce the overhead. It’s a bit implausible, but sometimes it’s helpful to sketch out these ideas to clarify the tradeoffs we’re making with the more practical bits of technology we have available.

Using models-in-containers locally

In this article, we’ll tackle this batch jobs-with-containers scenario. To make this concrete, let’s say that every morning, a vendor gives us a relatively large dump of everything everyone has said on the internet (Twitter, Reddit, Seeking Alpha, etc.) about the companies in the S&P 500 overnight. We want to feed these pieces of text into FinBERT, which is a version of BERT that has been fine-tuned for financial sentiment analysis. BERT is a language model from Google that was state of the art when it was published in 2018.

We’ll be using Gravity AI’s container for FinBERT, but we’ll also assume that we operate in a primarily batch process-oriented environment, so we don’t have e.g. a Kubernetes cluster set up and even if we did it would probably be tricky to get the autoscaling right because of our usage pattern.

If we’re just trying this out on our local machine, it’s pretty straightforward to use, as per Gravity AI’s documentation:

docker load -i Sentiment_Analysis_o_77f77f.docker.tar.gz
docker run -d -p 7000:80 gx-images:t-39c447b9e5b94d7ab75060d0a927807f

Sentiment_Analysis_o_77f77f.docker.tar.gz is the name of the file you download from Gravity AI and gx-images:t-39c447b9e5b94d7ab75060d0a927807f is the name of the Docker image once it’s loaded from the .tar.gz file, which will show up in the output of the first command.

And then we can write a bit of glue code:

import time

import requests


def upload_license_file(base_url: str) -> None:
    with open("Sentiment Analysis on Financial Text.gravity-ai.key", "r") as f:
        response = requests.post(f"{base_url}/api/license/file", files={"License": f})
        response.raise_for_status()


def call_finbert_container(base_url: str, input_text: str) -> str:
    add_job_response = requests.post(
        f"{base_url}/data/add-job",
        files={
            "CallbackUrl": (None, ""),
            "File": ("temp.txt", input_text, "text/plain"),
            "MimeType": (None, "text/plain"),
        }
    )
    add_job_response.raise_for_status()
    add_job_response_json = add_job_response.json()
    if (
        add_job_response_json.get("isError", False)
        or add_job_response_json.get("errorMessage") is not None
    ):
        raise ValueError(f"Error from server: {add_job_response_json.get('errorMessage')}")
    job_id = add_job_response_json["data"]["id"]
    job_status = add_job_response_json["data"]["status"]

    while job_status != "Complete":
        status_response = requests.get(f"{base_url}/data/status/{job_id}")
        status_response.raise_for_status()
        job_status = status_response.json()["data"]["status"]
        time.sleep(1)

    result_response = requests.get(f"{base_url}/data/result/{job_id}")
    return result_response.text


def process_data():
    base_url = "http://localhost:7000"

    upload_license_file(base_url)

    # Pretend to query input data from somewhere
    sample_data = [
        "Finnish media group Talentum has issued a profit warning",
        "The loss for the third quarter of 2007 was EUR 0.3 mn smaller than the loss of"
        " the second quarter of 2007"
    ]

    results = [call_finbert_container(base_url, line) for line in sample_data]

    # Pretend to store the output data somewhere
    print(results)


if __name__ == "__main__":
    process_data()

call_finbert_container calls the REST API provided by the container to submit a job, polls for the job completion, and then returns the job result. process_data pretends to get some text data and processes it using our container, and then pretends to write the output somewhere else. We’re also assuming you’ve downloaded the Gravity AI key to the current directory as Sentiment Analysis on Financial Text.gravity-ai.key.

Using models-in-containers on the cloud

This works great for playing around with our model locally, but at some point we’ll probably want to run this on the cloud, either to access additional compute whether that’s CPU or GPU, or to run this as a scheduled job with e.g. Airflow. The usual thing to do is package up our code (finbert_local_example.py and its dependencies) as a container, which means now we have two containers — one containing our glue code, and the FinBERT container that we need to launch together and coordinate (i.e. our glue code container needs to know the address/name of the FinBERT container to access it). We might start reaching for Docker Compose which works great for long-running services, but in the context of an ad-hoc distributed batch job or a scheduled job, it will be tricky to work with.

Instead, we’ll use Meadowrun to do most of the heavy lifting. Meadowrun will not only take care of the usual difficulties of allocating instances, deploying our code, etc., but also help launch an extra container and make it available to our code.

To follow along, you’ll need to set up an environment. We’ll show how this works with pip on Windows, but you should be able to follow along the package manager of your choice (conda environments don’t work across platforms, so conda will only work if you’re on Linux).

python -m venv venv
venv\Scripts\activate.bat
pip install requests meadowrun
meadowrun-manage-ec2 install

This creates a new virtaulenv, adds requests and meadowrun, and then installs Meadowrun into your AWS account.

When you download a container from Gravity AI, it comes as a .tar.gz file that needs to get uploaded to a container registry in order to work. There are slightly longer instructions in the Gravity AI documentation, but here’s a short version of how to create an ECR (Elastic Container Registry) repository then upload a container from Gravity AI to it:

aws ecr create-repository --repository-name mygravityaiaws ecr get-login-password | docker login --username AWS --password-stdin 012345678901.dkr.ecr.us-east-2.amazonaws.comdocker tag gx-images:t-39c447b9e5b94d7ab75060d0a927807f 012345678901.dkr.ecr.us-east-2.amazonaws.com/mygravityai:finbertdocker push 012345678901.dkr.ecr.us-east-2.amazonaws.com/mygravityai:finbert

012345678901 appears in a few places in this snippet and it needs to be replaced with your account id. You’ll see your account id in the output of the first command as registryId.

One more step before we can run some code: we’ll need to give the Meadowrun role permissions to this ECR repository.

meadowrun-manage-ec2 grant-permission-to-ecr-repo mygravityai

Now we can run our code:

import asyncio

import meadowrun

from finbert_local_example import upload_license_file, call_finbert_container

# Normally this would live in S3 or a database somewhere
_SAMPLE_DATA = {
    "Company1": [
        "Commission income fell to EUR 4.6 mn from EUR 5.1 mn in the corresponding "
        "period in 2007",
        "The purchase price will be paid in cash upon the closure of the transaction, "
        "scheduled for April 1 , 2009"
    ],
    "Company2": [
        "The loss for the third quarter of 2007 was EUR 0.3 mn smaller than the loss of"
        " the second quarter of 2007",
        "Consolidated operating profit excluding one-off items was EUR 30.6 mn, up from"
        " EUR 29.6 mn a year earlier"
    ]
}


def process_one_company(company_name: str):
    base_url = "http://container-service-0"

    data_for_company = _SAMPLE_DATA[company_name]

    upload_license_file("http://container-service-0")

    results = [call_finbert_container(base_url, line) for line in data_for_company]

    return results


async def process_data():
    results = await meadowrun.run_map(
        process_one_company,
        ["Company1", "Company2"],
        meadowrun.AllocCloudInstance("EC2"),
        meadowrun.Resources(logical_cpu=1, memory_gb=1.5, max_eviction_rate=80),
        await meadowrun.Deployment.mirror_local(working_directory_globs=["*.key"]),
        sidecar_containers=meadowrun.ContainerInterpreter(
            "012345678901.dkr.ecr.us-east-2.amazonaws.com/mygravityai", "finbert"
        ),
    )
    print(results)


if __name__ == "__main__":
    asyncio.run(process_data())

Let’s walk through the process_data function line by line.

Meadowrun’s run_map function runs the specified function (in this case process_one_company) in parallel on the cloud. In this case, we provide two arguments (["Company1", "Company2"]) so we’ll run these two tasks in parallel. The idea is that we’re splitting up the workload so that we can finish the job quickly.
AllocCloudInstance tells Meadowrun to launch an EC2 instance to run this job if we don’t have one running already and Resources tells Meadowrun what resources are needed to run this code. In this case we’re requesting 1 CPU and 1.5 GB of RAM per task. We’re also specifying that we we’re okay with spot instances up to an 80% eviction rate (aka probability of interruption).
mirror_local tells Meadowrun that we want to use the code in the current directory, which is important as we’re reusing some code from finbert_local_example.py. Meadowrun only uploads .py files by default, but in our case, we need to include the .key file in our current working directory so that we can apply the Gravity AI license.
Finally, container_services tells Meadowrun to launch a container with the specified image for every task we have running in parallel. Each task can access its associated container as container-service-0, which you can see in the code for process_one_company. If you’re following along, you’ll need to edit the account id again to match your account id.

Let’s look at a few of the more important lines from the output:

Launched 1 new instance(s) (total $0.0209/hr) for the remaining 2 workers:
 ec2-13-59-48-22.us-east-2.compute.amazonaws.com: r5d.large (2.0 CPU, 16.0 GB), spot ($0.0209/hr, 2.5% chance of interruption), will run 2 workers

Here Meadowrun is telling us that it’s launching the cheapest EC2 instance that can run our job. In this case, we’re only paying 2¢ per hour!

Next, Meadowrun will replicate our local environment on the EC2 instance by building a new container image, and then also pull the FinBERT container that we specified:

Building python environment in container  4e4e2c...
...
Pulling docker image 012345678901.dkr.ecr.us-east-2.amazonaws.com/mygravityai:finbert

Our final result will be some output from the FinBERT model that looks like:

[['sentence,logit,prediction,sentiment_score\nCommission income fell to EUR 4.6 mn from EUR 5.1 mn in the corresponding period in 2007,[0.24862055 0.44351986 0.30785954],negative,-0.1948993\n',
...

Closing remarks

Gravity AI packages up ML models as containers which are really easy to use as services. Making these containers work naturally for batch jobs takes a bit of work, but Meadowrun makes it easy!