

Accelerate Machine Learning Local Development and Test Workflows with Nvidia Docker

Recently I have been studying current research in the field of Virtual Try On. To support learning and further exploration, many of these papers provide the code and models, making it easy to reproduce their results. On the other hand, they all have specific requirements with regard to library versions such as PyTorch / TensorFlow and any other packages used, as well as NVIDIA drivers and system-wide SDKs such as CUDA and cuDNN. Achieving harmony between these requirements can take a lot of time, and your setup is only one rogue update away from chaos.

The idea behind Python virtual environments is a well known and widely used concept. If you use conda, it is even possible to select a specific Python version for your virtual environment, which makes it easy to experiment with code that depends on older libraries such as TensorFlow 1.5.
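For example, a minimal sketch of such an environment (the environment name and the pinned versions below are illustrative, not taken from any particular paper):

# Create an isolated environment with an older Python for legacy libraries.
conda create -n legacy-tf python=3.6
conda activate legacy-tf
# Pin whichever framework version the project expects, e.g. an old TensorFlow.
pip install tensorflow-gpu==1.15.5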

Given that much of this code requires a GPU to run at a reasonable speed, it is sometimes necessary to use different versions of CUDA, cuDNN and related SDKs. Managing these can quickly turn into a challenge and take more time than the actual task at hand.

In this post, I will cover two approaches that use Docker to make it possible to run GPU code in a few minutes without having to modify system-wide SDKs. Although the examples are specific to two repositories, the approach works with any codebase that depends on specific NVIDIA CUDA versions and is applicable to whatever other libraries you may have as dependencies.

What the post will focus on

This post will demonstrate two approaches to utilising Nvidia Docker for local development and testing without having to install CUDA SDKs on the host operating system.

  • OpenPose command line interface (CLI) utilising CUDA (everything in the Docker image)
    • The image will contain all dependencies, the models and the CLI itself
  • CIHP_PGN repository for human part segmentation (dependencies in Docker, code / models mounted at runtime)
    • The image will only contain CUDA and the related SDKs, TensorFlow and the Python dependencies

Prerequisites

  • A Linux-based operating system (I am using Ubuntu Desktop 22.04 LTS)
  • An NVIDIA GPU (I am using a GTX 1080 Ti with 11 GB of RAM, which is quite dated)
  • Docker (I am using Docker version 23.0.1, build a5ee5)
    • Docker Compose is not used in this post but it can be handy if you adopt this approach.
  • NVIDIA Docker (NVIDIA Docker: 2.12.0)
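Once NVIDIA Docker is installed, a quick sanity check is to run nvidia-smi inside a container; the CUDA tag below is just an example, any recent nvidia/cuda base tag will do:

# If the runtime is configured correctly, this prints the host GPU details
# from inside the container, with no CUDA toolkit installed on the host.
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi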

Option 1: Building an image with all dependencies and the runtime environment

This approach builds a Docker image that contains everything we need, including the tools and the models. We can then run the container by mounting input and output directories to process our input images and retrieve the output files. We will use the OpenPose repository as an example of this approach.

Dockerfile

FROM nvidia/cuda:10.0-cudnn7-devel-ubuntu18.04
# https://hub.docker.com/r/nvidia/cuda

ENV DEBIAN_FRONTEND=noninteractive

# install the dependencies for building OpenPose
RUN apt-get update  &&  # The rest is ignored for brevity. 

RUN pip3 install --no-cache-dir # The rest is ignored for brevity.

# install cmake, clone OpenPose and download models
RUN wget https://cmake.org/files/v3.20/cmake-3.20.2-linux-x86_64.tar.gz && \  # The rest is ignored for brevity. 

WORKDIR /openpose/build
RUN alias python=python3 && cmake -DBUILD_PYTHON=OFF -DWITH_GTK=OFF -DUSE_CUDNN=ON  .. 

# Build OpenPose. cuDNN 8 causes memory issues, which is why we use a base image with CUDA 10 and cuDNN 7.
# Fix for CUDA 10.0 and Cudnn 7 based on the post below.         
# https://github.com/CMU-Perceptual-Computing-Lab/openpose/issues/1753#issuecomment-792431838
RUN sed -ie 's/set(AMPERE "80 86")/#&/g'  ../cmake/Cuda.cmake && \
    sed -ie 's/set(AMPERE "80 86")/#&/g'  ../3rdparty/caffe/cmake/Cuda.cmake && \
    make -j`nproc` && \
    make install

WORKDIR /openpose

The necessary code is provided in the open-pose directory of the demo repository and works as follows:

Build the image, run and verify the output

git clone https://github.com/syamaner/docker-cuda-demo
cd docker-cuda-demo/open-pose 

# Either run ./build.sh or the following to build:
docker build -t openpose:cuda10.0-cudnn7-devel ./build

# Once the image is built, you can run pose estimation for the images in the /open-pose/data/input/image directory by either running ./run.sh or the following command:

docker run --gpus all \
                   -v ${PWD}/data/input/image:/data/in/image \
                   -v ${PWD}/data/output/json:/data/out/json \
                   -v ${PWD}/data/output/image:/data/out/image \
                   -v ${PWD}/data/detect-pose.sh:/data/detect-pose.sh \
                   -it openpose:cuda10.0-cudnn7-devel sh -c "chmod +x /data/detect-pose.sh && /data/detect-pose.sh"

# Provided the image was built successfully, the output files will be at {checkout root}/open-pose/data/output/{image and json directories}
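The actual detect-pose.sh script ships with the demo repository; as a rough sketch, it boils down to invoking the OpenPose binary headlessly against the mounted directories (the flags and paths below illustrate the standard OpenPose CLI and are not copied from the repository):

#!/bin/sh
# Illustrative sketch only: run OpenPose over the mounted input images and
# write rendered images and JSON keypoints to the mounted output directories.
cd /openpose
./build/examples/openpose/openpose.bin \
    --image_dir /data/in/image \
    --write_json /data/out/json \
    --write_images /data/out/image \
    --display 0 \
    --render_pose 1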

Directory structure for building and running the image.

Sample input image on the left and estimated pose on the right.

As illustrated above, this approach bundles everything we need, including the models, inside the image. While this can sometimes be desirable, it also has some downsides:

  • Potential violation of licensing terms if you push the image to production repositories. Often prebuilt models have different licensing terms for allowed use.
    • Same risks apply to the repositories used for evaluation.
  • Large Docker images, because the source code and models are bundled in.
  • Having to rebuild the image to update the code or models.
  • Modifying code in your editor and running it in the container is not as straightforward as with the next approach.

For these reasons, the following option provides the best of both worlds.

Option 2: Building an image that only contains the required dependency versions

In this approach, the model, code and data directories are shared from the host and are not included in the image itself.

Using this approach, we end up with purpose-specific yet more reusable images that can be shared across different projects with similar dependencies. As an example, we will use the CIHP_PGN repository for human body part segmentation, using the same sample images as above.

CIHP_PGN requires TensorFlow 1.15.x, Python 3 and, indirectly, CUDA 10 and cuDNN 7.x. Because we do not include the source code and models in the image, the Dockerfile is simpler and looks like the following:

FROM tensorflow/tensorflow:1.15.5-gpu-py3

# Handle Nvidia public key update and update repositories for Ubuntu 18.x.
#https://github.com/sangyun884/HR-VITON/issues/45
# reference: https://jdhao.github.io/2022/05/05/nvidia-apt-repo-public-key-error-fix/
RUN rm /etc/apt/sources.list.d/cuda.list
RUN rm /etc/apt/sources.list.d/nvidia-ml.list
RUN apt-key del 7fa2af80

# Additional reference: https://gitlab.com/nvidia/container-images/cuda/-/issues/158
RUN export this_distro="$(cat /etc/os-release | grep '^ID=' | awk -F'=' '{print $2}')" \
    && export this_version="$(cat /etc/os-release | grep '^VERSION_ID=' | awk -F'=' '{print $2}' | sed 's/[^0-9]*//g')" \
    && apt-key adv --fetch-keys "https://developer.download.nvidia.com/compute/cuda/repos/${this_distro}${this_version}/x86_64/3bf863cc.pub" \
    && apt-key adv --fetch-keys "https://developer.download.nvidia.com/compute/machine-learning/repos/${this_distro}${this_version}/x86_64/7fa2af80.pub"

# install wget, git and the OpenCV development libraries
RUN apt-get update &&  \
        DEBIAN_FRONTEND=noninteractive \
        apt-get install -y -qq \
        wget git libopencv-dev

RUN python -m pip install --upgrade pip && \
    pip install matplotlib opencv-python==4.5.4.60 Pillow scipy \
    azure-eventhub azure-eventhub-checkpointstoreblob-aio ipykernel

WORKDIR /

Above is a Dockerfile that contains all the dependencies needed by the CIHP_PGN repository. Unlike the previous example, we will download the models and the repository on the host machine and mount these directories into the container.
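Before wiring in the repository, you can verify that TensorFlow inside the tensorflow/tensorflow:1.15.5-gpu-py3 base image can see the GPU at all:

# Should print True when the GPU is visible to TensorFlow 1.15 inside the container.
docker run --rm --gpus all tensorflow/tensorflow:1.15.5-gpu-py3 \
    python -c "import tensorflow as tf; print(tf.test.is_gpu_available())"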

Build the image, run and verify the output

cd {checkout_directory}/cihp_pgn

# Either run ./build.sh or the following to build:
docker build -t tensorflow:1.15.5-gpu-cihp-dependencies ./build

# Once the image is built, you can run segmentation on the input images in cihp_pgn/data/input/image directory by either running ./run.sh or the following command:

docker run --gpus all \
        -v ${PWD}/src:/pgn \
        -v ${PWD}/data/input/image:/data/input-image \
        -v ${PWD}/data/output/image:/data/output-image \
        -v ${PWD}/data/test.sh:/data/test.sh \
        -v ${PWD}/data/initialise.sh:/data/initialise.sh \
        -it tensorflow:1.15.5-gpu-cihp-dependencies sh -c \
              "chmod +x /data/initialise.sh && /data/initialise.sh && \
                chmod +x /data/test.sh && /data/test.sh"


The first time this command runs it will take longer, because the following happens (initialise.sh; a minimal sketch of the pattern is shown after this list):

  • The repository is cloned into the cihp_pgn/src directory
  • The models are downloaded to cihp_pgn/src/data-extraction and then copied into the checkpoints directory under cihp_pgn/src.
    • As these are persisted on the host drive, this only happens on the first run; even if we delete the container and run it again, as long as cihp_pgn/src contains the data it will not be downloaded again.
  • Once initialisation is complete, the test.sh script runs segmentation on the source images and then copies the outputs into cihp_pgn/data/output
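A minimal sketch of this initialise-once pattern is shown below; it is only an illustration of the idea, as the real initialise.sh ships with the demo repository and the model download step is left as a placeholder:

#!/bin/sh
# Illustrative sketch only: clone the code and fetch the models into the
# mounted host directory on the first run, and skip on subsequent runs.
set -e
if [ ! -d /pgn/.git ]; then
    git clone https://github.com/Engineering-Course/CIHP_PGN.git /pgn
fi
if [ ! -d /pgn/checkpoints ]; then
    mkdir -p /pgn/data-extraction /pgn/checkpoints
    # Download and unpack the pretrained models here (URL omitted as a placeholder),
    # then copy them into the checkpoints directory.
fi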

The directory structure will look like the following, with the contents of cihp_pgn/src of course ignored via .gitignore.

Directory structure once initialisation script has run inside the container.

Original image on the left and segmentation result on the right.

The benefit of this approach is that you can use your favourite IDE or code editor on the host and run inference or training tasks in the container. Because your scripts and code are mounted from the host into the container, this gives you more flexibility when making and testing changes.
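For example, after editing code under src/ on the host, a hypothetical experiment script (my_experiment.py below is a placeholder, not part of the repository) can be run straight in the container:

docker run --gpus all \
        -v ${PWD}/src:/pgn \
        -w /pgn \
        -it tensorflow:1.15.5-gpu-cihp-dependencies \
        python my_experiment.py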

And because Docker uses the Linux kernel to manage resources on Linux, using specific hardware such as GPUs or TPUs is straightforward, as shown above, and lets you get more out of the hardware without having to deal with SDK versioning in the host operating system.

I have also successfully tested the sample code on Windows 11 with WSL2 (Debian), Docker Desktop and NVIDIA Docker, and I have included the general setup links that worked for me below. Although I have not compared the performance, it seemed faster on Ubuntu Desktop.

Docker Hub links

  • nvidia/cuda
    • If you need to start with a specific version of CUDA / cuDNN
  • pytorch/pytorch
    • If you need a specific PyTorch / CUDA / cuDNN combination to begin with.
  • tensorflow/tensorflow
    • If you need a specific TensorFlow / CUDA / cuDNN combination to begin with.

Relevant Repositories

Links

Windows and WSL2 Based Links
