Using the AWS Public Container Registry for TensorFlow 2 on Graviton Processors

Arm has been working on Machine Learning frameworks to make AArch64 a primary architecture and bring the best possible performance to Arm servers, including AWS EC2 instances powered by Graviton processors. One of the important frameworks is TensorFlow.

TensorFlow has seen increased usage on Arm, ranging from smaller systems like the Raspberry Pi to larger systems for server and high-performance computing. Even though there is some support for AArch64 in packages already, users may want to compile everything from source. Reasons include using specific tools, targeting a different runtime environment, and experimenting with performance improvements from underlying libraries.

The Graviton2 getting started guide has some information on this, and I highly recommend it if you are using Graviton2. The Arm project used in this article goes further and provides more detail about the various ways to build and customize TensorFlow on AArch64.

Last week, the AWS public container registry became available, and I thought this would be a good project for trying it out and sharing the container images using ECR Public. I also recommend the AWS News Blog for more info.

Build the TensorFlow images

To get started, let's build the TensorFlow container images. I’m going to build all of the images on a t4g.xlarge EC2 instance running Ubuntu 18.04.

Connect to the EC2 instance and make sure it is AArch64.

$ uname -m
aarch64
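If the build is driven from a script, the same architecture check can be done in Python. This is a minimal sketch using only the standard library; the helper name is_aarch64 is my own and not part of any project mentioned here.

```python
import platform


def is_aarch64(machine=None):
    """Return True if the machine string names a 64-bit Arm system."""
    if machine is None:
        # Same information that `uname -m` prints.
        machine = platform.machine()
    # Linux reports "aarch64"; some other operating systems report "arm64".
    return machine.lower() in ("aarch64", "arm64")


if __name__ == "__main__":
    status = "AArch64" if is_aarch64() else "not AArch64"
    print(platform.machine(), "->", status)
```

On a Graviton instance this should report AArch64, matching the uname output above.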

Install Docker using the standard Linux install procedure:

$ sudo apt update
$ sudo apt upgrade -y
$ curl -fsSL get.docker.com -o get-docker.sh && sh get-docker.sh
$ sudo usermod -aG docker ubuntu ; newgrp docker
$ docker run hello-world

The project is in the Arm GitHub area. Clone the repository and change to the TensorFlow area:

$ git clone https://github.com/ARM-software/Tool-Solutions.git
$ cd Tool-Solutions/docker/tensorflow-aarch64

The project has a five-stage Dockerfile so incremental progress can be saved and reused as needed.

The build.sh script builds the images and has a help flag for reviewing the options. The --build-type flag selects which set of images to build.

$ ./build.sh -h

More info is available in the project README.

To build a TensorFlow 2 image with Arm Performance Libraries and oneDNN, run:

$ ./build.sh --onednn armpl --build-type full --jobs 16 --bazel_memory_limit 30000

Take a break here; the build takes multiple hours.

Push the images to ECR Public

Log in to the AWS console and navigate to the Elastic Container Registry (ECR). Create an ECR Public repository using the Create repository button on the Public tab. I won’t repeat all of the steps, as the Getting started guide is excellent.

Initially, I had trouble with the AWS CLI when I tried:

$ aws ecr-public get-login-password --region us-east-1

The ecr-public command was introduced in version 2.1.6. Installing awscli via apt on EC2 with Ubuntu 18.04 didn’t install a new enough version.
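To see whether an installed AWS CLI is new enough, the version string printed by aws --version can be compared against 2.1.6. The sketch below assumes the output starts with aws-cli/X.Y.Z; the function names are hypothetical, and the sample strings in the usage note are illustrative rather than captured logs. Treating everything below 2.1.6 as too old is a simplification that follows the version named above and ignores whether later 1.x releases also include the command.

```python
import re

# Version that, per the text above, introduced the ecr-public command.
MIN_VERSION = (2, 1, 6)


def parse_aws_cli_version(version_output):
    """Extract (major, minor, patch) from the output of `aws --version`."""
    match = re.search(r"aws-cli/(\d+)\.(\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError("unrecognized version output: %r" % version_output)
    return tuple(int(part) for part in match.groups())


def supports_ecr_public(version_output):
    """Return True if the reported version is at least MIN_VERSION."""
    # Tuples compare element by element, so (2, 1, 10) > (2, 1, 6).
    return parse_aws_cli_version(version_output) >= MIN_VERSION
```

For example, supports_ecr_public("aws-cli/1.14.44 Python/3.6.9") is False, while supports_ecr_public("aws-cli/2.1.6 Python/3.7.3") is True.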

To install that specific version on Graviton, use:

$ sudo apt install unzip
$ curl "https://awscli.amazonaws.com/awscli-exe-linux-aarch64-2.1.6.zip" -o "awscliv2.zip"
$ unzip awscliv2.zip
$ sudo ./aws/install

With this version of awscli, the push instructions for ECR Public complete successfully. The image is now visible and can be found quickly by filtering on ARM 64 and searching for tensorflow.
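For reference, the push flow is roughly: get a login password, log Docker in to the registry, tag the local image, and push it. Here is a minimal Python sketch of that sequence. The function names and the local image name in the usage note are hypothetical, and the exact commands shown for your repository in the ECR console should take precedence over this sketch.

```python
import subprocess


def ecr_public_push_steps(local_image, repo_uri, region="us-east-1"):
    """Return the four commands of the push flow as argument lists."""
    # The registry host is the first component of the repository URI.
    registry = repo_uri.split("/")[0]
    return [
        ["aws", "ecr-public", "get-login-password", "--region", region],
        ["docker", "login", "--username", "AWS", "--password-stdin", registry],
        ["docker", "tag", local_image, repo_uri],
        ["docker", "push", repo_uri],
    ]


def push_to_ecr_public(local_image, repo_uri):
    """Run the push flow; needs the AWS CLI, Docker, and AWS credentials."""
    get_password, login, tag, push = ecr_public_push_steps(local_image, repo_uri)
    # Capture the password and feed it to docker login on stdin,
    # which is what the shell pipe `aws ... | docker login ...` does.
    password = subprocess.run(
        get_password, check=True, stdout=subprocess.PIPE, universal_newlines=True
    ).stdout
    subprocess.run(login, check=True, input=password, universal_newlines=True)
    subprocess.run(tag, check=True)
    subprocess.run(push, check=True)
```

Calling ecr_public_push_steps("my-local-tensorflow:latest", "public.ecr.aws/z9p7l6s8/tensorflow2-aarch64:latest") returns the commands without running anything, which is handy for reviewing them first. Only the repository owner can push to that URI.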

Pull and run the final TensorFlow 2 image

Pull the image and run using the steps below.

Note that because the image was built on Graviton2, it is optimized for the Neoverse-N1 CPU. This provides the best performance on Graviton2, but TensorFlow will not run on EC2 A1 instances or other systems with Cortex-A72 or Cortex-A53 CPUs; it fails with an "Illegal instruction" message. To solve this problem, change the optimization flags or build on an A1 instance.

$ docker pull public.ecr.aws/z9p7l6s8/tensorflow2-aarch64:latest
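The "Illegal instruction" caveat above can be checked up front by looking at the CPU part field in /proc/cpuinfo. The sketch below maps the three cores named in this article to their Arm part IDs. It assumes an Arm-designed core (CPU implementer 0x41), and the function names are my own. It only flags the two cores the article says will fail; it does not prove that any other core is compatible.

```python
import re

# Part IDs reported in the "CPU part" field of /proc/cpuinfo on Arm Linux.
# Only meaningful when "CPU implementer" is 0x41 (Arm).
ARM_CPU_PARTS = {
    0xD03: "Cortex-A53",
    0xD08: "Cortex-A72",
    0xD0C: "Neoverse-N1",
}

# Cores the article says cannot run the Graviton2-built image.
KNOWN_INCOMPATIBLE = {"Cortex-A53", "Cortex-A72"}


def cpu_part_names(cpuinfo_text):
    """Return the set of core names found in /proc/cpuinfo text."""
    names = set()
    pattern = r"^CPU part\s*:\s*(0x[0-9a-fA-F]+)"
    for match in re.finditer(pattern, cpuinfo_text, re.MULTILINE):
        part = int(match.group(1), 16)
        names.add(ARM_CPU_PARTS.get(part, "unknown (0x%03x)" % part))
    return names


def is_known_incompatible(cpuinfo_text):
    """Return True if any core is one the article lists as failing."""
    return bool(cpu_part_names(cpuinfo_text) & KNOWN_INCOMPATIBLE)
```

On the target system, pass in the contents of /proc/cpuinfo, for example is_known_incompatible(open("/proc/cpuinfo").read()).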

Run the TensorFlow 2 quick start to test the image. Here is the quick start example, which can be copied into a single text file.

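import tensorflow as tf

# Load MNIST and scale pixel values from [0, 255] to [0, 1].
mnist = tf.keras.datasets.mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# A small fully connected classifier that outputs 10 logits, one per digit.
model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10)
])

# Logits from the untrained model for the first training image.
# Bare expressions like the next line display a value in a notebook;
# in a script they have no visible effect.
predictions = model(x_train[:1]).numpy()
predictions

# Convert the logits to probabilities.
tf.nn.softmax(predictions).numpy()
# from_logits=True because the model has no softmax layer.
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

# Loss of the untrained model on the first example.
loss_fn(y_train[:1], predictions).numpy()

model.compile(optimizer='adam',
              loss=loss_fn,
              metrics=['accuracy'])

# Train for 5 epochs, then evaluate on the test set.
model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test,  y_test, verbose=2)
# Wrap the trained model with a softmax layer so it returns probabilities.
probability_model = tf.keras.Sequential([
  model,
  tf.keras.layers.Softmax()
])

probability_model(x_test[:5])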

Run the container using docker run:

$ docker run -it --rm public.ecr.aws/z9p7l6s8/tensorflow2-aarch64 /bin/bash

Open an editor and paste the quick start code into the file. I use vi, but any text editor works.

$ vi quickstart.py
(Paste in the Python code above and save the file.)

$ python3 ./quickstart.py
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11493376/11490434 [==============================] - 1s 0us/step
2020-12-08 22:58:00.386151: I tensorflow/core/common_runtime/process_util.cc:146] Creating new thread pool with default inter op setting:
Epoch 1/5
1875/1875 [==============================] - 9s 5ms/step - loss: 0.2944 - accuracy: 0.9145
Epoch 2/5
1875/1875 [==============================] - 7s 4ms/step - loss: 0.1439 - accuracy: 0.9567
Epoch 3/5
1875/1875 [==============================] - 7s 4ms/step - loss: 0.1069 - accuracy: 0.9676
Epoch 4/5
1875/1875 [==============================] - 7s 4ms/step - loss: 0.0874 - accuracy: 0.9733
Epoch 5/5
1875/1875 [==============================] - 7s 4ms/step - loss: 0.0741 - accuracy: 0.9765
313/313 - 0s - loss: 0.0777 - accuracy: 0.9774

Conclusion

ECR Public is a good way to share container images in AWS. The created image shows up under the Public tab in all AWS regions.

The video below is from Arm DevSummit 2020 and provides more info about TensorFlow and PyTorch on AArch64. More DevSummit material is available on the Arm Software Developers channel.