Arm has been working on Machine Learning frameworks to make AArch64 a primary architecture and bring the best possible performance to Arm servers, including AWS EC2 instances powered by Graviton processors. One of the important frameworks is TensorFlow.
TensorFlow has seen increased usage on Arm, ranging from smaller systems like the Raspberry Pi to larger systems for server and high-performance computing. Even though there is some support for AArch64 in packages already, users may want to compile everything from source. Reasons include using specific tools, targeting a different runtime environment, and experimenting with performance improvements from underlying libraries.
There is information in the Graviton2 getting started guide, which I highly recommend if you are using Graviton2, but this project provides more detail about the various ways to build and customize TensorFlow on AArch64.
Last week, the AWS public container registry became available, and I thought this would be a good project to try it out and share the container images using ECR Public. I also recommend the AWS News Blog for more info.
To get started let's build the TensorFlow container images. I’m going to build all of the images using a t4g.xlarge EC2 instance with Ubuntu 18.04.
Connect to the EC2 instance and make sure it is AArch64.
$ uname -m
aarch64
Install Docker using the standard Linux install procedure:
$ sudo apt update
$ sudo apt upgrade -y
$ curl -fsSL get.docker.com -o get-docker.sh && sh get-docker.sh
$ sudo usermod -aG docker ubuntu ; newgrp docker
$ docker run hello-world
The project is in the Arm GitHub area. Clone the repository and change to the TensorFlow area:
$ git clone https://github.com/ARM-software/Tool-Solutions.git
$ cd Tool-Solutions/docker/tensorflow-aarch64
The project has a five-stage Dockerfile so incremental progress can be saved and reused as needed.
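To see why staged builds help, here is a minimal sketch of the multi-stage Dockerfile pattern. The stage names and package choices below are hypothetical, for illustration only; the real stages are defined in the project's Dockerfile.

```dockerfile
# Hypothetical stages for illustration; see the project's Dockerfile
# for the real stage names and contents.
FROM ubuntu:18.04 AS base
RUN apt-get update && apt-get install -y build-essential

FROM base AS libs
# ... build supporting libraries (for example oneDNN) ...

FROM libs AS tensorflow
# ... build TensorFlow itself with Bazel ...
```

Each stage is cached, and a single stage can be rebuilt on its own with docker build --target libs . - this is what makes incremental progress reusable if a later stage fails.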
The build.sh script builds the images and has a help flag for reviewing the options. The build-type flag selects which set of images to build.
$ ./build.sh -h
More info is available in the project README.
$ ./build.sh --onednn armpl --build-type full --jobs 16 --bazel_memory_limit 30000
Take a break here, the build will take multiple hours.
Log in to the AWS console and navigate to the Elastic Container Registry (ECR). Create an ECR Public repository using the Create repository button on the Public tab. I won’t repeat all of the steps here, as the Getting started guide is excellent.
Initially, I had trouble with the aws cli when I tried:
$ aws ecr-public get-login-password --region us-east-1
The ecr-public command was introduced in version 2.1.6. Installing awscli via apt on EC2 with Ubuntu 18.04 didn’t install a new enough version.
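Before retrying the login, it is worth confirming the installed CLI is new enough. Here is a minimal Python sketch, assuming `aws --version` prints output in its usual `aws-cli/X.Y.Z ...` form:

```python
import re

# Minimum aws-cli version that includes the ecr-public command.
MIN_VERSION = (2, 1, 6)

def cli_is_new_enough(version_output: str) -> bool:
    """Parse output like 'aws-cli/2.1.6 Python/3.7.3 ...' and compare versions."""
    match = re.match(r"aws-cli/(\d+)\.(\d+)\.(\d+)", version_output)
    if not match:
        raise ValueError("unexpected `aws --version` output")
    return tuple(int(g) for g in match.groups()) >= MIN_VERSION

# Feed this the output of `aws --version` on your instance:
print(cli_is_new_enough("aws-cli/1.14.44 Python/3.6.9 Linux/5.4.0"))  # False
print(cli_is_new_enough("aws-cli/2.1.6 Python/3.7.3 Linux/5.4.0 exe/aarch64"))  # True
```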
To install the specific version on Graviton use:
$ sudo apt install unzip
$ curl "https://awscli.amazonaws.com/awscli-exe-linux-aarch64-2.1.6.zip" -o "awscliv2.zip"
$ unzip awscliv2.zip
$ sudo ./aws/install
With this version of awscli, the push instructions for ECR Public complete successfully. The image is now visible and can be quickly found by filtering on ARM 64 and searching for tensorflow.
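For reference, the push flow shown by the console looks like the following. The local image name tensorflow-v2:latest is an assumption here; substitute the tag that docker images reports for your build.

```
$ aws ecr-public get-login-password --region us-east-1 | docker login --username AWS --password-stdin public.ecr.aws
$ docker tag tensorflow-v2:latest public.ecr.aws/z9p7l6s8/tensorflow2-aarch64:latest
$ docker push public.ecr.aws/z9p7l6s8/tensorflow2-aarch64:latest
```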
Pull the image and run it using the steps below.
Building on Graviton2 produces binaries optimized for the Neoverse N1 CPU. This gives the best performance on Graviton2, but the resulting TensorFlow will not run on EC2 A1 instances or other systems with Cortex-A72 or Cortex-A53 cores; it fails with an "Illegal instruction" message. Change the optimization flags or build on an A1 instance to avoid this.
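One way to check ahead of time whether a Neoverse-N1 build is likely to run on a given machine is to look at the CPU feature flags in /proc/cpuinfo. This is a rough sketch: the specific flags checked (LSE atomics and the asimddp dot-product extension) are my assumptions about what the compiler emits for Neoverse N1; Cortex-A72 and Cortex-A53 report neither.

```python
# Rough pre-flight check: does this machine advertise the CPU features a
# Neoverse-N1 build is likely to use? The flag choice is an assumption.
def missing_features(cpuinfo_text: str, required=("atomics", "asimddp")):
    for line in cpuinfo_text.splitlines():
        if line.startswith("Features"):
            present = set(line.split(":", 1)[1].split())
            return [f for f in required if f not in present]
    return list(required)  # no Features line found at all

# Abbreviated example /proc/cpuinfo Features lines:
graviton2 = "Features\t: fp asimd aes pmull sha1 sha2 crc32 atomics asimdrdm lrcpc dcpop asimddp"
cortex_a72 = "Features\t: fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid"

print(missing_features(graviton2))   # [] -> the image should run
print(missing_features(cortex_a72))  # ['atomics', 'asimddp'] -> expect "Illegal instruction"
```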
$ docker pull public.ecr.aws/z9p7l6s8/tensorflow2-aarch64:latest
Run the TensorFlow 2 quick start to test the image. Here is the quick start example which can be copied into a single text file.
import tensorflow as tf

mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10)
])

predictions = model(x_train[:1]).numpy()
predictions

tf.nn.softmax(predictions).numpy()

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
loss_fn(y_train[:1], predictions).numpy()

model.compile(optimizer='adam', loss=loss_fn, metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5)
model.evaluate(x_test, y_test, verbose=2)

probability_model = tf.keras.Sequential([
  model,
  tf.keras.layers.Softmax()
])
probability_model(x_test[:5])
Run the container using docker run:
$ docker run -it --rm public.ecr.aws/z9p7l6s8/tensorflow2-aarch64 /bin/bash
Open an editor and paste the quick start code into the file. I use vi, but any text editor works.
$ vi quickstart.py

Paste in the Python code above and save the file, then run it:

$ python3 ./quickstart.py
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11493376/11490434 [==============================] - 1s 0us/step
2020-12-08 22:58:00.386151: I tensorflow/core/common_runtime/process_util.cc:146] Creating new thread pool with default inter op setting:
Epoch 1/5
1875/1875 [==============================] - 9s 5ms/step - loss: 0.2944 - accuracy: 0.9145
Epoch 2/5
1875/1875 [==============================] - 7s 4ms/step - loss: 0.1439 - accuracy: 0.9567
Epoch 3/5
1875/1875 [==============================] - 7s 4ms/step - loss: 0.1069 - accuracy: 0.9676
Epoch 4/5
1875/1875 [==============================] - 7s 4ms/step - loss: 0.0874 - accuracy: 0.9733
Epoch 5/5
1875/1875 [==============================] - 7s 4ms/step - loss: 0.0741 - accuracy: 0.9765
313/313 - 0s - loss: 0.0777 - accuracy: 0.9774
ECR Public is a good way to share container images in AWS. The created image shows up under the Public tab in all AWS regions.