Xavier Rey-Robert

Posted on Jul 23, 2020 • Edited on Jul 28, 2020

Machine learning on macOS using Keras -> Tensorflow (1.15.0) -> nGraph -> PlaidML -> AMD GPU

#keras #plaidml #ngraph #amd

Since CUDA is not available on macOS, the options for using GPUs for machine learning on Macs are sparse.

After failing to find a practical way to do it, I resorted to using a second Linux computer with an Nvidia GPU to train my networks.

The arrival of macOS Catalina, with Apple's support for Navi AMD GPUs, prompted me to give it another try. It was quite tough, so I decided to write it down and share the experience.

The easy way: Keras with PlaidML - No tensorflow involved

This is quite straightforward and I'm not going to cover it again here. You can check this article: https://medium.com/@bamouh42/gpu-acceleration-on-amd-with-plaidml-for-training-and-using-keras-models-57a9fce883b9

In my case that was not satisfying. With this setup Keras uses PlaidML as its backend, and I want to be able to use Kapre, which requires a tensorflow backend. Kapre is a neat library providing Keras layers that calculate melspectrograms on the fly.

Be aware that the Keras team is "stepping away from multi-backends", so the Keras -> PlaidML approach might be a dead end anyway.

The journey to Tensorflow execution on mac GPUs / eGPUs

The key element here is nGraph. Without going into details, nGraph takes a neutral approach, supporting multiple frameworks (Tensorflow, ONNX, etc.) and multiple hardware targets (Intel CPUs, NNPs, etc.). Luckily for us (not so! just wait), nGraph was also integrated with PlaidML to offer support for GPUs (Intel, Nvidia and... AMD).

So on paper all is great, we have a way to go:
Keras -> Tensorflow -> nGraph -> nGraph-bridge -> PlaidML -> Metal -> AMD GPU.

In this domain, like in others, things are moving fast. So fast that it's not always easy to keep pace, and the same goes for the teams behind these projects. There is a lot of software involved, and things are changing so fast that developers don't have time - or don't take the time - to settle things down.

The nGraph-bridge team hasn't done a proper release since August 2019 (v0.18.1), and while they are still actively working on the project, they seem to have been focusing on big refactoring.

To make things worse, PlaidML support was (silently) dropped from nGraph in April without much explanation or warning, so forget about using the latest GitHub master to try to sort it out! I spent hours wondering why it wasn't working when it was simply not there anymore.

Why was the PlaidML bridge dropped?

It seems that the future path to happiness will be Keras -> Tensorflow -> MLIR -> PlaidML -> ... and everyone is preparing for the jump once MLIR as a Tensorflow backend is released... in 2021! But as of today, users are just left hanging in midair.

What are your options?

At the time of writing, the latest release is ngraph-bridge v0.18.1 (dated 20 Aug 2019!). It uses tensorflow v1.14.0 - Argh! Kapre requires tensorflow v1.15 - Dead end again.

I should mention that you had better not use prebuilt wheels: I realized that not all of them are compiled with PlaidML backend support. So your best chance is to build nGraph and nGraph-bridge from source, and you'd better have all the stars aligned for that to happen flawlessly. A lot of things can go wrong: Python versions, bazel versions, library incompatibilities, bugs to fix in the code, etc... all the joys of Python.
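Since mismatched Python and bazel versions are among the things that can go wrong, a small preflight check before starting the build may save some time. The following is only a sketch: the wanted versions (Python 3.7, bazel 0.25.2) are simply the ones that worked for this article, and the helper names version_matches and report are made up for illustration.

```shell
#!/usr/bin/env bash
# Preflight sketch: compare tool versions against the ones used in this article.
# Adjust WANT_PYTHON and WANT_BAZEL if your setup differs.
WANT_PYTHON="3.7"
WANT_BAZEL="0.25.2"

# Succeeds when version string $1 equals $2 or starts with "$2.".
version_matches() {
    case "$1" in
        "$2"|"$2".*) return 0 ;;
        *) return 1 ;;
    esac
}

# Prints a one-line verdict for: tool name, found version, wanted version.
report() {
    if [ -z "$2" ]; then
        echo "$1: not found (wanted $3)"
    elif version_matches "$2" "$3"; then
        echo "$1: $2 ok"
    else
        echo "$1: $2 differs from $3"
    fi
}

# "python3 --version" prints "Python X.Y.Z"; "bazel version" prints a
# "Build label: X.Y.Z" line. Missing tools simply yield an empty string.
py_found="$(python3 --version 2>/dev/null | awk '{print $2}')"
bazel_found="$(bazel version 2>/dev/null | awk '/^Build label:/ {print $3}')"

report python3 "$py_found" "$WANT_PYTHON"
report bazel "$bazel_found" "$WANT_BAZEL"
```

The check only reports; it does not stop the build, so a "differs" line is a hint to look closer rather than a hard failure.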

Picking a release candidate to build

v0.19.0-rc9 brings Tensorflow v1.15.0, nGraph 0.28.0-rc1 - the recommended last stable baseline - is Tensorflow v1.14.0

I need TF 1.15, so let's try with v0.19.0-rc10 then... Of course the standard build crashes miserably, which leads me to think that this rc was probably never compiled or tested with PlaidML support on Mac: clang fails because of an incomplete switch statement in plaidml_translate.cpp.

We will fix it by adding this line to the switch(dt) in the tile_converter function:

case PLAIDML_DATA_BFLOAT16: return "as_bfloat16(" + tensor_name + ", 16)";

See the complete build instructions below.
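If you would rather script the fix than edit the file by hand, something along these lines might work. It is a sketch under assumptions that have not been checked against the ngraph sources: that the switch (dt) in question is the first one after the first mention of tile_converter in the file, and that its opening brace sits on the same line or the next one. The helper name add_bfloat16_case is made up; inspect the patched file in an editor before restarting the build.

```shell
#!/usr/bin/env bash
# The case line from the article, indented to sit inside the switch body.
NEW_CASE='    case PLAIDML_DATA_BFLOAT16: return "as_bfloat16(" + tensor_name + ", 16)";'

# Hypothetical helper: insert NEW_CASE right after the opening brace of the
# first "switch (dt)" that follows a line mentioning tile_converter.
add_bfloat16_case() {
    file="$1"
    # Already patched: nothing to do, so the function is safe to re-run.
    if grep -q 'case PLAIDML_DATA_BFLOAT16' "$file"; then
        return 0
    fi
    awk -v new_case="$NEW_CASE" '
        { print }
        /tile_converter/                     { in_func = 1 }
        in_func && /switch *\(dt\)/          { in_switch = 1 }
        in_switch && !done && index($0, "{") { print new_case; done = 1 }
    ' "$file" > "$file.tmp" && mv "$file.tmp" "$file"
    # Return non-zero if the expected layout was not found and nothing was added.
    grep -q 'case PLAIDML_DATA_BFLOAT16' "$file"
}

# Usage (replace the placeholder with the real location in your build tree):
# add_bfloat16_case path/to/plaidml_translate.cpp
```

The function rewrites the file through a temporary copy instead of using sed -i, because the in-place flag behaves differently between the BSD sed shipped with macOS and GNU sed.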

If everything goes right you should end up with something like this:

TensorFlow version:  1.15.0
C Compiler version used in building TensorFlow:  4.2.1 Compatible Apple LLVM 10.0.0 (clang-1000.11.45.5)
nGraph bridge version: b'0.19.0-rc10'
nGraph version used for this build: b'0.25.1-rc.10+90c70dd'
TensorFlow version used for this build: v1.15.0-rc3-22-g590d6eef7e
CXX11_ABI flag used for this build: 0
nGraph bridge built with Grappler: False
nGraph bridge built with Variables and Optimizers Enablement: False
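To re-run this kind of check later, for example after activating a different virtualenv, a small script can print the versions again. This is a sketch that assumes ngraph_bridge exposes its build details through __version__, as the output above suggests; py_module_available and report_bridge are illustrative helper names, and the script only prints a notice when the modules cannot be found.

```shell
#!/usr/bin/env bash
# Succeeds when python3 can locate the module named $1.
# The module is only looked up, not imported.
py_module_available() {
    python3 -c 'import importlib.util, sys
sys.exit(0 if importlib.util.find_spec(sys.argv[1]) else 1)' "$1" 2>/dev/null
}

# Print the TensorFlow and nGraph bridge versions when both are installed.
report_bridge() {
    if py_module_available tensorflow && py_module_available ngraph_bridge; then
        python3 -c "import tensorflow as tf; print('TensorFlow version: ', tf.__version__); import ngraph_bridge; print(ngraph_bridge.__version__)"
    else
        echo "tensorflow and/or ngraph_bridge not importable in this environment"
    fi
}

report_bridge
```

If the notice appears right after a successful build, the most likely cause is that the build's virtualenv is not the active one.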

Final thoughts - Use at your own risk

OK, we have a working environment, but there are so many interlocking (fresh) software bricks that we have no guarantee that all this will run properly in all circumstances.
Using Kapre for example, I'm able to use the _mel_spectrogram_ layer just fine, but ngraph-bridge will crash with "Caught exception while executing nGraph computation: syntax error" when trying to use the STFT layer...

I will not abandon my Linux deep learning workhorse quite yet, but at least I have an environment to try things out that will use my MacBook Pro GPU on the go and my Catalina / AMD RX 5700 XT setup at home.

The complete build instructions

I'm putting below what worked for me - I retested it on a fresh Mac after days of messing around -

Make sure you have a proper python3 installation (I won't cover it). I'm using 3.7, managed with brew install python@3.7.

git clone https://github.com/tensorflow/ngraph-bridge.git
cd ngraph-bridge

git checkout v0.19.0-rc10

# Install bazel (bazelisk was a mess)
export BAZEL_VERSION=0.25.2 

curl -LO "https://github.com/bazelbuild/bazel/releases/download/${BAZEL_VERSION}/bazel-${BAZEL_VERSION}-installer-darwin-x86_64.sh"

chmod +x "bazel-${BAZEL_VERSION}-installer-darwin-x86_64.sh"
./bazel-${BAZEL_VERSION}-installer-darwin-x86_64.sh --user

source ~/.bazel/bin/bazel-complete.bash

# Add $HOME/bin to your PATH in .zshrc (or .bashrc) and source it

printf '\nexport PATH="$PATH:$HOME/bin"\n' >> ~/.zshrc
source ~/.zshrc

# check bazel 
bazel version

# I like to start with a fresh venv dedicated to the build

python3 -m venv build-venv
source build-venv/bin/activate

# Recommended virtualenv v16.0.0 didn't work, I ended up using latest version

python3 -m pip install virtualenv

#Install tensorflow from wheel (find the right one here: https://pypi.org/project/tensorflow/1.15.0/#files)

python3 -m pip install https://files.pythonhosted.org/packages/dc/65/a94519cd8b4fd61a7b002cb752bfc0c0e5faa25d1f43ec4f0a4705020126/tensorflow-1.15.0-cp37-cp37m-macosx_10_11_x86_64.whl

#start the build

python3 build_ngtf.py --use_prebuilt_tensorflow --build_plaidml_backend

# When the build fails edit plaidml_translate.cpp from ngraph to add the missing case 

vi build_cmake/ngraph/src/ngraph/runtime/plaidml/plaidml_translate.cpp

#re-start the build

python3 build_ngtf.py --use_prebuilt_tensorflow --build_plaidml_backend

Some hints for the record:

When installing Kapre you might run into

AttributeError: module 'enum' has no attribute 'IntFlag'

This is solved by removing enum34 (here the installed version was enum34 1.1.10):

pip uninstall enum34

When importing Librosa, you might run into:

ModuleNotFoundError: No module named 'numba.decorators'

This is solved by using an older version of numba:

pip install numba==0.48
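Both hints can be checked up front before importing Kapre or Librosa. The sketch below only reads what pip reports for the active environment; pkg_version and installed_version are illustrative helper names, and nothing is installed or removed.

```shell
#!/usr/bin/env bash
# Read `pip show` output on stdin and print the value of its "Version:" field.
pkg_version() {
    awk '/^Version:/ { print $2; exit }'
}

# Print the installed version of package $1, or nothing if it is absent.
installed_version() {
    python3 -m pip show "$1" 2>/dev/null | pkg_version
}

enum34_found="$(installed_version enum34)"
numba_found="$(installed_version numba)"

if [ -n "$enum34_found" ]; then
    echo "enum34 $enum34_found is installed; removing it may fix the IntFlag error"
fi

case "$numba_found" in
    "")    echo "numba is not installed" ;;
    0.48*) echo "numba $numba_found matches the version that worked here" ;;
    *)     echo "numba $numba_found found; this article used numba==0.48" ;;
esac
```

Run it inside the virtualenv you train in, since pip only reports the packages of the environment python3 resolves to.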
