
Gardner Bickford


OpenAI Whisper Inference on Apple Silicon METAL GPU

This code example shows Whisper inference running on the Apple Silicon GPU. It did not work with the Homebrew Python build; creating a Miniconda environment resolved the issue. To monitor GPU usage, open Activity Monitor, click the Window menu, and select GPU History.

Prepare the environment

The tensorflow-metal plugin is compatible with TensorFlow 2.13 or later. This may also work with later versions of Python; please let me know the results of your experiments.

conda create -n tfmetal python=3.10
conda activate tfmetal
python3 -m pip install \
  tensorflow-metal \
  tensorflow \
  transformers \
  datasets \
  soundfile \
  librosa
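Before downloading the model, it is worth confirming that TensorFlow can actually see the Metal device. A minimal check (the exact output will vary by machine):

import tensorflow as tf

# tensorflow-metal registers the GPU as a PluggableDevice;
# an empty list here means the plugin is not active
print(tf.__version__)
print(tf.config.list_physical_devices("GPU"))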

whisper.py

from transformers import WhisperProcessor, TFWhisperForConditionalGeneration
from datasets import load_dataset

# load model and processor
processor = WhisperProcessor.from_pretrained("openai/whisper-large-v2")
model = TFWhisperForConditionalGeneration.from_pretrained("openai/whisper-large-v2")
# don't force any decoder prompt tokens; let the model pick language/task
model.config.forced_decoder_ids = None

# load dummy dataset and read audio files
ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
sample = ds[0]["audio"]
input_features = processor(sample["array"], sampling_rate=sample["sampling_rate"], return_tensors="tf").input_features

# generate token ids
predicted_ids = model.generate(input_features)
# decode token ids to text, keeping Whisper's special tokens
# (start-of-transcript, language, task, and timestamp markers)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=False)
print(transcription)
print("---")
# decode again, stripping the special tokens to get clean text
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
print(transcription)
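The script above transcribes a sample from a dummy dataset. To transcribe your own audio, librosa (installed earlier) can load and resample a local file to the 16 kHz the Whisper processor expects. A minimal sketch; the file name is a placeholder:

import librosa
from transformers import WhisperProcessor, TFWhisperForConditionalGeneration

processor = WhisperProcessor.from_pretrained("openai/whisper-large-v2")
model = TFWhisperForConditionalGeneration.from_pretrained("openai/whisper-large-v2")

# load a local file as 16 kHz mono ("my_recording.wav" is a placeholder)
audio, sr = librosa.load("my_recording.wav", sr=16000)

# note: the processor pads or truncates the input to 30 seconds of audio
input_features = processor(audio, sampling_rate=sr, return_tensors="tf").input_features
predicted_ids = model.generate(input_features)
print(processor.batch_decode(predicted_ids, skip_special_tokens=True))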

Run the inference

python3 whisper.py

Output


2023-07-14 11:08:58.333012: I metal_plugin/src/device/metal_device.cc:1154] Metal device set to: Apple M2
2023-07-14 11:08:58.333059: I metal_plugin/src/device/metal_device.cc:296] systemMemory: 24.00 GB
2023-07-14 11:08:58.333069: I metal_plugin/src/device/metal_device.cc:313] maxCacheSize: 8.00 GB
2023-07-14 11:08:58.333338: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:303] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-07-14 11:08:58.333375: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:269] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
All model checkpoint layers were used when initializing TFWhisperForConditionalGeneration.

All the layers of TFWhisperForConditionalGeneration were initialized from the model checkpoint at openai/whisper-large-v2.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFWhisperForConditionalGeneration for predictions without further training.
Found cached dataset librispeech_asr_dummy (/Users/gardner/.cache/huggingface/datasets/hf-internal-testing___librispeech_asr_dummy/clean/2.1.0/d3bc4c2bc2078fcde3ad0f0f635862e4c0fef78ba94c4a34c4c250a097af240b)
/Users/gardner/miniconda3/envs/tfmetal/lib/python3.10/site-packages/transformers/generation/tf_utils.py:854: UserWarning: Using `max_length`'s default (448) to control the generation length. This behaviour is deprecated and will be removed from the config in v5 of Transformers -- we recommend using `max_new_tokens` to control the maximum length of the generation.
  warnings.warn(
['<|startoftranscript|><|en|><|transcribe|><|notimestamps|> Mr. Quilter is the apostle of the middle classes and we are glad to welcome his gospel.<|endoftext|>']
---
[' Mr. Quilter is the apostle of the middle classes and we are glad to welcome his gospel.']
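The log above confirms that the Metal device was created. To verify that individual operations are actually placed on it, TensorFlow can log device placement. A minimal sketch, assuming the same tfmetal environment; with tensorflow-metal active, the log should report placement on device:GPU:0:

import tensorflow as tf

# print the device each operation runs on
tf.debugging.set_log_device_placement(True)

# a trivial matmul to exercise the GPU
a = tf.random.uniform((1024, 1024))
b = tf.random.uniform((1024, 1024))
print(tf.matmul(a, b).shape)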
