DEV Community

Julien Simon

Originally published at julsimon.Medium

Demo: audio classification with the Audio Spectrogram Transformer

Multi-modal transformers are rising fast. A great example is the Audio Spectrogram Transformer (AST), an audio classification model that was just added to the Hugging Face Transformers library. The model first converts an audio clip into a spectrogram, a 2D time-frequency representation that can be treated as an image, and then classifies that image with a Vision Transformer. Amazing results: the paper reports state-of-the-art scores on AudioSet, ESC-50, and Speech Commands V2.
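The spectrogram-as-image idea can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the Transformers implementation. It uses a Hann-windowed log-magnitude DFT instead of AST's log-Mel filterbank features, a synthetic 1 kHz tone instead of real speech, and small hypothetical window/hop settings so it runs quickly. The 16×16 patch size with 6 bins/frames of overlap (stride 10) follows the AST paper. The helper names (`spectrogram`, `patch_grid`, `extract_patches`) are ours, and the transformer encoder that would embed and classify the patches is omitted.

```python
import math

# Hypothetical settings for this sketch only (not the model's config):
# a short 8 kHz signal and a small window keep the pure-Python DFT fast.
SAMPLE_RATE = 8_000
WINDOW = 64     # samples per analysis frame
HOP = 32        # samples between frame starts
PATCH = 16      # square patch size, frequency x time (as in the AST paper)
STRIDE = 10     # patch stride: 16 - 10 = 6 bins/frames of overlap


def spectrogram(signal, window=WINDOW, hop=HOP):
    """Log-magnitude spectrogram as a [frequency][time] grid (the "image").

    Simplified stand-in for AST's log-Mel filterbank features:
    no Mel filtering, just a Hann-windowed DFT per frame.
    """
    n_bins = window // 2 + 1
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * i / (window - 1)) for i in range(window)]
    columns = []
    for start in range(0, len(signal) - window + 1, hop):
        frame = [s * w for s, w in zip(signal[start:start + window], hann)]
        column = []
        for k in range(n_bins):
            re = sum(x * math.cos(2 * math.pi * k * n / window) for n, x in enumerate(frame))
            im = sum(x * math.sin(2 * math.pi * k * n / window) for n, x in enumerate(frame))
            column.append(math.log(math.hypot(re, im) + 1e-6))
        columns.append(column)
    # Transpose so rows are frequency bins and columns are time frames.
    return [list(row) for row in zip(*columns)]


def patch_grid(n_freq, n_time, patch=PATCH, stride=STRIDE):
    """Number of patch positions along the frequency and time axes."""
    return (n_freq - patch) // stride + 1, (n_time - patch) // stride + 1


def extract_patches(spec, patch=PATCH, stride=STRIDE):
    """Cut overlapping patch x patch tiles and flatten each into a vector.

    In AST, each flattened patch would then be linearly projected to an
    embedding and fed to a ViT-style encoder; that step is omitted here.
    """
    n_f, n_t = patch_grid(len(spec), len(spec[0]), patch, stride)
    patches = []
    for fi in range(n_f):
        for ti in range(n_t):
            f0, t0 = fi * stride, ti * stride
            patches.append([spec[f][t]
                            for f in range(f0, f0 + patch)
                            for t in range(t0, t0 + patch)])
    return patches


if __name__ == "__main__":
    # Half a second of a 1 kHz tone as a synthetic "audio clip".
    tone = [math.sin(2 * math.pi * 1000 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE // 2)]
    spec = spectrogram(tone)
    patches = extract_patches(spec)
    print(len(spec), "frequency bins x", len(spec[0]), "frames")
    print(len(patches), "patches of", len(patches[0]), "values each")
```

With these settings the tone yields a 33 × 124 grid and 22 overlapping patches. The overlap means the patch count grows quickly with input size: `patch_grid(128, 1024)`, for a 128-bin, 1,024-frame input, gives a 12 × 101 grid, i.e. 1,212 tokens for the transformer.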

✅ Spaces demo: https://huggingface.co/spaces/juliensimon/keyword-spotting

✅ Model: https://huggingface.co/MIT/ast-finetuned-speech-commands-v2

✅ Paper: https://arxiv.org/abs/2104.01778
