This article is part of a tutorial series on txtai, an AI-powered search engine.
This article covers the transcription of audio files to text using models provided by Hugging Face.
Install txtai and all dependencies.
pip install txtai

# Get test data
wget -N https://github.com/neuml/txtai/releases/download/v2.0.0/tests.tar.gz
tar -xvzf tests.tar.gz
The Transcription instance is the main entrypoint for transcribing audio to text. The pipeline reduces transcription to a one-line call!
The pipeline reads audio files into memory, runs the data through a machine learning model and returns the results as text.
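To make the first of those steps concrete, here is a minimal sketch of reading an audio file into memory using only the Python standard library. This is not txtai's implementation: `read_wav` is a hypothetical helper written for illustration, and it assumes 16-bit PCM WAV input.

```python
import array
import sys
import wave


def read_wav(path):
    """Hypothetical helper: read a 16-bit PCM WAV file into memory.

    Returns (samples, rate), where samples are floats in [-1.0, 1.0).
    """
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM audio")
        rate = wav.getframerate()
        channels = wav.getnchannels()
        frames = wav.readframes(wav.getnframes())

    # Interpret the raw bytes as signed 16-bit integers.
    # WAV data is little-endian, so swap bytes on big-endian machines.
    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()

    # Samples are interleaved by channel; keep only the first channel.
    mono = samples[::channels]

    # Scale integers to floats, the usual input format for speech models.
    return [s / 32768.0 for s in mono], rate
```

A call such as `samples, rate = read_wav("txtai/Make_huge_profits.wav")` would return the waveform and its sample rate, assuming that file is 16-bit PCM. The sample rate matters because the model card for facebook/wav2vec2-large-960h describes training on 16 kHz speech, so audio at other rates would typically need resampling before it reaches the model.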
from txtai.pipeline import Transcription

# Create transcription model
transcribe = Transcription("facebook/wav2vec2-large-960h")
The example below transcribes a list of audio files to text. For each file, it displays an audio player for the clip and prints the transcribed text.
from IPython.display import Audio, display

files = ["Beijing_mobilises.wav", "Canadas_last_fully.wav", "Maine_man_wins_1_mil.wav",
         "Make_huge_profits.wav", "The_National_Park.wav", "US_tops_5_million.wav"]
files = ["txtai/%s" % x for x in files]

for x, text in enumerate(transcribe(files)):
    display(Audio(files[x]))
    print(text)
    print()
Baging mobilizes invasion craft along coast as tiwan tensions escalates

Canada's last fully intact ice shelf has suddenly collapsed forming a manhatten sized ice berg

Main man wins from lottery ticket

Make huge profits without working make up to one hundred thousand dollars a day

National park service warns against sacrificing slower friends in a bare attack

U s virus cases top a million
Overall, the results are solid. Each result sounds phonetically like the audio, even where the spelling is off ("Baging" for Beijing, "bare attack" for bear attack). There is an open task with the Hugging Face models to use a language model to decode the model outputs and further improve result accuracy.
Keep an eye out for those updated models!
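To see why decoding without a language model tends to produce phonetic spellings, here is a minimal sketch of greedy decoding, assuming the model emits one character prediction per audio frame in the CTC style that wav2vec 2.0 models use. The function name, the toy vocabulary and the frame ids are all made up for illustration; none of this is txtai or Hugging Face code.

```python
def greedy_ctc_decode(frame_ids, vocab, blank=0):
    """Collapse repeated ids, drop blanks, then map ids to characters."""
    chars = []
    previous = None
    for idx in frame_ids:
        # Emit a character only when the id changes and is not the blank
        if idx != previous and idx != blank:
            chars.append(vocab[idx])
        previous = idx
    return "".join(chars)


# Hypothetical toy vocabulary: id 0 is the CTC blank, "|" marks a word boundary
vocab = ["<pad>", "|", "B", "A", "R", "E"]

# Hypothetical per-frame predictions (the most likely id for each frame)
frames = [2, 2, 0, 3, 3, 4, 0, 5, 5, 1]

print(greedy_ctc_decode(frames, vocab).replace("|", " ").strip())  # prints "BARE"
```

Each character is chosen from the sound alone, so nothing in this step knows that "bear attack" is a far more likely phrase than "bare attack". A language model in the decoding step would weigh candidate spellings by how plausible the resulting words are in context.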