Creating a Speech Recognition Program with Python & Google API

#python #googlecloud #machinelearning

Speech recognition means that a program captures the words spoken by a person and converts them into written text. It is handy for generating subtitles, transcribing a meeting, and many other use cases.

Converting speech to text is a complex machine learning problem: an algorithm needs to take every sound a person produces and identify the corresponding written letters. Plus, depending on the language, the same sound might correspond to different characters. As a result, speech recognition is too complex to solve with a traditional, rule-based programming approach.

Fortunately, big companies like Google, Amazon, IBM, and others have already done the heavy lifting. They collected large amounts of audio, used it to train machine learning models, and produced models that convert speech to text with high accuracy. Plus, these models are available through APIs, so you can easily integrate them into your programs.

This article shows you how to transcribe audio with Python and the Google API in a few lines of code. Let's get started!

Python Speech Recognition using Google API

Google offers a Speech-to-Text service through an API: you send a request with an audio file, and you receive the transcription of that audio. This service makes it simple to include speech recognition in your Python programs.
See how to set up a Google account and configure it to access the Google Speech-to-Text API.
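To make the "send audio, get text back" idea concrete, here is a minimal sketch of the JSON body that the v1 REST method `speech:recognize` expects. You do not need to write this yourself, because the Python client library sends the same information for you; the function name `build_recognize_request` and the placeholder bytes are illustrative assumptions, and the config values simply mirror the script used in this article.

```python
import base64
import json


def build_recognize_request(audio_bytes: bytes, language_code: str = "en-US") -> str:
    """Build the JSON body for a synchronous speech:recognize REST call.

    The audio travels inline as base64 text in audio.content, and the
    config fields use the REST API's camelCase names.
    """
    body = {
        "config": {
            "encoding": "LINEAR16",   # uncompressed 16-bit PCM
            "audioChannelCount": 2,   # stereo, as in this article's script
            "languageCode": language_code,
        },
        "audio": {
            # Inline audio must be base64-encoded inside the JSON request.
            "content": base64.b64encode(audio_bytes).decode("ascii"),
        },
    }
    return json.dumps(body)


if __name__ == "__main__":
    # Placeholder bytes stand in for the contents of a real WAV file.
    print(build_recognize_request(b"\x00\x01\x02\x03"))
```

The response comes back with a list of results, each holding one or more transcript alternatives, which is exactly what the Python script reads at the end.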

Write the Python program

Once you have completed the configuration needed to use the Google Speech-to-Text API, you can move on to the last step: writing the Python program.
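Before running the script, it can help to confirm that your credentials are visible to the client library. This is a minimal sketch, assuming you authenticate with a service-account JSON key referenced by the `GOOGLE_APPLICATION_CREDENTIALS` environment variable, which Application Default Credentials reads. Other setups, such as `gcloud auth application-default login`, do not use this variable, so skip the check in that case. The function name `credentials_path` is hypothetical.

```python
import os


def credentials_path() -> str:
    """Return the key file path stored in GOOGLE_APPLICATION_CREDENTIALS.

    Raises a clear error before any API call if the variable is missing
    or points to a file that does not exist.
    """
    path = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise RuntimeError(
            "GOOGLE_APPLICATION_CREDENTIALS is not set; "
            "point it at your service-account JSON key file."
        )
    if not os.path.isfile(path):
        raise RuntimeError(f"Key file not found: {path}")
    return path
```

Calling `credentials_path()` at the top of your script turns a confusing authentication failure from the API into a readable message.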

Our program needs the third-party library google-cloud-speech, which sends the requests to Google. You can install this library by running the following command from your terminal:

>> pip install --upgrade google-cloud-speech

Lastly, copy the code below and save it as a Python script (for example, speech_to_text.py). The audio file should be in the same folder as the script, and you will need to replace the file name test.wav with your own file name.

from google.cloud import speech
import os

# Creates the Google Speech-to-Text client (uses your configured credentials)
client = speech.SpeechClient()

# Full path of the audio file; replace "test.wav" with your file name
file_name = os.path.join(os.path.dirname(os.path.abspath(__file__)), "test.wav")

# Loads the audio file into memory
with open(file_name, "rb") as audio_file:
    content = audio_file.read()

audio = speech.RecognitionAudio(content=content)

config = speech.RecognitionConfig(
    # LINEAR16 = uncompressed 16-bit PCM, the usual encoding of .wav files
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    # 2 for a stereo recording; set this to 1 if your file is mono
    audio_channel_count=2,
    language_code="en-US",
)

# Sends the request to Google to transcribe the audio
response = client.recognize(request={"config": config, "audio": audio})

# Prints the most likely transcript for each portion of the audio
for result in response.results:
    print("Transcript: {}".format(result.alternatives[0].transcript))

If your audio is in a different format, such as .m4a, convert it to WAV first, for example with an online m4a-to-WAV converter.
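The script's config assumes a 16-bit PCM WAV file with two channels. A quick way to check your file is to read its header with Python's standard-library `wave` module. This is a minimal sketch; the helper name `describe_wav` is hypothetical, and `wave` only understands uncompressed PCM WAV files, so it raises an error for other formats.

```python
import wave


def describe_wav(path: str) -> dict:
    """Read a WAV header and report the values that matter for the config.

    channels          -> what audio_channel_count should be
    is_linear16       -> True when samples are 2 bytes (16-bit PCM)
    sample_rate_hertz -> samples per second
    duration_seconds  -> length of the recording
    """
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        sample_width = wav.getsampwidth()
        rate = wav.getframerate()
        frames = wav.getnframes()
    return {
        "channels": channels,
        "is_linear16": sample_width == 2,
        "sample_rate_hertz": rate,
        "duration_seconds": frames / rate,
    }
```

If `channels` is 1, change `audio_channel_count` to 1 in the script. The synchronous `recognize` call is intended for short clips (roughly one minute at most); longer recordings need the asynchronous `long_running_recognize` method instead.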

If your program is working correctly, you will see output similar to this after executing your script:

>> python speech_to_text.py # Replace with your program file name

Output:

Transcript: hey there in this area you will learn how you can set your django version there are a few ways
Transcript:  there are a few ways to check your django version and in this video I will show you a few of them I will also show you how you can upgrade and downgrade your django version
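The script prints one line per result because the API splits the transcript into results that correspond to consecutive portions of the audio, and each result's alternatives are ordered with the most likely first. If you prefer a single block of text, a small helper like the hypothetical `full_transcript` below joins the top alternative of every result. It relies only on the `results`, `alternatives`, and `transcript` fields the script already uses.

```python
def full_transcript(response) -> str:
    """Join the most likely transcript of every result into one string.

    Results with no alternatives are skipped, and surrounding whitespace
    is trimmed so the pieces join with single spaces.
    """
    parts = [
        result.alternatives[0].transcript.strip()
        for result in response.results
        if result.alternatives
    ]
    return " ".join(parts)
```

In the script, you could replace the final loop with `print(full_transcript(response))`.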

Any errors? See some possible errors and how to fix them.

I hope you enjoyed this tutorial, and thank you so much for reading! Happy coding!