Live Audio Transcription with Python For Free

#python #machinelearning #programming #challenge

Unlike cloud speech-to-text APIs, Cheetah Streaming Speech-to-Text processes speech data locally, on the device. That gives it a built-in speed advantage: a cloud speech-to-text API can be fast, but it can never eliminate network latency.

Let's learn to convert live audio to text using the Picovoice Cheetah Streaming Speech-to-Text Python SDK, so you can see for yourself.

1. Install the Cheetah Streaming Speech-to-Text Python SDK:

pip install pvcheetah

2. Grab your AccessKey from **Picovoice Console**
If you don't have a Picovoice Console account yet, you can create one in minutes. No credit card is required, and the Forever-Free Plan is, as the name suggests, free forever!

3. Import the Cheetah Streaming Speech-to-Text package:

import pvcheetah

4. Create an instance of the speech-to-text object with your AccessKey:

access_key = "${ACCESS_KEY}"  # placeholder for your AccessKey from Picovoice Console
handle = pvcheetah.create(access_key=access_key)

Don't forget to replace the placeholder with your AccessKey.

5. Implement audio recording.
Cheetah Streaming Speech-to-Text processes audio whether it comes from a microphone or another program.

The code below assumes a function that returns the next available audio frame (a fixed-size chunk of samples):

def get_next_audio_frame():
    # Placeholder: return the next frame of 16-bit, single-channel PCM samples
    pass
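
When the audio comes from a file rather than a microphone, one way to supply frames is to cut a WAV file into fixed-size pieces. The sketch below is only an illustration using the standard library. It assumes 16-bit, single-channel PCM, and the names `wav_frames` and `frame_length` are hypothetical; in practice the frame size would come from the engine (for example `handle.frame_length`).

```python
import struct
import wave


def wav_frames(path, frame_length):
    """Yield successive frames of `frame_length` 16-bit samples from a WAV file.

    Assumes single-channel, 16-bit PCM. A trailing partial frame is dropped,
    since a streaming engine expects frames of a fixed size.
    """
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected single-channel, 16-bit PCM audio")
        while True:
            raw = wav.readframes(frame_length)
            if len(raw) < frame_length * 2:
                return  # end of file (or an incomplete final frame)
            # Unpack little-endian signed 16-bit samples into a list of ints
            yield list(struct.unpack(f"<{frame_length}h", raw))
```

With a generator like this, `frames = wav_frames("recording.wav", handle.frame_length)` followed by `next(frames)` could play the role of `get_next_audio_frame()`. Here `recording.wav` is a made-up file name, and `next()` raises `StopIteration` once the file is exhausted.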

Convert live audio to text:

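while True:
    # process() returns newly recognized text and an endpoint (pause) flag
    partial_transcript, is_endpoint = handle.process(get_next_audio_frame())
    print(partial_transcript, end="", flush=True)
    if is_endpoint:
        # flush() returns any text still buffered when the speaker pauses
        final_transcript = handle.flush()
        print(final_transcript)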
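
To keep the whole transcript rather than handle each piece as it arrives, the partial results and the flushed remainder can be concatenated. The following is a minimal sketch under stated assumptions: the helper `transcribe` and the stand-in `EchoEngine` are hypothetical names introduced for illustration. Only `process()` and `flush()` come from the steps above, and the stand-in merely mimics their return shapes so the sketch runs without a microphone or an AccessKey.

```python
def transcribe(engine, frames):
    """Feed audio frames to a streaming engine and return the full transcript.

    `engine` is anything with Cheetah-style process() and flush() methods.
    """
    pieces = []
    for frame in frames:
        partial_transcript, is_endpoint = engine.process(frame)
        pieces.append(partial_transcript)
        if is_endpoint:
            # The speaker paused: collect whatever text is still buffered
            pieces.append(engine.flush())
    # Flush once more so text after the last endpoint is not lost
    pieces.append(engine.flush())
    return "".join(pieces)


class EchoEngine:
    """Hypothetical stand-in: treats each frame as text and holds one frame back."""

    def __init__(self):
        self._pending = ""

    def process(self, frame):
        if frame is None:  # None marks a pause in this toy model
            return "", True
        emitted, self._pending = self._pending, frame
        return emitted, False

    def flush(self):
        emitted, self._pending = self._pending, ""
        return emitted


print(transcribe(EchoEngine(), ["hello ", "world", None, " again"]))
# prints: hello world again
```

With the real engine, the call would presumably look like `transcribe(handle, frames)`, followed by `handle.delete()` to release the engine's resources once transcription is finished.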

That's it! In 5 simple steps, you can get live audio converted into text!

For more information, you can check
