Dilek Karasoy for Picovoice

Day 4: Voice Activity Detection with Python

When it comes to speech recognition, most people only know about Automatic Speech Recognition (ASR). Yet Voice Activity Detection (VAD) is a fundamental building block of any speech-related product. Voice AI vendors integrate VAD into their ASR engines but do not offer it separately. Picovoice initially built Cobra for internal use, then made it public in response to market demand, because there was no alternative to Google’s WebRTC VAD, which does not work on all platforms.

You can read more on what voice activity detection is, but today is the day to learn how to detect voice activity with the Cobra VAD Python SDK:

1. Install VAD SDK

pip3 install pvcobra

Sign up for Picovoice Console (it's free), if you haven't already, to grab your AccessKey.

2. Implement in Python

import pvcobra

# AccessKey obtained from Picovoice Console
access_key = "${ACCESS_KEY}"
handle = pvcobra.create(access_key=access_key)

Once initialized, the required sample rate is given by handle.sample_rate. The expected frame length (the number of audio samples per frame) is handle.frame_length. The engine accepts 16-bit linearly-encoded PCM and operates on single-channel audio.
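To make the frame requirement concrete, here is a minimal sketch of cutting raw audio into frames of the expected size. It assumes the audio is already 16-bit little-endian, single-channel PCM at the right sample rate. The helper name pcm_bytes_to_frames and the FRAME_LENGTH value are illustrative assumptions, not part of the Cobra SDK; in real code you would read the frame length from handle.frame_length.

```python
import struct


def pcm_bytes_to_frames(pcm_bytes, frame_length):
    """Split raw 16-bit little-endian mono PCM into lists of frame_length samples.

    Hypothetical helper: any trailing samples that do not fill a whole
    frame are dropped.
    """
    num_samples = len(pcm_bytes) // 2  # two bytes per 16-bit sample
    samples = struct.unpack("<%dh" % num_samples, pcm_bytes[:num_samples * 2])
    frames = []
    for start in range(0, num_samples - frame_length + 1, frame_length):
        frames.append(list(samples[start:start + frame_length]))
    return frames


# Assumed stand-in for handle.frame_length; use the real attribute in practice.
FRAME_LENGTH = 512

# 1100 samples of silence, packed the way a raw PCM stream would arrive.
silence = struct.pack("<1100h", *([0] * 1100))
frames = pcm_bytes_to_frames(silence, FRAME_LENGTH)
print(len(frames), "full frames ready to pass to the engine one at a time")
```

Each element of frames is a list of integers that could be handed to the engine one frame per call. Dropping the incomplete tail is a design choice for this sketch; padding it with zeros is another option.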

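def get_next_audio_frame():
    # Placeholder: return handle.frame_length samples of 16-bit, single-channel PCM
    pass

while True:
    # Probability, between 0 and 1, that the frame contains voice
    voice_probability = handle.process(get_next_audio_frame())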
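A per-frame probability only becomes useful once you decide what counts as voice. The sketch below shows one possible way to turn a sequence of probabilities into voiced segments. The function name voiced_segments and the 0.5 threshold are assumptions chosen for illustration; the SDK does not prescribe a threshold, so tune it for your audio.

```python
def voiced_segments(probabilities, threshold=0.5):
    """Group consecutive frames at or above the threshold into segments.

    Hypothetical helper: returns (start_frame, end_frame) pairs, where
    end_frame is exclusive.
    """
    segments = []
    start = None
    for index, probability in enumerate(probabilities):
        if probability >= threshold and start is None:
            start = index  # a voiced run begins
        elif probability < threshold and start is not None:
            segments.append((start, index))  # the voiced run ends
            start = None
    if start is not None:
        # The audio ended while still inside a voiced run.
        segments.append((start, len(probabilities)))
    return segments


# Made-up probabilities standing in for values collected from the engine.
example = [0.1, 0.8, 0.9, 0.2, 0.7]
segments = voiced_segments(example)
print(segments)
```

To convert frame indices into seconds, multiply by the frame length and divide by the sample rate reported by the engine.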

Congratulations!
