Picovoice has just released the new version of its Voice Activity Detection (VAD) engine, Cobra. It's more accurate than its predecessor and much more accurate than WebRTC VAD. That's why on day 24 we'll discuss how to measure VAD performance.
The receiver operating characteristic (ROC) curve is a well-known tool for inspecting the performance of binary classifiers across different decision thresholds.
Voice Activity Detection is a binary classifier: it detects the presence or absence of human speech in an audio stream. Hence, a ROC curve can be used to measure its accuracy.
The false positive rate is the number of non-voice frames wrongly detected as voice divided by the total number of non-voice frames. Likewise, the true positive rate is the number of voice frames correctly detected divided by the total number of voice frames. Each decision threshold yields one (false positive rate, true positive rate) point on the curve, and a larger area under the ROC curve is better.
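To make these definitions concrete, here is a minimal sketch in plain Python. It assumes the engine outputs a voice probability per frame and that ground-truth labels are available per frame. The function names (`roc_points`, `area_under_curve`) and the toy data are made up for illustration and are not taken from the benchmark.

```python
def roc_points(scores, labels, thresholds):
    """Return one (false positive rate, true positive rate) pair per threshold.

    scores: per-frame voice probabilities (assumed engine output, illustrative)
    labels: per-frame ground truth, True for voice and False for non-voice
    """
    num_voice = sum(1 for y in labels if y)
    num_non_voice = len(labels) - num_voice
    points = []
    for t in thresholds:
        # A frame is detected as voice when its score reaches the threshold.
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and not y)
        points.append((fp / num_non_voice, tp / num_voice))
    return points


def area_under_curve(points):
    """Trapezoidal area under the ROC points, sorted by false positive rate."""
    pts = sorted(points)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy data: three voice frames followed by three non-voice frames.
scores = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1]
labels = [True, True, True, False, False, False]
thresholds = [0.0, 0.25, 0.5, 0.75, 1.01]

points = roc_points(scores, labels, thresholds)
auc = area_under_curve(points)
print(points)
print(auc)
```

A threshold of 0.0 marks every frame as voice, so both rates are 1. A threshold above 1 marks none, so both rates are 0. The thresholds in between trace the curve between those two corners.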
Picovoice developed an open-source benchmark based on LibriSpeech, which provides a large, gender-balanced set of speakers. The benchmark then adds noise from the DEMAND dataset, which contains recordings from 18 different environments (e.g., kitchen, office, traffic).
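The benchmark's exact mixing procedure isn't described here. As a rough illustration only, one common way to add noise to clean speech is to scale the noise to a target signal-to-noise ratio (SNR) before summing. The function name `mix_at_snr`, the SNR parameter, and the sample values below are assumptions for this sketch, not the benchmark's code.

```python
import math


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio is `snr_db`, then add it.

    Illustrative helper, not part of the benchmark.
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    speech_power = sum(x * x for x in speech) / len(speech)
    noise_power = sum(x * x for x in noise) / len(noise)
    if noise_power == 0:
        raise ValueError("noise is silent, cannot scale to a target SNR")
    # Choose gain so that speech_power / (gain**2 * noise_power) == 10**(snr_db / 10).
    gain = math.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


# Toy samples: at 0 dB the scaled noise has the same power as the speech.
speech = [1.0, -1.0, 1.0, -1.0]
noise = [0.5, 0.5, -0.5, -0.5]
mixed = mix_at_snr(speech, noise, snr_db=0)
print(mixed)
```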
So if you want to draw a ROC curve with LibriSpeech for WebRTC VAD, feel free to use the open-source benchmark. The steps are below:
1. Clone the repository:
git clone https://github.com/Picovoice/voice-activity-benchmark
2. Install the dependencies:
pip3 install -r requirements.txt
3. Run the benchmark:
python3 benchmark.py \
--librispeech_dataset_path ${LIBRISPEECH_DATASET_PATH} \
--demand_dataset_path ${DEMAND_DATASET_PATH} \
--engine WebRTC
Accuracy is not the only consideration when evaluating VAD performance. Runtime requirements are important, too, and they depend on the platform you choose. For example, on a Raspberry Pi Zero, Cobra measured a real-time factor (RTF) of 0.05, or about 5% CPU usage, whereas on a laptop it can be 0.0006, roughly 80 times smaller.
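RTF is commonly defined as processing time divided by the duration of the audio processed, so a value below 1 means the engine runs faster than real time. Here is a minimal sketch of how you could measure it yourself. The `dummy_vad` function is a stand-in for a real engine's per-frame call, and the frame length and sample rate are illustrative assumptions.

```python
import time


def realtime_factor(process_frame, frames, frame_length, sample_rate):
    """Return processing time divided by audio duration for the given frames."""
    audio_seconds = len(frames) * frame_length / sample_rate
    start = time.perf_counter()
    for frame in frames:
        process_frame(frame)
    elapsed = time.perf_counter() - start
    return elapsed / audio_seconds


def dummy_vad(frame):
    """Placeholder detector (mean absolute amplitude), not a real VAD engine."""
    return sum(abs(x) for x in frame) / len(frame) > 100


# 100 silent frames of 512 samples at 16 kHz, i.e. 3.2 seconds of audio.
frames = [[0] * 512 for _ in range(100)]
rtf = realtime_factor(dummy_vad, frames, frame_length=512, sample_rate=16000)
print(f"RTF: {rtf:.6f}")
```

The number you get depends heavily on the machine, so compare RTF values only when they were measured on the same hardware.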