Let's start with why you should use the Picovoice Python SDK when there are alternative libraries and in-depth tutorials on speech recognition with Python.
- Private - processes voice data on the device
- Cross-platform - Linux, macOS, Windows, Raspberry Pi, ...
- Real-time - no network round trip, since audio is processed on the device
- Accurate - I guess I do not need to say it. I haven't seen any vendor claiming mediocre accuracy.
Now, let's get started!
1 - Install Picovoice
pip3 install picovoice
2 - Create a Picovoice Instance
The Picovoice SDK consists of Porcupine Wake Word, which enables custom hotwords, and Rhino Speech-to-Intent, which enables custom voice commands. Together they enable hands-free experiences. Take this spoken command:
Porcupine, set an alarm for 1 hour and 13 seconds.
Porcupine detects the hotword "Porcupine", then Rhino captures the user's intent and provides the intent and its details, as seen below:
{
  is_understood: true,
  intent: setAlarm,
  slots: {
    hours: 1,
    seconds: 13
  }
}
To create a Picovoice instance, we need Porcupine and Rhino models, the paths to those models, and callbacks for hotword detection and inference completion. For simplicity, we'll use pre-trained Porcupine and Rhino models; however, you can train custom ones on the Picovoice Console. While exploring the Picovoice Console, grab your AccessKey, too! Signing up for Picovoice Console is free, no credit card required.
from picovoice import Picovoice

keyword_path = ... # path to Porcupine wake word file (.PPN)

def wake_word_callback():
    # Called when the wake word is detected.
    pass

context_path = ... # path to Rhino context file (.RHN)

def inference_callback(inference):
    # Called when Rhino finishes inferring the intent of the command.
    print(inference.is_understood)
    if inference.is_understood:
        print(inference.intent)
        for k, v in inference.slots.items():
            print(f"{k} : {v}")

pv = Picovoice(
    access_key="${YOUR_ACCESS_KEY}",  # placeholder: your AccessKey from Picovoice Console
    keyword_path=keyword_path,
    wake_word_callback=wake_word_callback,
    context_path=context_path,
    inference_callback=inference_callback)
Do not forget to replace the model path and AccessKey placeholders.
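The callback above only prints what it receives. For an alarm, the slots eventually have to become a duration. The following is a minimal sketch of that step, not part of the Picovoice SDK: `FakeInference` is a hypothetical stand-in that only mimics the three fields the callback reads, and the slot names (`hours`, `minutes`, `seconds`) are assumptions that depend on how the Rhino context was designed.

```python
from dataclasses import dataclass, field


@dataclass
class FakeInference:
    """Hypothetical stand-in exposing the fields the callback reads."""
    is_understood: bool
    intent: str = ""
    slots: dict = field(default_factory=dict)


# Assumed slot names; adjust them to match your own context.
SECONDS_PER_UNIT = {"hours": 3600, "minutes": 60, "seconds": 1}


def alarm_seconds(inference):
    """Return the requested duration in seconds, or None if not an alarm."""
    if not inference.is_understood or inference.intent != "setAlarm":
        return None
    # int() covers the case where slot values arrive as strings.
    return sum(int(value) * SECONDS_PER_UNIT[name]
               for name, value in inference.slots.items()
               if name in SECONDS_PER_UNIT)


example = FakeInference(True, "setAlarm", {"hours": "1", "seconds": "13"})
print(alarm_seconds(example))  # 3613
```

Inside a real `inference_callback`, the same function could be called on the `inference` argument directly.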
3 - Process Audio with Picovoice
Pass frames of audio to the engine:
pv.process(audio_frame)
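Each frame passed to `.process` is expected to hold a fixed number of single-channel, 16-bit samples; the instance reports the values it needs through `pv.frame_length` and `pv.sample_rate`. If the audio comes from a longer buffer rather than a recorder, it has to be cut into frames first. Here is a small sketch of that slicing. The helper name `split_into_frames` and the frame length of 512 are illustrative assumptions; real code should read the length from `pv.frame_length`.

```python
def split_into_frames(samples, frame_length):
    """Yield consecutive full frames; a trailing partial frame is dropped."""
    for start in range(0, len(samples) - frame_length + 1, frame_length):
        yield samples[start:start + frame_length]


# 512 is only an illustrative value; in real code use pv.frame_length.
FRAME_LENGTH = 512
buffer = [0] * 1300  # silence, standing in for recorded 16-bit samples
frames = list(split_into_frames(buffer, FRAME_LENGTH))
print(len(frames), len(frames[0]))  # 2 512
```

Each yielded frame could then be handed to `pv.process(frame)` in turn.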
4 - Read Audio from the Microphone
Install [pvrecorder](https://pypi.org/project/pvrecorder/) and read the audio:
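from pvrecorder import PvRecorder
# `-1` is the default input audio device.
# frame_length makes every read return exactly the number of samples Picovoice expects.
recorder = PvRecorder(device_index=-1, frame_length=pv.frame_length)
recorder.start()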
Read audio frames from the recorder and pass them to the .process method:
pcm = recorder.read()
pv.process(pcm)
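A single read only covers one frame, so in practice the two calls run in a loop until the application decides to stop. A minimal sketch of such a loop follows. The function name `listen` and the `should_stop` callable are hypothetical; the sketch assumes only that the recorder offers `start`, `read`, `stop`, and `delete`, and that the engine offers `process` and `delete`.

```python
def listen(pv, recorder, should_stop):
    """Feed recorder frames to the engine until should_stop() returns True."""
    recorder.start()
    try:
        while not should_stop():
            pv.process(recorder.read())
    finally:
        # Release native resources even if an exception interrupts the loop.
        recorder.stop()
        recorder.delete()
        pv.delete()
```

One way to use it is to run `listen(pv, recorder, stop_event.is_set)` in a background `threading.Thread`, with `stop_event` being a `threading.Event` that is set when the application closes. Note that `listen` starts the recorder itself.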
5 - Create a GUI with Tkinter
Tkinter is the standard GUI framework shipped with Python. Create a window, add a label showing the remaining time to it, then launch:
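import tkinter as tk

window = tk.Tk()
time_label = tk.Label(window, text='00 : 00 : 00')
time_label.pack()

def on_close():
    # Minimal handler; also stop the recorder and release Picovoice here.
    window.destroy()

window.protocol('WM_DELETE_WINDOW', on_close)
window.mainloop()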
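The label still needs something to update it while the alarm runs. The sketch below shows one possible approach using Tkinter's `after` scheduling; it is an illustration, not the tutorial's actual source. The helper names `format_remaining` and `start_countdown` are hypothetical, and the helpers rely only on `label.config(text=...)` and `window.after(ms, func, *args)`.

```python
def format_remaining(total_seconds):
    """Format seconds as 'HH : MM : SS', matching the label's initial text."""
    hours, rest = divmod(max(0, int(total_seconds)), 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d} : {minutes:02d} : {seconds:02d}"


def start_countdown(window, label, total_seconds):
    """Show the remaining time, then reschedule itself once per second."""
    label.config(text=format_remaining(total_seconds))
    if total_seconds > 0:
        window.after(1000, start_countdown, window, label, total_seconds - 1)


print(format_remaining(3613))  # 01 : 00 : 13
```

With the window from above, `start_countdown(window, time_label, 3613)` would begin a countdown of 1 hour and 13 seconds once `mainloop` is running.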
Some resources:
Source code for the tutorial
Original Medium Article
Picovoice SDK
Picovoice Console