On-device speech recognition for the web? Even for transcription? Yes and Yes!
Leopard Speech-to-Text processes voice data in your browser, without sending it to a third-party cloud.
1. Setup & Installation
Create a project and install the SDK:
```bash
npm install @picovoice/leopard-web
```
Get your AccessKey from the Picovoice Console. It's free.
2. Serving the Model
Leopard is an on-device speech-to-text engine, so the model (a deep neural network) must be transferred to the client before voice can be processed in the browser.
There are two options:
2.1 Serve the model from the public directory and pass its URL to the SDK.
Pro: Keeps the page size small.
Con: Requires some upfront work to host the model file.
2.2 Ship the model with the page content by converting it to text with Base64 encoding.
Pro: Very straightforward. There is even a utility in the Leopard package to convert the model into base64 format:
```bash
npx pvbase64 -i ${MODEL_PATH} -o ${BASE64_PATH}
```
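Con: Yes, the page size! Base64 turns every 3 bytes into 4 characters, so the model grows by about a third, and all of it ships with the page.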
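To put a number on that cost: Base64 maps every 3 input bytes to 4 ASCII characters and pads the last group, so the encoded length is 4 × ⌈n / 3⌉. The sketch below computes it; `base64EncodedLength` is a hypothetical helper, and the byte count in the usage lines is illustrative, not Leopard's actual model size.

```typescript
// Base64 encodes each group of 3 bytes as 4 characters; a final partial
// group is padded with '=' so the output length is always a multiple of 4.
function base64EncodedLength(byteLength: number): number {
  if (!Number.isInteger(byteLength) || byteLength < 0) {
    throw new RangeError("byteLength must be a non-negative integer");
  }
  return 4 * Math.ceil(byteLength / 3);
}

// Usage with a hypothetical 3,000,000-byte model file:
const modelBytes = 3_000_000;
console.log(base64EncodedLength(modelBytes)); // 4000000
console.log((base64EncodedLength(modelBytes) / modelBytes).toFixed(3)); // 1.333
```

Whether that extra third matters depends on how large the model is relative to the rest of the page and how often it is re-downloaded.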
3. Implement Speech Recognition in JavaScript
Create an instance of Leopard:
```typescript
import { Leopard } from "@picovoice/leopard-web";

const handle = await Leopard.create(
  accessKey,
  leopardModel
);
```
Replace `accessKey` with the AccessKey from the Picovoice Console. `leopardModel` is an object that tells the SDK where to find the model. If you are using the public directory method, use this:
```typescript
const leopardModel = {
  publicPath: publicRelativePath, // URL path of the model file served from the public directory
};
```
If you are using the base64 method, use this:
```typescript
const leopardModel = {
  base64: base64String, // the model as a base64 string, e.g. generated with pvbase64
};
```
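Because the SDK takes one shape or the other, a TypeScript union can make that either/or choice explicit at build time. This is a minimal sketch: `PublicPathModel`, `Base64Model`, `ModelConfig`, `toLeopardModel` and `isBase64Model` are hypothetical names, not exports of `@picovoice/leopard-web`, and the SDK's own model type may accept further optional fields.

```typescript
// Local stand-in types mirroring the two object shapes shown above.
// Illustrative assumptions only, not the SDK's exported types.
type PublicPathModel = { publicPath: string };
type Base64Model = { base64: string };
type LeopardModelSketch = PublicPathModel | Base64Model;

// Hypothetical build-time config: exactly one field should be set.
interface ModelConfig {
  publicPath?: string;
  base64?: string;
}

// Builds the model object, rejecting configs with both or neither field set.
function toLeopardModel(config: ModelConfig): LeopardModelSketch {
  const hasPath = typeof config.publicPath === "string" && config.publicPath.length > 0;
  const hasBase64 = typeof config.base64 === "string" && config.base64.length > 0;
  if (hasPath === hasBase64) {
    throw new Error("Provide exactly one of publicPath or base64");
  }
  return hasPath ? { publicPath: config.publicPath! } : { base64: config.base64! };
}

// Type guard that tells the two shapes apart at runtime.
function isBase64Model(model: LeopardModelSketch): model is Base64Model {
  return "base64" in model;
}

// Usage with a hypothetical model path:
const model = toLeopardModel({ publicPath: "models/leopard_params.pv" });
console.log(isBase64Model(model)); // false
```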
Transcribe audio:
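```typescript
// Application-specific: must return single-channel, 16-bit PCM at 16 kHz.
function getAudioData(): Int16Array {
  // ... read samples from a microphone or a file
  throw new Error("getAudioData() is not implemented yet");
}

const result = await handle.process(getAudioData());
console.log(result.transcript); // full transcription as a string
console.log(result.words); // per-word timestamps and confidence scores

await handle.release(); // free the engine's resources when done
```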
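`result.words` is handy for captions or for flagging words the engine was unsure about. The sketch below formats it, assuming each entry carries `word`, `startSec`, `endSec` and `confidence` fields; check the SDK's type definitions for the exact shape in your version. `WordSketch`, `formatWords` and `lowConfidenceWords` are illustrative names.

```typescript
// Assumed shape of each entry in result.words (a subset; the SDK may add fields).
interface WordSketch {
  word: string;
  startSec: number;
  endSec: number;
  confidence: number; // assumed to be in [0, 1]
}

// Renders one line per word, e.g. "[0.00s - 0.42s] hello (0.95)".
function formatWords(words: WordSketch[]): string {
  return words
    .map(
      (w) =>
        `[${w.startSec.toFixed(2)}s - ${w.endSec.toFixed(2)}s] ${w.word} (${w.confidence.toFixed(2)})`,
    )
    .join("\n");
}

// Returns the words whose confidence falls below the threshold.
function lowConfidenceWords(words: WordSketch[], threshold: number): WordSketch[] {
  return words.filter((w) => w.confidence < threshold);
}

// Usage with made-up sample data:
const sample: WordSketch[] = [
  { word: "hello", startSec: 0, endSec: 0.42, confidence: 0.95 },
  { word: "world", startSec: 0.5, endSec: 0.9, confidence: 0.6 },
];
console.log(formatWords(sample));
// [0.00s - 0.42s] hello (0.95)
// [0.50s - 0.90s] world (0.60)
```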
Implement `getAudioData` for your application: it could capture from a microphone via the Web Audio API or decode an audio file. Either way, Leopard expects single-channel, 16-bit linear PCM sampled at 16 kHz.
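Audio from the Web Audio API usually arrives as 32-bit floats in [-1, 1] at the audio context's rate (commonly 44.1 or 48 kHz), so it needs resampling and conversion before it can go to `handle.process()`. A minimal sketch, assuming you already have one channel of samples (for example from `AudioBuffer.getChannelData()`); `resampleLinear` and `floatTo16BitPCM` are hypothetical names, and linear interpolation is a simple stand-in for a proper resampler: it applies no low-pass filter, so downsampling can alias.

```typescript
// Linear-interpolation resampler: each output sample sits at a fractional
// position in the input and blends its two neighbouring input samples.
function resampleLinear(input: Float32Array, inputRate: number, outputRate: number): Float32Array {
  if (inputRate <= 0 || outputRate <= 0) throw new RangeError("rates must be positive");
  if (inputRate === outputRate) return input.slice();
  const outputLength = Math.floor((input.length * outputRate) / inputRate);
  const output = new Float32Array(outputLength);
  const ratio = inputRate / outputRate;
  for (let i = 0; i < outputLength; i++) {
    const pos = i * ratio;
    const left = Math.floor(pos);
    const right = Math.min(left + 1, input.length - 1); // clamp at the last sample
    const frac = pos - left;
    output[i] = input[left] * (1 - frac) + input[right] * frac;
  }
  return output;
}

// Clamps floats to [-1, 1] and scales them to signed 16-bit integers.
function floatTo16BitPCM(input: Float32Array): Int16Array {
  const output = new Int16Array(input.length);
  for (let i = 0; i < input.length; i++) {
    const s = Math.max(-1, Math.min(1, input[i]));
    output[i] = s < 0 ? Math.round(s * 0x8000) : Math.round(s * 0x7fff);
  }
  return output;
}

// Usage: one second of a 440 Hz tone at 48 kHz, converted to 16 kHz PCM.
const tone = new Float32Array(48000).map((_, n) => Math.sin((2 * Math.PI * 440 * n) / 48000));
const pcm = floatTo16BitPCM(resampleLinear(tone, 48000, 16000));
console.log(pcm.length); // 16000
```

The resulting `Int16Array` is what `getAudioData` would return. Rendering an `AudioBuffer` through an `OfflineAudioContext` created at 16000 Hz is an alternative that lets the browser do the resampling.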
This post was originally published on the Picovoice blog!