
Dilek Karasoy for Picovoice


Day 5: Building a local audio transcription engine running on your web browser with JavaScript

On-device speech recognition for the web? Even for transcription? Yes and Yes!

Leopard Speech-to-Text processes voice data in your browser, without sending it to a third-party cloud.

1. Setup & Installation
Create a project and install the SDK:

npm install @picovoice/leopard-web

Get your AccessKey from the Picovoice Console. It's free.

2. Serving the Model
Leopard is an on-device speech-to-text engine, so the model (a deep neural network) has to be transferred to the client before the browser can process voice data.

There are two options:

2.1. Serve the model from the public directory and pass its URL to the SDK.
Pro: Reduces the page size.
Con: Requires upfront work.
2.2. Ship the model with the page content by converting it to text using Base64 encoding.
Pro: Very straightforward. The Leopard package even includes a utility that converts the model to Base64:

npx pvbase64 -i ${MODEL_PATH} -o ${BASE64_PATH}

Con: Yes, the page size!
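
To put a number on that cost: Base64 emits 4 text characters for every 3 bytes of input, so the encoded model is about a third larger than the binary file. The snippet below is a rough sketch of that arithmetic. It does not use the Leopard SDK or the pvbase64 utility; bytesToBase64, base64Length, and the 3,000-byte stand-in array are made up for illustration.

```javascript
// Encode raw bytes as Base64 text using the built-in btoa function.
function bytesToBase64(bytes) {
  let binary = "";
  for (const byte of bytes) {
    binary += String.fromCharCode(byte);
  }
  return btoa(binary);
}

// Base64 produces 4 characters per 3 input bytes, rounded up to a full group.
function base64Length(byteLength) {
  return 4 * Math.ceil(byteLength / 3);
}

// Stand-in for a model file: 3,000 arbitrary bytes (a real model is far larger).
const fakeModel = new Uint8Array(3000).map((_, i) => i % 256);
const encoded = bytesToBase64(fakeModel);

console.log(fakeModel.length); // 3000
console.log(encoded.length);   // 4000
```

The same ratio applies to a real model file, which is why the public directory option keeps the page itself smaller.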

3. Implement Speech Recognition in JavaScript
Create an instance of Leopard:

import { Leopard } from "@picovoice/leopard-web";

const handle = await Leopard.create(
  accessKey,
  leopardModel
);

Replace accessKey with your AccessKey from the Picovoice Console. leopardModel is an object that tells the SDK where to find the model. If you are using the public directory method, use this:

const leopardModel = {
  publicPath: publicRelativePath,
}

If you are using the base64 method, use this:

const leopardModel = {
  base64: base64String,
}
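
Since the two shapes differ only in one key, a small wrapper can make the choice explicit. This is a minimal sketch and not part of the Leopard SDK: buildLeopardModel is a hypothetical helper, the file path is a placeholder, and rejecting the case where both or neither source is given is this sketch's own convention, not something the SDK is known to require.

```javascript
// Hypothetical helper (not a Leopard SDK function): returns a model object
// with exactly one of the two keys shown above.
function buildLeopardModel({ publicPath, base64 } = {}) {
  const hasPath = typeof publicPath === "string" && publicPath.length > 0;
  const hasBase64 = typeof base64 === "string" && base64.length > 0;

  // Sketch convention: require exactly one source so the choice is unambiguous.
  if (hasPath === hasBase64) {
    throw new Error("Provide exactly one of publicPath or base64");
  }
  return hasPath ? { publicPath } : { base64 };
}

// Placeholder path; use wherever your model file is actually served from.
const modelFromUrl = buildLeopardModel({ publicPath: "/models/my_leopard_model.pv" });
console.log(modelFromUrl); // { publicPath: '/models/my_leopard_model.pv' }
```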

Transcribe audio:

function getAudioData(): Int16Array {
  ... // function to get audio data
}
const result = await handle.process(getAudioData());
console.log(result.transcript);
console.log(result.words);

Implement getAudioData to suit your application. It can read from a microphone via the Web Audio API or from a file.
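
One step that most getAudioData implementations will probably need: the Web Audio API exposes samples as 32-bit floats in the range -1 to 1, while process above takes an Int16Array. The snippet below is a minimal sketch of that conversion only. floatToInt16 is a hypothetical helper, not an SDK function, and the sketch does not handle resampling or channel mixing, which your audio source may also need.

```javascript
// Hypothetical helper: convert Web Audio style float samples (-1..1)
// into 16-bit signed PCM samples.
function floatToInt16(float32Samples) {
  const pcm = new Int16Array(float32Samples.length);
  for (let i = 0; i < float32Samples.length; i++) {
    // Clamp first so out-of-range samples do not wrap around.
    const s = Math.max(-1, Math.min(1, float32Samples[i]));
    // Negative values scale to -32768, positive values to 32767.
    pcm[i] = s < 0 ? Math.round(s * 32768) : Math.round(s * 32767);
  }
  return pcm;
}

// Silence, full-scale positive, full-scale negative, and two clipped samples.
const samples = new Float32Array([0, 1, -1, 2, -2]);
console.log(Array.from(floatToInt16(samples))); // [ 0, 32767, -32768, 32767, -32768 ]
```

In a browser, the input could come from something like AudioBuffer.getChannelData(0) after decoding a file or capturing from the microphone.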

This post was originally published on Picovoice blog!
