Voice controlled ToDo List: JavaScript Speech Recognition

#javascript #showdev #tutorial

Originally published on webdeasy.de!

With the JavaScript Speech Recognition API you can build surprisingly capable features with very little code, features that can quickly make native apps look old. This article shows you how.

The JavaScript Speech Recognition API lets us access the visitor’s microphone and capture and evaluate speech input. Some cool things can be built with it: you could go as far as your own AI, or build your own Amazon Echo (Alexa). The possibilities are wide open. 🙂

Requirements

To use the Speech Recognition API, the browser must support JavaScript, which fortunately is now standard. Then again, there are actually people who block “the evil JavaScript”… and install extra add-ons to do so. 🤯

In addition, the visitor must agree to the use of the microphone once. For this purpose, a pop-up will appear, which may look different depending on the operating system and browser. You can also allow the general use of the microphone on all websites in the browser settings.
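
If the visitor declines the microphone prompt, the API reports this through its error event. The following is a minimal sketch of how that could be turned into a readable message. The helper name describeRecognitionError and the message texts are assumptions made for illustration, not part of the API.

```javascript
// Hypothetical helper: maps the `error` code of a SpeechRecognition
// error event to a message for the user. The wording is made up here.
function describeRecognitionError(errorCode) {
  switch (errorCode) {
    case 'not-allowed':
      return 'Microphone access was denied. Please allow it in your browser.';
    case 'audio-capture':
      return 'No microphone was found.';
    case 'no-speech':
      return 'No speech was detected.';
    default:
      return 'Speech recognition failed: ' + errorCode;
  }
}

// Wiring it up, assuming `recognition` is a SpeechRecognition instance:
// recognition.addEventListener('error', event => {
//   console.log(describeRecognitionError(event.error));
// });
```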

How to use the Speech Recognition API

First, we define the interface we are going to use. This is necessary because some browsers only expose it under the webkit prefix, and not all browsers support it at all. You can find the current status of browser support at Can I use.

window.SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
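
Since support is not universal, it can be worth checking for the interface before relying on it. This is a minimal sketch. The helper getSpeechRecognition is a hypothetical name, and it takes the global object as a parameter so that it can also be tried outside a browser.

```javascript
// Hypothetical helper: returns the SpeechRecognition constructor, or null
// if the given global object offers neither the standard nor the
// webkit-prefixed interface.
function getSpeechRecognition(globalObject) {
  return globalObject.SpeechRecognition ||
    globalObject.webkitSpeechRecognition ||
    null;
}

// In a browser we pass `window`; anywhere else we pass an empty object,
// which simply results in "not supported".
const Recognition = getSpeechRecognition(typeof window !== 'undefined' ? window : {});
if (!Recognition) {
  console.log('Speech recognition is not supported in this environment.');
}
```

With a check like this, the page can hide its voice features or show a hint instead of failing with an error.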

Now we create an instance of the SpeechRecognition class. We set the interimResults property to true so that we receive text while the user is still speaking, and not only after the API has recognized the end of the speech input. This way we can start evaluating the input before the sentence is even finished.

We also specify the language using the lang property.

All events and properties are described in the Web Speech API documentation.

// set up SpeechRecognition
const recognition = new SpeechRecognition();
recognition.interimResults = true;
recognition.lang = 'en-US';

Now everything is prepared and we can start waiting for voice input and evaluating it. The result event fires whenever the API delivers a result. Because interimResults is enabled, that includes interim results while the user is still speaking, not only the final result once the user has finished a sentence and pauses.

The transcript variable holds the recognized text. In line 6, the Boolean isFinal tells us whether the result is final, i.e. whether the API considers the input finished.

From line 10 on, I optionally added a check for whether the input starts with a certain word. The following demo is based on the same principle.

// waiting for speech results
recognition.addEventListener('result', event => {
  const transcript = event.results[0][0].transcript;

  // check if the voice input has ended
  if(event.results[0].isFinal) {
    console.log(transcript);

    // check if the input starts with 'hello'
    if(transcript.toLowerCase().startsWith('hello')) {
      console.log('You said hello to somebody.');
    }
  }
});
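
The listener above only reads event.results[0]. An event can carry several results, some final and some still interim, so a more general approach is to loop over them. The following is a sketch under that assumption. The helper collectTranscripts is a hypothetical name, while resultIndex, results, isFinal and transcript are the API's own fields.

```javascript
// Hypothetical helper: splits the results of a `result` event into the
// text that is already final and the text that may still change.
function collectTranscripts(event) {
  let finalText = '';
  let interimText = '';
  // event.resultIndex is the index of the first result that changed
  for (let i = event.resultIndex; i < event.results.length; i++) {
    const result = event.results[i];
    const text = result[0].transcript; // first (best) alternative
    if (result.isFinal) {
      finalText += text;
    } else {
      interimText += text;
    }
  }
  return { finalText, interimText };
}

// Wiring it up, assuming `recognition` is a SpeechRecognition instance:
// recognition.addEventListener('result', event => {
//   const { finalText } = collectTranscripts(event);
//   if (finalText) console.log('Final:', finalText);
// });
```

Acting only on finalText avoids triggering an action several times while the recognizer is still revising its guess.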

Finally, we start the speech input with the .start() function and call it again whenever a recognition session ends. This way the Speech Recognition API listens “permanently”.

// wrap start() in a function so the event object is not passed to it
recognition.addEventListener('end', () => recognition.start());
recognition.start();

You can change this so that listening is started e.g. when you click on a button – depending on what you want to do.
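
A button-driven variant could look roughly like this. It is only a sketch: startOnClick is a hypothetical helper, and the button and the recognition object are passed in as parameters. The guard is there because calling start() on a session that is already running throws an error.

```javascript
// Hypothetical helper: starts listening when the button is clicked and
// ignores further clicks while a session is already running.
function startOnClick(button, recognition) {
  let listening = false;

  // the `end` event fires whenever a recognition session has stopped
  recognition.addEventListener('end', () => {
    listening = false;
  });

  button.addEventListener('click', () => {
    if (listening) {
      return;
    }
    listening = true;
    try {
      recognition.start();
    } catch (err) {
      // start() failed, so allow another attempt on the next click
      listening = false;
      console.log('Could not start speech recognition:', err);
    }
  });
}
```

In a page this might be called as startOnClick(document.querySelector('#listen'), recognition), where #listen is a made-up button id. The end listener that restarts recognition automatically would then be left out.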

Example: Voice controlled ToDo List

I also experimented a bit with the Speech Recognition API and built a voice-driven todo list with it. Using the same principle you can also build your own voice control. Try it yourself – you don’t need as much code as you might think at first!
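
To give an idea of how such a list could react to speech, here is a minimal sketch. It is not the code of the demo. The function applyVoiceCommand and the command words "add", "remove" and "clear" are assumptions chosen for illustration.

```javascript
// Hypothetical command handling: takes the current list and a final
// transcript and returns the list after the command has been applied.
// Transcripts are lowercased, so items are stored in lowercase.
function applyVoiceCommand(todos, transcript) {
  const text = transcript.trim().toLowerCase();

  if (text.startsWith('add ')) {
    // "add buy milk" -> appends "buy milk"
    return [...todos, text.slice('add '.length).trim()];
  }
  if (text.startsWith('remove ')) {
    // "remove buy milk" -> drops every item called "buy milk"
    const target = text.slice('remove '.length).trim();
    return todos.filter(item => item !== target);
  }
  if (text === 'clear') {
    return [];
  }
  return todos; // unknown command: leave the list unchanged
}

// Wiring it up, assuming `recognition` is a SpeechRecognition instance:
// let todos = [];
// recognition.addEventListener('result', event => {
//   if (event.results[0].isFinal) {
//     todos = applyVoiceCommand(todos, event.results[0][0].transcript);
//   }
// });
```

Keeping the command logic in a plain function like this makes it easy to try out without a microphone.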

Conclusion

I myself am a big fan of pure web applications and generally don’t need many native apps. The Speech Recognition API can make a big contribution to this. The implementation is – as you have seen – very simple. What cool feature would you build with it? Let me know in the comments. 🙂

Top comments (6)

Bernard Baker • Sep 27 '20

I read your article and I was inspired by all the possibilities. I've updated a project to use voice recognition to aid the website's main navigation.

David sigampa • Sep 6 '22

Hello, I am building an MVC application to translate speech to text. My project then tries to perform different actions (build a video playlist) for each identified word.
In some tests with interimResults=true, the console shows "well", "Wellco", "wellcome", which triggers 3 actions on the server. Is there any way to tell when the speech engine has identified the complete word?

call function when identifying the whole word

Here is my code:
recognition.onresult = function (event) {
  var final = "";
  var interim = "";
  for (var i = 0; i < event.results.length; ++i) {
    if (event.results[i].isFinal) {
      final += event.results[i][0].transcript;
      textoFinal.innerHTML = final;
    } else {
      interim += event.results[i][0].transcript;
      texto.innerHTML = interim;
    }
  }
};