Hi there,
In this guide, we will learn how to integrate a voice user interface into a web application.
We are working with React. To incorporate a Voice User Interface (VUI), we will use the Web Speech API.
For simplicity we will not be focusing on design.
Our aim is to build a voice assistant that recognizes what we say and answers accordingly.
The Web Speech API allows fine control and flexibility over the speech recognition capabilities in Chrome version 25 and later.
The Web Speech API provides us with two pieces of functionality:
- Speech Recognition, which converts speech to text.
- Speech Synthesis, which converts text to speech.
1. We will start by installing two npm packages:
# for speech recognition
npm i react-speech-recognition
# for speech synthesis
npm i react-speech-kit
Now, before moving on to the next step, let's take a look at some important functions of Speech Recognition.
Detecting browser support for Web Speech API
if (!SpeechRecognition.browserSupportsSpeechRecognition()) {
// Render some fallback content
}
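The check above goes through the library. For reference, here is a minimal sketch of what a native check could look like, assuming the browser exposes `SpeechRecognition` (or the prefixed `webkitSpeechRecognition`) and `speechSynthesis` on the global object. The helper name `detectSpeechSupport` is made up for this example and is not part of either package.

```javascript
// Sketch: detect native Web Speech API support without any library.
// `detectSpeechSupport` is a hypothetical helper name used only here.
function detectSpeechSupport(globalObject) {
  // Chrome exposes speech recognition under the `webkit` prefix.
  const Recognition =
    globalObject.SpeechRecognition || globalObject.webkitSpeechRecognition;
  return {
    recognition: typeof Recognition === "function",
    synthesis:
      typeof globalObject.speechSynthesis === "object" &&
      globalObject.speechSynthesis !== null,
  };
}

// In a browser you would pass `window`; `globalThis` works there too.
const support = detectSpeechSupport(globalThis);
console.log(support);
```

Passing the global object in as an argument keeps the function easy to try with a stand-in object outside the browser.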
Turning the microphone on
SpeechRecognition.startListening();
Turning the microphone off
// It will first finish processing any speech in progress and
// then stop.
SpeechRecognition.stopListening();
// It will cancel the processing of any speech in progress.
SpeechRecognition.abortListening();
Consuming the microphone transcript
// To make the microphone transcript available in our component.
const { transcript } = useSpeechRecognition();
Resetting the microphone transcript
const { resetTranscript } = useSpeechRecognition();
Now we're ready to add Speech Recognition (speech to text) to our web app.
2. In the App.js file, we will check the support for react-speech-recognition and add two components, StartButton and Output.
The App.js file should look like this for now:
import React from "react";
import StartButton from "./StartButton";
import Output from "./Output";
import SpeechRecognition from "react-speech-recognition";
function App() {
// Checking the support
if (!SpeechRecognition.browserSupportsSpeechRecognition()) {
return (
<div>
Browser does not support Web Speech API (Speech Recognition).
Please download latest Chrome.
</div>
);
}
return (
<div className="App">
<StartButton />
<Output />
</div>
);
}
export default App;
3. Next, we will move to the StartButton.js file.
Here we will add a toggle button to start and stop listening.
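import React, { useState } from "react";
// SpeechRecognition is used in clickHandler below, so it must be imported.
import SpeechRecognition from "react-speech-recognition";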
function StartButton() {
const [listen, setListen] = useState(false);
const clickHandler = () => {
if (listen === false) {
SpeechRecognition.startListening({ continuous: true });
setListen(true);
// The default value for continuous is false, meaning that
// when the user stops talking, speech recognition will end.
} else {
SpeechRecognition.abortListening();
setListen(false);
}
};
return (
<div>
<button onClick={clickHandler}>
<span>{listen ? "Stop Listening" : "Start Listening"}
</span>
</button>
</div>
);
}
export default StartButton;
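To make the toggle behaviour easier to follow, here is a small sketch of the same start/stop logic pulled out of React. `createListeningToggle` and `fakeRecognizer` are names invented for this example; the only assumption is an object that exposes `startListening()` and `abortListening()`, like the `SpeechRecognition` object used above.

```javascript
// Sketch: the toggle logic from StartButton, separated from React.
// `createListeningToggle` is a hypothetical helper; `recognizer` is any
// object exposing startListening() and abortListening().
function createListeningToggle(recognizer) {
  let listening = false;
  return function toggle() {
    if (listening) {
      recognizer.abortListening();
    } else {
      recognizer.startListening({ continuous: true });
    }
    listening = !listening;
    return listening;
  };
}

// Stand-in recognizer that only records calls, for trying the logic out.
const calls = [];
const fakeRecognizer = {
  startListening: (options) => calls.push(["start", options]),
  abortListening: () => calls.push(["abort"]),
};
const toggle = createListeningToggle(fakeRecognizer);
toggle(); // first click: starts listening
toggle(); // second click: aborts listening
console.log(calls);
```

Note that this flag, like the `listen` state in the component, is only our own record of what we asked for. The useSpeechRecognition hook also returns a `listening` value, which may be a more reliable source if the microphone stops for another reason.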
4. Now, in the Output.js file, we will use the useSpeechRecognition React hook.
useSpeechRecognition gives a component access to a transcript of speech picked up from the user's microphone.
import React, { useState } from "react";
import { useSpeechRecognition } from "react-speech-recognition";
function Output() {
const [outputMessage, setOutputMessage] = useState("");
const commands = [
// here we will write various different commands and
// callback functions for their responses.
];
const { transcript, resetTranscript } =
useSpeechRecognition({ commands });
return (
<div>
<p>{transcript}</p>
<p>{outputMessage}</p>
</div>
);
}
export default Output;
5. Before defining the commands, we will add Speech Synthesis to our web app to convert the outputMessage to speech.
In the App.js file, we will now check the support for speech synthesis.
import { useSpeechSynthesis } from "react-speech-kit";
function App() {
const { supported } = useSpeechSynthesis();
if (!supported) {
return (
<div>
Browser does not support Web Speech API (Speech Synthesis).
Please download latest Chrome.
</div>
);
}
.
.
.
export default App;
6. Now, in the Output.js file, we will use the useSpeechSynthesis() React hook.
But before moving on, let's first take a look at some important functions of Speech Synthesis:
- speak(): Call to make the browser read some text.
- cancel(): Call to make SpeechSynthesis stop reading.
We want to call the speak() function each time the outputMessage changes.
So we add the following lines of code to the Output.js file:
import React, { useEffect, useState } from "react";
import { useSpeechSynthesis } from "react-speech-kit";
function Output() {
const [outputMessage, setOutputMessage] = useState("");
const { speak, cancel } = useSpeechSynthesis();
// The speak() will get called each time outputMessage is changed
useEffect(() => {
speak({
text: outputMessage,
});
}, [outputMessage]);
.
.
.
}
export default Output;
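For comparison, here is a minimal sketch of speaking a message with the browser's native speech synthesis objects (`speechSynthesis.speak()` and `SpeechSynthesisUtterance`), without the hook. `speakMessage` is a hypothetical helper, and skipping empty text is our own addition, since outputMessage starts out as an empty string.

```javascript
// Sketch: speak a message with the native SpeechSynthesis API.
// `speakMessage` is a hypothetical helper; `synth` and `Utterance` are passed
// in so the function can be tried with stand-ins outside a browser.
function speakMessage(text, synth, Utterance) {
  // Skip empty messages, such as the initial "" value of outputMessage.
  if (typeof text !== "string" || text.trim() === "") {
    return false;
  }
  synth.speak(new Utterance(text));
  return true;
}

// Browser usage (assumes window.speechSynthesis is available):
// speakMessage("Hello", window.speechSynthesis, window.SpeechSynthesisUtterance);
```

The return value only tells the caller whether anything was handed to the synthesizer; it does not wait for the speech to finish.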
Whoa!
Everything is now set up.
The only thing left is to define our commands.
7. Now we're back at our Output.js
file to complete our commands.
const commands = [
{
// In this, the words that match the splat(*) will be passed
// into the callback,
command: "I am *",
callback: (name) => {
resetTranscript();
setOutputMessage(`Hi ${name}. Nice name`);
},
},
// DATE AND TIME
{
command: "What time is it",
callback: () => {
resetTranscript();
setOutputMessage(new Date().toLocaleTimeString());
},
matchInterim: true,
// The default value for matchInterim is false, meaning that
// the only results returned by the recognizer are final and
// will not change.
},
{
// This example would match both:
// 'What is the date' and 'What is the date today'
command: 'What is the date (today)',
callback: () => {
resetTranscript();
setOutputMessage(new Date().toLocaleDateString());
},
},
// GOOGLING (search)
{
command: "Search * on google",
callback: (gitem) => {
resetTranscript();
// function to google the query(gitem)
function toGoogle() {
// Encode the query so spaces and symbols survive in the URL.
window.open(`http://google.com/search?q=${encodeURIComponent(gitem)}`, "_blank");
}
toGoogle();
setOutputMessage(`Okay. Googling ${gitem}`);
},
},
// CALCULATIONS
{
command: "Add * and *",
callback: (numa, numb) => {
resetTranscript();
const num1 = parseInt(numa, 10);
const num2 = parseInt(numb, 10);
setOutputMessage(`The answer is: ${num1 + num2}`);
},
},
// CLEAR or STOP.
{
command: "clear",
callback: () => {
resetTranscript();
cancel();
},
isFuzzyMatch: true,
fuzzyMatchingThreshold: 0.2,
// isFuzzyMatch is false by default.
// It determines whether the comparison between speech and
// command is based on similarity rather than an exact match.
// fuzzyMatchingThreshold (default is 0.8) takes values between
// 0 (will match anything) and 1 (needs an exact match).
// If the similarity of speech to command is higher than this
// value, the callback will be invoked.
},
]
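To build some intuition for how patterns such as "I am *" or "What is the date (today)" can be matched against a transcript, here is a simplified sketch. It is an illustration of the idea only, not the code react-speech-recognition actually runs, and `commandToRegExp` and `matchCommand` are names chosen for this example. The text inside optional parentheses is not escaped here, which is fine for plain words.

```javascript
// Sketch: turn a command pattern into a RegExp in a single pass.
// Illustration only; the library's real matching may differ.
function commandToRegExp(command) {
  const pattern = command.replace(
    /\s*\(([^)]*)\)|\*|[.+?^${}|[\]\\]/g,
    (token, optionalWord) => {
      // "(today)" becomes an optional group, including its leading space.
      if (optionalWord !== undefined) return `(?:\\s*${optionalWord})?`;
      // "*" (the splat) captures whatever was said at that position.
      if (token === "*") return "(.+?)";
      // Anything else that is special in a RegExp gets escaped.
      return `\\${token}`;
    }
  );
  // Match the whole transcript, ignoring letter case.
  return new RegExp(`^${pattern}$`, "i");
}

// Returns the captured splat values, or null when the command does not match.
function matchCommand(command, transcript) {
  const match = commandToRegExp(command).exec(transcript.trim());
  return match ? match.slice(1) : null;
}

console.log(matchCommand("I am *", "I am Roopali"));
console.log(matchCommand("Add * and *", "add 2 and 3"));
console.log(matchCommand("What is the date (today)", "what is the date"));
```

The captured values line up with the callback arguments in the commands above: one argument per splat, in order.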
We have successfully built a voice assistant using the Web Speech API that does as we say.
Note: As of May 2021, the following browsers support the Web Speech API:
- Chrome (desktop)
- Chrome (Android)
- Safari 14.1
- Microsoft Edge
- Android webview
- Samsung Internet
For all other browsers, you can integrate a polyfill.
Top comments (6)
Hi hi,
I love this writeup. I'm currently working on something like this and will appreciate if I could have access to the source code so I could learn more and better.
Here is my email address just in case: oyeladedavid1@gmail.com
Thank you so much. Your comment makes me wanna continue writing blogs.
Here's the link to the repo: voice-assistant
The main files you wanna check are: StartButton.js which starts or stops the speechRecognition
And Output.js which has all the functions in it.
Thank you so much. I deeply appreciate this. Just in case I have further queries, how can I reach out to you?
You could contact me on any of these or just comment here. I will surely try to help if I can.
Here are my accounts:
LinkedIn: roopalisingh-rs/
Mail: roopali.singh.222@gmail.com
Hello everyone,
an interesting article. Thank you!
But what is the reason that the site does not work on an Android smartphone with Chrome browser? Microphone permission was granted...
Best regards.
Hey Max,
I've just checked it on Chrome for Android and it does work for me. However, Chrome keeps sending me the notification that the site is using my microphone every second, even after I give the permission, so I first need to turn off the microphone notifications for Chrome.
I haven't been following this project for a while now, so it could be due to some new changes in the Web Speech API that I have used.
Are you also facing the same issue ?
Maybe you could check the MDN docs and find the issue.