Sometimes during exercise, I want to track how long I can hold a plank or a certain stretch. This situation makes it awkward to use my phone, which inspired me to explore the Web Speech API and build a voice-controlled stopwatch in React. In this article, I'm focusing on adding voice commands to an existing tiny React project. You can follow along with the codebase on GitHub.
Speech Recognition in browsers
The SpeechRecognition interface is part of the experimental Web Speech API, and it makes it easy to implement a voice user interface (VUI) in browsers. Official browser support looks great; in practice, however, it is less optimistic. In my tests across multiple browsers, it has worked reliably only in Chrome and Safari.
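To get a feel for the underlying browser API before we bring React into the picture, here is a minimal sketch using the raw SpeechRecognition interface (note the webkit prefix Chrome still requires):

// Chrome exposes the API under a webkit prefix.
const Recognition = window.SpeechRecognition || window.webkitSpeechRecognition;
const recognition = new Recognition();
recognition.lang = "en-US";
recognition.onresult = (event) => {
  // Log the best transcript of the most recent result.
  const result = event.results[event.results.length - 1];
  console.log(result[0].transcript);
};
recognition.start(); // Prompts for microphone permission on first use.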
Tiny project with voice user interface
The only thing I want to render is the Timer component inside the App, where I will add controls using voice commands. This is what the initial structure looks like:
export function App() {
  const lastCommand = useVoiceCommands();
  return (
    <Timer run={true} />
  );
}

// Placeholder for adding voice commands
function useVoiceCommands() {
  return "";
}
Note the placeholder hook useVoiceCommands, which will be used to add (you guessed it) voice command functionality.
Luckily for me, there is already a package available for speech recognition in React called react-speech-recognition. This package offers an option to plug in a third-party recognition service as a polyfill for full cross-browser support. For this project, I'm happy with Chrome and Safari support, so I will not investigate this option.
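For reference, the package documents an applyPolyfill method for this purpose. A sketch of wiring in the Speechly polyfill, based on the package's README (the app ID is a placeholder you would obtain from the service), might look like this:

import SpeechRecognition from "react-speech-recognition";
import { createSpeechlySpeechRecognition } from "@speechly/speech-recognition-polyfill";

// Placeholder; a real app ID would come from the Speechly dashboard.
const appId = "<your-speechly-app-id>";
SpeechRecognition.applyPolyfill(createSpeechlySpeechRecognition(appId));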
Custom useVoiceCommands hook
Let's start building our useVoiceCommands hook by importing the building blocks from react-speech-recognition.
import SpeechRecognition, { useSpeechRecognition } from "react-speech-recognition";
The react-speech-recognition package exports the useSpeechRecognition hook for reading recognized speech in our component, and the SpeechRecognition object for starting and stopping the recognition service. I want to start speech recognition when the app renders, so I will call startListening inside a React.useEffect hook with an empty dependency array, and include the abortListening method in the cleanup function.
function useVoiceCommands() {
  React.useEffect(() => {
    // Timeout fixes an error during live-refresh in development, when
    // listening attempts to start before it has been aborted in a cleanup.
    setTimeout(
      () => SpeechRecognition.startListening({ continuous: true }),
      500,
    );
    // Stop listening when the component is unmounted.
    return () => {
      SpeechRecognition.abortListening();
    };
  }, []);
  return "";
}
Let's define the commands for the stopwatch:
start - start the timer
pause - pause at the current time
stop / reset - stop and set the stopwatch back to zero
These words will have to be recognized and used in the App to interact with the timer. At the moment our app starts listening to microphone input, but we don't read the output yet, so let's do that next. Add the useSpeechRecognition hook imported from the react-speech-recognition package into the custom useVoiceCommands hook. Its output is an object containing properties that indicate browser support and transcript details.
If you want to learn more, I recommend consulting the API docs.
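As a quick orientation, a few of the properties the hook returns (these names come from the package's documented API):

const {
  transcript,                       // Everything recognized so far
  listening,                        // Whether the microphone is currently active
  browserSupportsSpeechRecognition, // Feature detection for the current browser
  isMicrophoneAvailable,            // Whether microphone permission was granted
} = useSpeechRecognition();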
In this use case, we want to trigger a state change when specific words are recognized, so we can ignore the transcript properties and disable transcript state updates in the useSpeechRecognition hook by passing a configuration object with transcribing set to false:
useSpeechRecognition({ transcribing: false })
💡 Everything indicates that browsers support speech recognition across all major browsers (Firefox behind a flag), but at the time of writing, it operates correctly only in Chrome and Safari.
react-speech-recognition has an excellent feature for exactly what we need. It is called "commands", and we can use it to define specific words we want to listen for, with fuzzy matching and a callback to be triggered upon recognition. In the commands array, we put all the commands we want to be recognized, and as the callback, we set the state value of lastCommand. This is what the final version of the custom useVoiceCommands hook looks like:
function useVoiceCommands() {
  const [lastCommand, setLastCommand] = React.useState("");
  useSpeechRecognition({
    transcribing: false,
    commands: [
      {
        command: [
          "start",
          "stop",
          "reset",
          "pause",
          "pose" /* pose - catches misunderstood pause */,
        ],
        callback: setLastCommand,
        matchInterim: true,
        isFuzzyMatch: true,
      },
    ],
  });
  React.useEffect(() => {
    // Timeout fixes an error during development, when listening attempts
    // to start before it has been aborted in a cleanup.
    setTimeout(
      () => SpeechRecognition.startListening({ continuous: true }),
      500,
    );
    return () => {
      SpeechRecognition.abortListening();
    };
  }, []);
  return lastCommand;
}
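If fuzzy matching fires too eagerly, or not eagerly enough, the package also documents per-command tuning options. A sketch of how they might be applied to our command object:

useSpeechRecognition({
  transcribing: false,
  commands: [
    {
      command: ["start", "stop", "reset", "pause"],
      callback: setLastCommand,
      isFuzzyMatch: true,
      fuzzyMatchingThreshold: 0.8, // 0-1; how closely speech must match (documented default is 0.8)
      bestMatchOnly: true,         // Fire the callback only for the closest matching command
    },
  ],
});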
Using the voice commands in the app
By using the custom hook for getting the voice command, we are keeping the body of the App function component concise and clean. The component will make use of lastCommand in the following ways.
The lastCommand state variable will be used to conditionally set the run prop of the Timer. If its value is start, run will be true, which starts the timer. When we say "pause", the run prop will change to false and the timer will pause, displaying the time that has passed since the start.
The same thing happens when you say "stop" or "reset", but for these commands we will also set the time back to zero. In order to reset the Timer back to zero, we will force the component to re-initialize by adding the key prop and changing its value when either the "stop" or "reset" command is recognized.
export function App() {
  const [resetKey, setResetKey] = React.useState(0);
  const lastCommand = useVoiceCommands();
  React.useEffect(() => {
    if (["reset", "stop"].includes(lastCommand)) {
      setResetKey((x) => x + 1);
    }
  }, [lastCommand]);
  return (
    <Timer key={resetKey} run={lastCommand === "start"} />
  );
}
Changing the value of the key prop forces React to unmount the existing instance and mount a new one, which matches our goal of resetting the timer.
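The Timer component itself is not the focus of this article, but for completeness, here is a minimal sketch of how such a component might work. This is an illustration assuming a simple setInterval-based implementation, not the actual code from the repo:

function Timer({ run }) {
  const [elapsed, setElapsed] = React.useState(0);
  React.useEffect(() => {
    if (!run) return;
    // Resume from the already elapsed time when run flips back to true.
    const startedAt = Date.now() - elapsed;
    const id = setInterval(() => setElapsed(Date.now() - startedAt), 100);
    return () => clearInterval(id);
  }, [run]); // elapsed is intentionally read only when run changes
  return <div>{(elapsed / 1000).toFixed(1)}s</div>;
}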
Now the app utilizes all of the commands we set out to use for this tiny project. Remember when you test this that the indicated browser support for the SpeechRecognition API is not reliable, and it is best to run this project in Chrome. At the moment react-speech-recognition does not expose the error state (I have submitted a PR for this).
To help you check support in your browser, I have added an error listener to the demo of this project, along with a display of debugging information. After you enable microphone permissions and start talking, it will display either the recognized commands or the error code on the screen, so you don't end up talking to your computer for no reason 🙂
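The package exposes the underlying native recognition object via SpeechRecognition.getRecognition(), so one way to attach such an error listener (a sketch; my demo's exact approach may differ) could look like this:

React.useEffect(() => {
  const recognition = SpeechRecognition.getRecognition();
  if (!recognition) return;
  const onError = (event) => {
    // event.error holds codes like "not-allowed" or "network".
    console.error("Speech recognition error:", event.error);
  };
  recognition.addEventListener("error", onError);
  return () => recognition.removeEventListener("error", onError);
}, []);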
Hopefully, this tiny project walkthrough gave you an idea of how to start adding a voice user interface to your own projects. Let me know in the comments if you have questions or feedback on this walkthrough, or what you are going to use speech recognition for. If you see something that could be done better, feel free to submit a PR and I will be happy to post an update.
Useful resources for your projects using speech recognition:
This project on GitHub and the live demo
Cover photo by Jason Rosewell on Unsplash