DEV Community

Use vocal commands in your website (native Chrome API)

Oleksandr Demian (9zemian5) ・3 min read

This morning I was wondering: what cool feature could I add to my website? The first thing that came to mind was: "navigate the website using my voice". The first step is, obviously, research. It turned out to be much easier than I had imagined. I implemented it on my website (made with Svelte) in 30 minutes, enabling vocal navigation between sections. Here, however, I will explain the basics with pure JavaScript and HTML (no bundler, no framework).

Expected result

A simple webpage with a button that enables speech recognition. The recognized text is displayed above the button. To keep things simple, it records only one utterance at a time.

Setup

The following is required:

  • Chrome
  • index.html and main.js files

index.html

Nothing interesting here: a simple webpage with a text element (to display the speech) and a button (to trigger speech recognition).

<!DOCTYPE html>
<html>
    <head>
        <title>Vocal commands</title>
        <style>
            body {
                padding: 0;
                margin: 0;
                display: flex;
                flex-direction: column;
                justify-content: center;
                align-items: center;

                height: 100vh;
            }

            button {
                background-color: white;
                border: 1px solid black;
                padding: 10px;
                cursor: pointer;
                border-radius: 15px;
            }

            button:disabled{
                background-color: red;
                color: white;
                border: 1px solid white;
            }
        </style>
    </head>
    <body>
        <h1 id="text">Text</h1>
        <button id="start">Start</button>
        <script src="./main.js"></script>
    </body>
</html>

main.js

This is where the magic happens.

const speechButton = document.getElementById("start");
const text = document.getElementById("text");

//webkitSpeechRecognition instance
let recognition = null;

//display text to user
const displayText = (txt) => {
    text.innerText = txt;
};

const setup = () => {
    //create new instance of webkitSpeechRecognition
    recognition = new webkitSpeechRecognition();

    //continuous = false: stop after recognition is done (with true it will continue to recognize text until manual shutdown)
    recognition.continuous = false;
    //language setup (I haven't tried other languages)
    recognition.lang = 'en-US';

    //result event is triggered each time some text was recognized
    recognition.addEventListener("result", (event) => {
        //get the recognized text
        const word = event.results[0][0].transcript;

        //trim as it can have white spaces
        let fWord = word.trim();

        console.log('Result received: ' + fWord);
        //display the result of text recognition
        displayText("You said: " + fWord);
    });

    //end event is triggered after recognition is stopped
    recognition.addEventListener("end", () => {
        speechButton.disabled = false;
    });

    //a bit of error handling
    recognition.addEventListener("error", () => {
        speechButton.disabled = false;
        displayText("Error occurred");
    });
};

//start speech recognition (the button is disabled so the user cannot start it twice)
const start = () => {
    recognition.start();
    speechButton.disabled = true;
    displayText("Say something...");
};

//stop recognition (although there is no need for it in this case)
const stop = () => {
    recognition.stop();
};

//check if webkitSpeechRecognition exists (typeof avoids a ReferenceError in browsers that do not define it)
const canListen = () => {
    return typeof webkitSpeechRecognition !== "undefined";
};


if(canListen()){
    //setup speech recognition
    setup();

    //speech recognition will start after button click
    speechButton.addEventListener("click", () => {
        start();
    });
} else {
    //notify the user that speech recognition is not available
    speechButton.addEventListener("click", () => {
        alert("webkitSpeechRecognition module is not defined");
    });
}
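
The error listener above shows the same message for every failure. The event passed to it carries an `error` property holding a short code such as `no-speech` or `not-allowed`, so a slightly more helpful handler could look like the sketch below. Note that `describeError` and the message texts are my own illustrative choices, not part of the API.

```javascript
// Hypothetical helper (not part of the Web Speech API): maps the code found in
// event.error to a message for the user. The message wording is an assumption.
const ERROR_MESSAGES = new Map([
    ["no-speech", "No speech was detected. Try again."],
    ["audio-capture", "No microphone was found."],
    ["not-allowed", "Microphone access was denied."],
    ["network", "A network error interrupted recognition."],
    ["aborted", "Recognition was aborted."]
]);

const describeError = (code) => {
    // fall back to a generic message for codes that are not in the table
    return ERROR_MESSAGES.has(code)
        ? ERROR_MESSAGES.get(code)
        : "Error occurred (" + code + ")";
};

// Possible wiring inside setup() in main.js:
// recognition.addEventListener("error", (event) => {
//     speechButton.disabled = false;
//     displayText(describeError(event.error));
// });
```

The wiring is left commented out so the helper can be tried on its own, outside the browser.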

Conclusion

The code above is pretty simple, and you can easily integrate it into any modern framework. The only drawbacks are speed and precision: it takes a bit too long to recognize text, and it fails a lot.
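
To get from recognized text to actual navigation, the transcript has to be turned into a command. Since recognition is not very precise, one option is to look for a known keyword anywhere in the sentence instead of expecting an exact phrase. This is only a minimal sketch: the `COMMANDS` table, the section ids, and the `matchCommand` name are assumptions made up for the example, and it is not the code from my Svelte site.

```javascript
// Hypothetical command table: spoken keyword -> id of the section to show.
const COMMANDS = new Map([
    ["home", "home"],
    ["about", "about"],
    ["contact", "contact"]
]);

// Returns the section id for the first known keyword in the transcript,
// or null when no keyword is found.
const matchCommand = (transcript) => {
    const words = transcript
        .toLowerCase()
        .replace(/[^a-z\s]/g, " ") // drop punctuation and digits
        .split(/\s+/)
        .filter(Boolean); // remove empty strings left by the split

    for (const word of words) {
        if (COMMANDS.has(word)) {
            return COMMANDS.get(word);
        }
    }
    return null;
};

// Possible wiring inside the "result" listener in main.js:
// const sectionId = matchCommand(event.results[0][0].transcript);
// if (sectionId !== null) {
//     const section = document.getElementById(sectionId);
//     if (section) section.scrollIntoView({ behavior: "smooth" });
// }
```

With this table, saying "go to about" or just "about" would both resolve to the `about` section, while a sentence without any keyword is ignored.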
