
Voice to Text Chatbot.

Kim Nguyen
She/her. Inspire to Aspire :).
・2 min read

This blog is the second part of a two-part chatbot tutorial series. Check out the first part here.


In this post, I'm going to walk you through how to implement a voice-to-text feature (and the reverse, text-to-voice) for our chatbot :). The API I'm using for this chatbot skill is the Web Speech API, which is built into the browser and documented on MDN. There's also the Google Cloud Speech-to-Text API, but I'm not going to cover that today. Okay, let's dive in.

Let's add a microphone icon (you can choose any icon you want) in our chatbot input to notify the user about our newly added feature:

<InputGroup.Append>
    <img
        src='https://img.icons8.com/dusk/64/000000/microphone.png'
        alt='microphone-icon'
        className="mb-2 voice-chat-btn"
        onClick={() => handleVoice(recognition)}
    />
</InputGroup.Append>

This is our current chatbot:
[Image: chatbot with microphone]

This button listens for a click event and, as you probably spotted, runs handleVoice() whenever the user clicks the microphone. The idea is that when the user clicks the button, our bot knows to listen for a human voice and transcribe it to text. First, let's initialize our speech recognition using the Web Speech API:

const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition
const recognition = new SpeechRecognition();
recognition.lang = 'en-US';
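
Not every browser exposes SpeechRecognition, and calling new on an undefined constructor throws. Here is a minimal sketch of a guard, assuming we are free to wrap the setup in two helpers. The names getSpeechRecognition and createRecognition are hypothetical and not part of the Web Speech API:

```javascript
// Hypothetical helper: returns the SpeechRecognition constructor, or null
// when the browser exposes neither the standard nor the prefixed name.
function getSpeechRecognition(globalObj) {
  return globalObj.SpeechRecognition || globalObj.webkitSpeechRecognition || null;
}

// Hypothetical helper: builds a configured recognizer, or returns null so
// the UI can hide the microphone button instead of crashing.
function createRecognition(globalObj, lang = 'en-US') {
  const Recognition = getSpeechRecognition(globalObj);
  if (!Recognition) return null;
  const recognition = new Recognition();
  recognition.lang = lang;
  return recognition;
}
```

In the component, this would be called as createRecognition(window), and the microphone icon could be rendered only when the result is not null.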

Here is how the official docs define SpeechRecognition:


"The SpeechRecognition interface of the Web Speech API is the controller interface for the recognition service; this also handles the SpeechRecognitionEvent sent from the recognition service."


This is the core of our speech-to-text translation. Besides that, there are several methods (start(), stop(), abort()) and properties (lang, grammars, continuous, etc.) that we can use. For this chatbot, I'm only using the start() method, the onresult event handler, and the lang property, which sets English as the chatbot's language. Let's implement our handleVoice() function, which will turn our voice into text:

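const handleVoice = (recognition) => {
    // Start listening for audio from the microphone
    recognition.start()

    // Runs when the recognition service returns a result
    recognition.onresult = function (event) {
        // resultIndex is the index of the first result that changed
        const resultIndex = event.resultIndex
        // [0] is the first alternative for that result
        const transcript = event.results[resultIndex][0].transcript
        setUserHistory([transcript, ...userHistory])
        matchReply(transcript)
    }
}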

This function uses:

  • recognition.start(): starts the speech recognition service, which begins listening for audio.
  • recognition.onresult: an event handler that fires when the service sends the recognized word or phrase back to our application.
  • setUserHistory(): saves the transcript to our state.
  • matchReply(): generates a corresponding bot reply for our transcript.
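
The steps above can be sketched a little more defensively. Calling start() while a session is already running throws an error, and the service can also report errors (for example when no speech is heard) through onerror. The sketch below is only an illustration: extractTranscript and listenOnce are hypothetical helper names, and the onTranscript and onError callbacks stand in for whatever the component does with the result.

```javascript
// Tracks recognizers that are currently listening (our own bookkeeping,
// not something the Web Speech API provides).
const activeRecognizers = new WeakSet();

// Hypothetical helper: pulls the first alternative's transcript out of a
// result event. Returns '' when the event carries nothing usable.
function extractTranscript(event) {
  const result = event.results && event.results[event.resultIndex];
  const best = result && result[0];
  return best && best.transcript ? best.transcript.trim() : '';
}

// Hypothetical helper: wires the handlers, then starts listening.
// Returns false (and does nothing) if this recognizer is already running.
function listenOnce(recognition, onTranscript, onError) {
  if (activeRecognizers.has(recognition)) return false;
  activeRecognizers.add(recognition);

  recognition.onresult = (event) => {
    const text = extractTranscript(event);
    if (text) onTranscript(text);
  };
  // event.error is a short code such as 'no-speech' or 'not-allowed'
  recognition.onerror = (event) => onError(event.error);
  // onend fires when the session finishes, so another click can start again
  recognition.onend = () => activeRecognizers.delete(recognition);

  recognition.start();
  return true;
}
```

With this in place, the click handler could call listenOnce(recognition, ...) and pass a callback that runs the same setUserHistory() and matchReply() calls as handleVoice().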

Now, our bot should be able to recognize and understand our speech. But it's not talking back to us yet! Let's add this functionality so that our bot can have a full conversation with us:

const speak = (string) => {
    const u = new SpeechSynthesisUtterance();
    const allVoices = speechSynthesis.getVoices();
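    // "Alex" is a macOS voice; if it is missing, fall back to null so the
    // browser uses its default voice
    u.voice = allVoices.find(voice => voice.name === "Alex") || null;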
    u.text = string;
    u.lang = "en-US";
    u.volume = 1;
    u.rate = 1;
    u.pitch = 1;
    speechSynthesis.speak(u);
}
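
One thing to watch in speak(): the voice list depends on the operating system and browser, and getVoices() can return an empty array until the voices have loaded. A small sketch of a more forgiving lookup follows. The name pickVoice is hypothetical, and it only assumes that each voice object has the name and lang properties that SpeechSynthesisVoice provides:

```javascript
// Hypothetical helper: prefers a voice by name, then any voice for the
// requested language, and finally null (the browser's default voice).
function pickVoice(voices, preferredName, lang) {
  return (
    voices.find((voice) => voice.name === preferredName) ||
    voices.find((voice) => voice.lang === lang) ||
    null
  );
}
```

Inside speak(), the voice line could then become u.voice = pickVoice(speechSynthesis.getVoices(), "Alex", "en-US").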

Then, in matchReply(), let's call our newly added speak() function:

const matchReply = (userInput) => {
    ...

    setBotHistory([botMsg, ...botHistory])
    speak(botMsg)
}