Introduction
Speech recognition refers to the process of enabling a computer to identify and respond to the sounds produced in human speech.
It was first introduced at Bell Laboratories in 1952, and that version could only recognize spoken numbers, not words. Over the following years, speech recognition grew from recognizing numbers to recognizing text and grammars, and even detecting noise.
This technology was developed as an alternative to typing on a keyboard: you only have to talk to your computer, and your words appear on the screen.
Web Speech API
In 2012, the Web Speech API was introduced with the aim of enabling speech recognition, as well as text-to-speech conversion, in modern web browsers.
Note: Speech recognition is not currently supported in all browsers. Click here for a list of compatible browsers.
Getting Started
The first thing we need to do is check whether our browser supports speech recognition. In practice, that means checking whether the browser exposes SpeechRecognition or the prefixed webkitSpeechRecognition.
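Here is a minimal sketch of that compatibility check. The helper name getSpeechRecognition and the log messages are my own choices for illustration; SpeechRecognition and webkitSpeechRecognition are the constructor names browsers expose.

```javascript
// Hypothetical helper: returns the SpeechRecognition constructor if the
// environment provides one (standard or webkit-prefixed), otherwise null.
// `root` defaults to globalThis, which is `window` in a browser.
function getSpeechRecognition(root = globalThis) {
  return root.SpeechRecognition || root.webkitSpeechRecognition || null;
}

const SpeechRecognitionImpl = getSpeechRecognition();

if (SpeechRecognitionImpl) {
  console.log("Speech recognition is supported here.");
} else {
  console.log("Speech recognition is not supported here.");
}
```

Passing the root object in as a parameter is only there to make the check easy to try outside a browser; in a page you can call it with no arguments.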
The next step is to create a new speech recognition object and listen for when recording starts.
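A sketch of this step could look like the following. The createRecognition helper and the message it logs are assumptions for illustration; onstart is the event handler described in the explanation further down.

```javascript
// Hypothetical helper: builds a recognition object and attaches an
// onstart handler. Returns null when the API is not available.
function createRecognition(root = globalThis) {
  const Impl = root.SpeechRecognition || root.webkitSpeechRecognition;
  if (!Impl) {
    return null;
  }

  const recognition = new Impl();

  // Runs once the service has begun listening to incoming audio.
  recognition.onstart = () => {
    console.log("Listening... you can start speaking now.");
  };

  return recognition;
}

const recognition = createRecognition();
if (!recognition) {
  console.log("Speech recognition is not available here.");
}
```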
Finally, we start speech recognition and do something with the output.
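One way to sketch this last step is shown here. The helpers getTranscript and listen are hypothetical names; onresult, start(), event.resultIndex, event.results and transcript come from the Web Speech API.

```javascript
// Hypothetical helper: pulls the recognized text out of a result event.
// event.results holds one entry per result, and each result lists its
// alternatives; index 0 is the first alternative.
function getTranscript(event) {
  return event.results[event.resultIndex][0].transcript;
}

// Hypothetical helper: wires up onresult and starts the service.
function listen(recognition, onTranscript) {
  recognition.onresult = (event) => onTranscript(getTranscript(event));
  recognition.start();
}

// Browser usage; skipped where the API is missing.
const RecognitionCtor =
  globalThis.SpeechRecognition || globalThis.webkitSpeechRecognition;
if (RecognitionCtor) {
  listen(new RecognitionCtor(), (text) => console.log(text));
}
```

Here we only log the transcript, but the callback is the place to put it into a text area, send it somewhere, and so on.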
Code Explanation:
- recognition.onstart: An event handler that runs when the speech recognition service has begun listening to incoming audio.
- recognition.onresult: Another event handler, which runs when the speech recognition service returns a result.
- recognition.start(): This method starts the speech recognition service and begins listening to incoming audio. Running this code for the first time will show a dialog asking for access to your device's microphone.
- transcript: The text output generated once the speech recognition service has stopped, and it's all we need from the code we've written so far. For now, we are just logging the output to the console; you can choose to do something else with it.
There are more properties, methods and event handlers available on the speech recognition object, some of which include:
- recognition.grammars: Used to set the grammars that will be understood by the speech recognition service.
- recognition.continuous: A Boolean that sets whether continuous results are returned for each recognition, or only a single result.
Click here for the full list of supported methods, properties and event handlers.
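To give a feel for recognition.continuous, here is a small sketch. Both helper names are made up for this example; continuous and isFinal are real properties of the API. Trimming and joining the pieces with a space is just one reasonable way to stitch the results together.

```javascript
// Hypothetical helper: switches a recognition object to continuous mode,
// so it keeps returning results instead of stopping after the first one.
function enableContinuous(recognition) {
  recognition.continuous = true;
  return recognition;
}

// Hypothetical helper: in continuous mode event.results grows over time,
// so we join the first alternative of every result marked as final.
function collectFinalTranscripts(event) {
  const parts = [];
  for (let i = 0; i < event.results.length; i++) {
    const result = event.results[i];
    if (result.isFinal) {
      parts.push(result[0].transcript.trim());
    }
  }
  return parts.join(" ");
}
```

You would call enableContinuous(recognition) before recognition.start(), then use collectFinalTranscripts(event) inside recognition.onresult.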
Sayit 🗣
I recently built a progressive web app (using speech recognition)
that converts spoken words to text and provides a button to instantly share that text across various social media platforms.
This project could be handy when you want to write a lengthy email or social media post.
View the project live here, and if you think it's cool, kindly give it a star on GitHub (contributions are also welcome 🤗).
Conclusion
+1 for Accessibility
Speech recognition has played a great role in accessibility over the past few years, especially for the visually impaired, people with arm injuries, and many more. When typing on a keyboard is not an option, they can rely on their voice to control and navigate applications and web pages.
Project Idea
If you are as into speech recognition as I am, how about building a web page that is fully controlled by voice rather than by clicking or swiping? For example, from the index page I could just say "go to about page" and be redirected to the about page. Sounds cool? Yeah! I would love to see what you build. You can send me a message on Twitter, and I will gladly answer your questions.
P.S. I'm looking to make new dev friends 🤗, let's connect on Twitter.
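To get you started, here is a rough sketch of the command-matching part. Everything in it is an assumption for illustration: the route table, the page names, and the "go to ... page" phrasing are made up, so adapt them to your own site.

```javascript
// Hypothetical route table: spoken page name -> URL path.
const VOICE_ROUTES = new Map([
  ["home", "/"],
  ["about", "/about"],
  ["contact", "/contact"],
]);

// Turns a transcript such as "Go to about page" into a path,
// or returns null when no known page is mentioned.
function routeForCommand(transcript, routes = VOICE_ROUTES) {
  const match = transcript.toLowerCase().match(/go to (?:the )?(\w+)/);
  if (!match) {
    return null;
  }
  return routes.get(match[1]) ?? null;
}

console.log(routeForCommand("Go to about page"));
```

In the browser you could call routeForCommand(transcript) inside recognition.onresult and, when it returns a path, pass it to window.location.assign.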
Thanks for reading 👏
Top comments (11)
Nice! Very neat and simple.
It should include a practical example where you show the results from a test.
Thanks Patrick, I've just added a subsection explaining the result.
And here I showed a practical example implementing speech recognition.
This is a great post! Can you show me how to use this in a React functional component? I don't really know how to capture my text area's value in a JS variable.
Below is the onClick handler I'm using to invoke the record() function from a button in React.
Speech to text
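const record = () => {
  // Assumes a recognition object was already created and stored on window.
  window.recognition.onresult = function (event) {
    console.log(event);
    let output = document.getElementById("output");
    // Clears the output element; the recognized text is not written to it yet.
    output.innerHTML = "";
  };
  window.recognition.start();
};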
This is nice sir
Thank you sire ☺️
can it generate sounds, of different length ways?
Not really Mihai, this is basically to recognize spoken words and not generate. If you are looking to convert text to speech, here is a great tutorial for that.
Please visit covid.tusharsahay.com to see a practical example where I have used this API. Let me know how you find it! :-)
Wow, this is amazing Tushar.
Really nice!
Keep up the good work 🎉
Glad you found it super useful ❤️
Does anyone know how to use the API in Firefox? The onresult event handler is not executing for some reason.