Speech Recognition and Voice Activity Detection for your Apps

By Spurwing

Have you ever wanted to add speech-driven AI features to your own apps? Today you can. In this post we showcase our simple Speech Recognition library for adding Voice Commands and Controls to any application.

Whether you are building web apps, native apps or desktop apps, this technology can be integrated into any system with an internet connection.

Full Demo

YouTube: https://www.youtube.com/watch?v=60llvnv3nDA

Source Code

GitHub: https://github.com/Spurwing/Speech-Recognition
This is our simple yet powerful client-server implementation of Speech Recognition in the browser. It works on any device with a modern, up-to-date web browser (Firefox or Chrome recommended).

Architecture

The architecture is straightforward. The library contains the implementation of the Spurwing Socket Server, which is connected to a Speech-to-Text (STT) provider. For the latter we use WitAI (by Facebook), a completely free and easy-to-use service. Alternatively, you can integrate any other STT provider (Google, IBM Watson, Bing, ...), but these may come at a price.

[Image: speech-to-text architecture diagram]
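
To make the flow concrete, here is a minimal sketch of the server side of this architecture: a speech fragment arrives over the socket, is handed to an STT provider, and the transcription is sent back. The 'stream' and 'text' event names and the payload shape mirror the client snippet in this post; attachSpeechHandlers and transcribe are hypothetical names used for illustration and are not taken from the repository's index.js.

```javascript
// Hypothetical sketch, not the repository's actual server code.
// `socket` is anything with on(event, handler) and emit(event, data),
// such as a Socket.IO server-side socket.
// `transcribe` is an assumed async function: audio buffer in, text out.
// Swapping STT providers means passing a different `transcribe`.
function attachSpeechHandlers(socket, transcribe) {
  socket.on('stream', async ({ buffer, id }) => {
    try {
      const raw = await transcribe(buffer);
      // Reply in the shape the client expects: { raw, nlp, id }
      socket.emit('text', { raw, nlp: null, id });
    } catch (err) {
      // In this sketch a failed transcription is logged and no reply is sent.
      console.error('transcription failed for fragment', id, err);
    }
  });
}
```

Because the provider is passed in as a function, replacing WitAI with another STT service only changes what transcribe does; the socket handling stays the same.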

Usage

  1. This is a NodeJS implementation; you need node (with npm) v12+. Check your version using node -v.
  2. Clone or download this repository.
  3. Run npm install to download all necessary dependencies. (If it fails, you may need to install C++ Build Tools.)
  4. We use WitAI as a free STT provider. Sign up and create an app at https://wit.ai/apps
  5. Under settings you'll find your "Server Access Token".
  6. Copy config.sample.json to config.json and edit it.
  7. Set WITAPIKEY to your "Server Access Token".
  8. Use node index.js to launch the Socket Server.
  9. Visit http://localhost:8002/Spurwing/audio/ to start testing.
  • The Socket Server runs on port 8002, which you can change in index.js.
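
A missing or empty token is an easy mistake in steps 6 and 7, so a quick check before launching the server can save some debugging. The sketch below is an assumption-laden helper, not part of the library: validateConfig is a hypothetical name, and it only looks at the WITAPIKEY key mentioned above, ignoring anything else config.sample.json may contain.

```javascript
// Hypothetical helper: returns a list of human-readable problems,
// or an empty list when the config looks usable.
function validateConfig(config) {
  if (config === null || typeof config !== 'object' || Array.isArray(config)) {
    return ['config.json must contain a JSON object'];
  }
  const problems = [];
  const key = config.WITAPIKEY;
  if (typeof key !== 'string' || key.trim() === '') {
    problems.push('WITAPIKEY is missing: paste your WitAI "Server Access Token"');
  }
  return problems;
}

// Example usage before running node index.js (the file path is an assumption):
// const config = JSON.parse(require('fs').readFileSync('config.json', 'utf8'));
// const problems = validateConfig(config);
// if (problems.length > 0) {
//   console.error(problems.join('\n'));
//   process.exit(1);
// }
```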

Client implementation

The code snippets below show how you can add this Speech Recognition library to your web apps.

Inside your HTML's <head>, add the following:

<script src="https://cdnjs.cloudflare.com/ajax/libs/socket.io/4.0.1/socket.io.min.js"></script>
<script src="https://spurwing.github.io/Speech-Recognition/public/VAD.js"></script>     <!-- Required: VAD algorithm -->
<script src="https://spurwing.github.io/Speech-Recognition/public/audio.js"></script>   <!-- Required: Speech Recognition Library -->

<script src="demo.js"></script>    <!-- Your implementation -->

Inside demo.js you have:

const spa = new SpurwingAudio();

// on user click start mic:
spa.init().then((stream) => { // ask user for microphone access
    processStream(stream);
}).catch((err) => {
    alert("You must allow your microphone.");
    console.error(err);
});
// on user click stop mic:
// spa.end();

function processStream(stream) { // start Voice Activity Detection
  spa.startVAD(
      () => console.log('recording'), // function: on speech start
      (buffer, duration) => {         // function: on speech end
        socket.emit('stream', {buffer, id:0}) // send audio/speech fragment to server (optional custom id of fragment)
      }
  );
}

// create socket connection to server
let socket = io('localhost:8002', { // server domain
  path: "/Spurwing/audio/socket.io" // server endpoint
});

// capture "text" event from server (containing data)
socket.on('text', data => {
    console.log(data) // do something with the transcribed audio text

    // data structure: { raw: "hello", nlp: null, id: 0 }
});
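
Once the transcribed text arrives, turning it into a Voice Command is mostly a matter of matching phrases. The following is one possible sketch, assuming plain English phrases and the { raw, nlp, id } structure shown above; matchCommand, normalize and the { phrase, run } command objects are hypothetical and not provided by the library.

```javascript
// Hypothetical helper: lowercases text and strips punctuation.
// Assumes English (a-z, 0-9) input; other scripts would be removed.
function normalize(text) {
  return String(text || '')
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, ' ')
    .replace(/\s+/g, ' ')
    .trim();
}

// Returns the first command whose phrase appears as whole words in the
// spoken text, or null when nothing matches.
function matchCommand(text, commands) {
  const spoken = normalize(text);
  if (!spoken) return null;
  // Try longer phrases first so "turn off the lights" wins over "turn off".
  const sorted = [...commands].sort((a, b) => b.phrase.length - a.phrase.length);
  const padded = ` ${spoken} `;
  return sorted.find((c) => padded.includes(` ${normalize(c.phrase)} `)) || null;
}

// Usage with the 'text' event from the snippet above:
// const commands = [{ phrase: 'open calendar', run: () => { /* ... */ } }];
// socket.on('text', (data) => {
//   const cmd = matchCommand(data.raw, commands);
//   if (cmd) cmd.run(data);
// });
```

Matching on whole words avoids false positives from partial words, at the cost of missing paraphrases; for anything more flexible, the nlp field returned by the server would be the natural place to look.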

Conclusion

With just a few lines of code, this opens up countless possibilities and opportunities in Business Automation, Time Management and many other areas.

I'm eager to see which solutions you'll build with this, let us know in the comments below! :)

About us

Spurwing provides an enterprise-grade Appointment Scheduling API and Calendar Management Solutions for your business and projects. They are easy to customize and effortless to integrate, giving software teams ready-made Time Management Solutions. In addition, we are building a completely free and open source marketplace containing widgets, chat bots, dashboards and integration solutions.

For more projects, make sure to follow our blog and our GitHub: https://github.com/Spurwing/
