A Chrome extension with hand gesture and speech recognition capabilities.

#javascript #react #chromeextension #extension

In this post I discuss Hand in the Air, a Chrome extension I developed with the help of open-source projects and my limited knowledge of browser extensions. It manages user scripts and invokes them based on user interaction (hand gestures and voice input).

I wanted to build a Chrome extension that manages user scripts (scripts that run on a particular domain), like Greasemonkey, but triggered by user interaction such as waving a hand in front of the webcam (hand gesture recognition) or speaking (speech recognition). Basically, a Greasemonkey extension on steroids.

Before taking on this project, I wanted to use the React library and import/export syntax in a Chrome extension, and create-react-app was not useful for this.
Along the way, I figured out that I could do it with a transpiler, or next-gen JavaScript compiler (Babel in this case), and a bundler (webpack).
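
As a rough illustration of that setup, a webpack configuration along these lines would bundle one entry point per extension context and run the sources through Babel. This is only a sketch: the entry names, file paths, and output directory are hypothetical and not taken from the actual boilerplate, and it assumes babel-loader, @babel/preset-env, and @babel/preset-react are installed.

```javascript
// webpack.config.js: a minimal sketch, not the project's actual configuration.
const path = require("path");

module.exports = {
  mode: "production",
  // One bundle per extension context (hypothetical file names).
  entry: {
    background: "./src/background.js",
    content: "./src/content.js",
    options: "./src/options.jsx",
  },
  output: {
    path: path.resolve(__dirname, "build"),
    filename: "[name].bundle.js",
  },
  module: {
    rules: [
      {
        // Transpile ES modules and JSX so import/export and React work.
        test: /\.jsx?$/,
        exclude: /node_modules/,
        use: {
          loader: "babel-loader",
          options: {
            presets: ["@babel/preset-env", "@babel/preset-react"],
          },
        },
      },
    ],
  },
  resolve: { extensions: [".js", ".jsx"] },
};
```

The extension's manifest would then point at the generated bundles rather than at the source files.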

After creating the chrome-extension-react-boilerplate setup, I came across Gest.js, a library that uses a diff algorithm to recognise hand movement (left, right, up, down). It was originally written in ES5, so I modified it to make it importable in the project.

The main issue after that was how to include this library so that it asks for camera permission only the first time and can operate on any browser tab the user visits. To solve this, I put the script on both the options page and the background script of the extension. On the first run, the extension asks for camera/audio permissions from the options page. Once camera/audio use has been whitelisted, the extension can access the camera/audio input from the background script on later runs, whenever the user clicks the extension icon. The recognized gesture direction is then sent to the active tab with the help of the message passing APIs.

After this, I created an options page UI form to save user scripts for a particular domain.
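
The message passing step could look roughly like the sketch below. It uses `chrome.tabs.query`, `chrome.tabs.sendMessage`, and `chrome.runtime.onMessage`, but the function names, the `"gesture"` message type, and the injected `chromeApi` parameter are assumptions made for illustration, not the extension's actual code.

```javascript
// Background side: forward a recognized direction to the active tab.
// `chromeApi` is injected so the sketch can be exercised outside the browser;
// in the extension it would simply be the global `chrome` object.
function sendGestureToActiveTab(chromeApi, direction) {
  chromeApi.tabs.query({ active: true, currentWindow: true }, function (tabs) {
    if (!tabs || tabs.length === 0) {
      return; // No active tab found.
    }
    // "gesture" is a hypothetical message type chosen for this sketch.
    chromeApi.tabs.sendMessage(tabs[0].id, {
      type: "gesture",
      direction: direction,
    });
  });
}

// Content-script side: run a handler whenever a gesture message arrives.
function listenForGestures(chromeApi, onGesture) {
  chromeApi.runtime.onMessage.addListener(function (message) {
    if (message && message.type === "gesture") {
      onGesture({ direction: message.direction });
    }
  });
}
```

In the background script this would be called as `sendGestureToActiveTab(chrome, "Left")`, and in the content script as `listenForGestures(chrome, handler)`.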

There are two types of user scripts for hand gestures:

1. default user scripts
2. custom scripts

Default scripts are the scripts that ship with the extension, e.g. mapping the left, right, up, and down hand gestures to the arrow keys on any webpage (useful for playing games such as http://play2048.co).
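
A default script of this kind might be sketched as follows. The direction names follow the `"Left"`/`"Right"` values used in this post; `"Up"` and `"Down"`, along with the function names, are assumptions, and this is not the extension's actual implementation.

```javascript
// Maps gesture directions to KeyboardEvent `key` values.
// "Up" and "Down" are assumed to follow the "Left"/"Right" convention.
var DIRECTION_TO_KEY = {
  Left: "ArrowLeft",
  Right: "ArrowRight",
  Up: "ArrowUp",
  Down: "ArrowDown",
};

// Returns the key name for a gesture, or null if there is no mapping.
function keyForGesture(gesture) {
  if (!gesture || gesture.error) {
    return null;
  }
  var known = Object.prototype.hasOwnProperty.call(DIRECTION_TO_KEY, gesture.direction);
  return known ? DIRECTION_TO_KEY[gesture.direction] : null;
}

// Dispatches a synthetic keydown on `target` (for example `document`).
// Returns true when an event was dispatched.
function pressArrowKey(gesture, target) {
  var key = keyForGesture(gesture);
  if (!key) {
    return false;
  }
  target.dispatchEvent(new KeyboardEvent("keydown", { key: key, bubbles: true }));
  return true;
}
```

Some pages read the legacy `keyCode` property or ignore untrusted synthetic events, so a real default script may need adjusting per site.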

In custom scripts, users can write their own logic, since a gesture object is exposed through the custom script API.

E.g. on Tinder web (tinder.com), the user can wave a hand from left to right or vice versa to like or dislike a profile:

if (!gesture.error) {
  var el = null;
  if (gesture.direction === "Left") {
    el = document.querySelector('[aria-label="Nope"]');
  } else if (gesture.direction === "Right") {
    el = document.querySelector('[aria-label="Like"]');
  }
  // Click only if the button was actually found on the page.
  if (el) {
    el.click();
  }
}

The user can go back and forth through slides on https://www.slideshare.net/:

if (!gesture.error) {
  if (gesture.direction === "Left") {
    document.querySelector("#btnNext").click();
  } else if (gesture.direction === "Right") {
    document.querySelector("#btnPrevious").click();
  }
}
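
Both examples above assume a `gesture` object is in scope when the custom script runs. One way this could be arranged is sketched below, using `new Function` to pass the object in as a parameter. This mechanism and the function name are assumptions for illustration and may differ from how the extension does it; whether `new Function` is allowed also depends on the content security policy in effect.

```javascript
// Runs user-supplied script text with a `gesture` object in scope.
// Returns { ok: true } on success, or { ok: false, error } if the script
// fails to parse or throws while running.
function runCustomGestureScript(scriptSource, gesture) {
  try {
    // The script body sees `gesture` as an ordinary function parameter.
    var run = new Function("gesture", scriptSource);
    run(gesture);
    return { ok: true };
  } catch (err) {
    // A broken user script should not take the handler down with it.
    return { ok: false, error: String(err) };
  }
}
```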

I defined these terms:

- script handler: a handler that recognizes the gestures, e.g. Gest.js.
- script handler callback: a callback that is fired by the script handler.

I then thought of replacing Gest.js with a generic handler, which led me to integrate a voice input feature using Chrome's speech recognition APIs. For this I used annyang.js, which is built on top of the Chrome speech recognition API and adds a couple of utility classes. It basically turned out to be like Alexa skills for webpages.
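
To make the generic handler idea concrete, here is a minimal sketch of a voice handler wrapped around annyang. It relies on annyang's `addCallback("result", ...)` and `start()`; the `{ start(callback) }` handler shape, the function names, and the choice to take the first recognized phrase are all assumptions of this sketch rather than the extension's real interface.

```javascript
// Wraps annyang (passed in as `annyangLib`) as a script handler.
// The { start(callback) } shape is a hypothetical interface for this sketch.
function createVoiceHandler(annyangLib) {
  return {
    start: function (callback) {
      // annyang's "result" callback receives an array of possible phrases;
      // this sketch simply takes the first one.
      annyangLib.addCallback("result", function (phrases) {
        if (!phrases || phrases.length === 0) {
          callback({ error: "nothing recognized" });
          return;
        }
        callback({ command: phrases[0].trim().toLowerCase() });
      });
      annyangLib.start();
    },
  };
}

// A gesture handler could expose the same shape, so the rest of the
// extension only deals with "a handler that fires a callback".
function runHandler(handler, script) {
  handler.start(function (result) {
    if (!result.error) {
      script(result);
    }
  });
}
```

With this shape, swapping Gest.js for annyang only changes which handler is passed to `runHandler`.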

I created the same two types for voice input:

1. default user scripts
2. custom scripts

Default user scripts are the same as described above.
In custom scripts, I exposed a string variable, command, which contains the recognized word or sentence.

E.g. to navigate back and forth through slides on https://www.slideshare.net/, the user can say "next" or "previous":

if (command === "next") {
  document.querySelector("#btnNext").click();
} else if (command === "previous") {
  document.querySelector("#btnPrevious").click();
}

Further Development

I want to add one more handler for eye-movement tracking, because there is a use case for it: suppose the user says "click on search" while on https://google.com. This can be written with the scripts defined above, but for now the script has to search the entire page for visible text or something similar to match the element and click on it, and this brute-force search could very easily lead to false results.
For eye tracking, I came across an open-source project called WebGazer. It wasn't ready to use as a library, so I made a PR there and am making a couple more tweaks to make it usable in a Chrome extension, so that the search can be limited to a bounding box of 100px around where the user is currently looking on the webpage.
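
The bounding-box idea can be sketched as a small filter. It assumes a square of 100px per side centred on the gaze point, and that the gaze point is in viewport coordinates so it can be compared with `getBoundingClientRect()`; the function name and parameters are hypothetical.

```javascript
// Returns the elements whose bounding rectangle overlaps a square of
// `boxSize` pixels centred on the gaze point and whose text contains `query`.
function findNearGaze(elements, gaze, query, boxSize) {
  var half = (boxSize || 100) / 2;
  var left = gaze.x - half;
  var right = gaze.x + half;
  var top = gaze.y - half;
  var bottom = gaze.y + half;
  var needle = query.toLowerCase();

  return Array.prototype.filter.call(elements, function (el) {
    var r = el.getBoundingClientRect();
    // Standard rectangle intersection test against the gaze box.
    var overlaps =
      r.left <= right && r.right >= left && r.top <= bottom && r.bottom >= top;
    var text = (el.textContent || "").toLowerCase();
    return overlaps && text.indexOf(needle) !== -1;
  });
}
```

The `x` and `y` predictions that WebGazer reports through `setGazeListener` could be passed in as `gaze`, with `elements` coming from something like `document.querySelectorAll("a, button, input")`.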

Thanks for reading, and stay healthy!

DEV Community
