How to detect human faces (and other shapes) in JavaScript

Tomasz Jakut for CKEditor • Originally published at ckeditor.com

Google believes in a Web that can compete with native applications without being intimidated by them. One of the areas in which native applications have been superior to web applications for years is detecting shapes in images. Tasks such as face detection were not possible until recently… But not anymore!

Shape Detection API

A new standard proposal has recently been announced in the Web Platform Incubator Community Group (WICG): the Shape Detection API. It allows detecting two types of shapes in an image:

  • faces,
  • barcodes.

Currently, both of these detectors are implemented inside Chrome. Barcode detection is enabled by default and face detection is behind a flag (chrome://flags#enable-experimental-web-platform-features). There is also one more specification defining the Text Detection API, which allows detecting text.
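Since these detectors are still experimental, it is worth checking whether a given one exists before using it. A minimal check could look like this:

if ( !( 'FaceDetector' in window ) ) {
    console.warn( 'FaceDetector is not available in this browser.' );
}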

All of these detectors share the same API:

const detector = new FaceDetector( optionalOptions );
const results = await detector.detect( imageBitmap );

There are three interfaces available globally (both inside the page and inside the Web Worker thread):

  • FaceDetector,
  • BarcodeDetector,
  • TextDetector.

The optionalOptions parameter is an object containing additional configuration for the detector. Every shape detector has its own set of options, but you can also omit this parameter altogether; in most cases the defaults are enough.
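For example, according to the current WICG draft, the barcode detector accepts a list of barcode formats to look for, and the face detector accepts the fastMode and maxDetectedFaces options used later in this article (treat the exact values below as illustrative):

// Options follow the current WICG draft and may still change.
const barcodeDetector = new BarcodeDetector( { formats: [ 'qr_code', 'ean_13' ] } );
const faceDetector = new FaceDetector( { fastMode: true, maxDetectedFaces: 5 } );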

After constructing a detector, you can use its asynchronous detect() method to actually detect shapes in the image. The method returns a promise that resolves with an array of detected shapes. Each of them contains the coordinates of the shape in the image and additional information about it (for example, the recognized text in the TextDetector API or the coordinates of particular face parts, like eyes or nose, in the FaceDetector API).
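Building on the earlier snippet, consuming the results could look roughly like this (boundingBox is a DOMRectReadOnly in the current draft):

for ( const shape of results ) {
    // Every detected shape exposes its position in the image.
    const { x, y, width, height } = shape.boundingBox;

    console.log( `Detected at (${ x }, ${ y }), size ${ width }×${ height }` );
}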

The imageBitmap parameter is the image to analyze, passed as an ImageBitmap instance.

Side note: Why is this ImageBitmap instead of just an img element or simply a Blob? This is because the shape detectors are available also inside workers, where there is no access to the DOM. Using ImageBitmap objects resolves this issue. Additionally, they allow using more image sources, like canvas elements (including offscreen ones) or even video.
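For instance, a hypothetical worker.js could receive an ImageBitmap from the page and run the detection off the main thread (just a sketch, not part of the demo below):

// worker.js: the detectors work here too, no DOM access needed.
self.addEventListener( 'message', async ( { data: bitmap } ) => {
    const barcodes = await new BarcodeDetector().detect( bitmap );

    // Send back only the decoded values.
    self.postMessage( barcodes.map( ( { rawValue } ) => rawValue ) );
} );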

And that's basically it!

Sample application

OK, let's see how this new knowledge can be applied in practice. Let's prepare a sample web application that will allow you to detect shapes using the proposed API!

[Screencast: detecting faces and text on an uploaded photo with the Shape Detection API.]

HTML

Start with the index.html file:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Shape Detection API demo</title>
</head>
<body>
    <h1>Shape Detection API</h1>

    <h2>Face detection</h2>
    <label>Choose an image file:
        <input type="file" accept="image/*" data-type="face">
    </label>

    <h2>Barcode detection</h2>
    <label>Choose an image file:
        <input type="file" accept="image/*" data-type="barcode">
    </label>

    <h2>Text detection</h2>
    <label>Choose an image file:
        <input type="file" accept="image/*" data-type="text">
    </label>

    <script type="module">
    </script>
</body>
</html>

The file contains three input[type=file] elements that will be the sources of images to analyze. All of them have a [data-type] attribute that informs the script which shape you want to retrieve. There is also a script[type=module] element that will contain the code needed to handle the input elements:

import detectShape from './detector.mjs'; // 1

document.body.addEventListener( 'change', async ( { target } ) => { // 2
    const [ image ] = target.files; // 3

    const detected = await detectShape( image, target.dataset.type ); // 4

    console.log( detected ); // 5
} );

First, you import the detectShape() function from detector.mjs (1). This function will do the entire job.

Then you bind the change event listener to document.body (2). It will react to all changes in input elements thanks to the event delegation mechanism.

Additionally, the listener is asynchronous, as the detector is also asynchronous and I like to use the async/await syntax whenever I can.

There is also a destructuring statement to get only the target property of the event object passed to the listener — so only the element which fired the event.

Fortunately, the next line is not as crowded and it basically gets the file chosen by the user and saves it to the image variable (3).

When you get the image, you can just pass it to the detectShape() function alongside the type of the detector, fetched from the [data-type] attribute (4).

After awaiting results, you can log them into the console (5).

JavaScript

Let's move to the detector.mjs file:

const options = { // 5
    face: {
        fastMode: true,
        maxDetectedFaces: 1
    },
    barcode: {},
    text: {}
};

async function detectShape( image, type ) {
    const bitmap = await createImageBitmap( image ); // 2
    const detector = new window[ getDetectorName( type ) ]( options[ type ] ); // 3
    const detected = await detector.detect( bitmap ); // 6

    return detected; // 7
}

function getDetectorName( type ) {
    return `${ type[ 0 ].toUpperCase() }${ type.substring( 1 ) }Detector`; // 4
}

export default detectShape; // 1

There is only one export in this file, the default one: detectShape() (1). This function converts the passed file (as a File instance) to the needed ImageBitmap using the createImageBitmap() global function (2). Then an appropriate detector is created (3).

The constructor name is derived from the type parameter. Its first letter is changed to upper case and the Detector suffix is added (4).
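So, for example:

getDetectorName( 'face' ); // → 'FaceDetector'
getDetectorName( 'barcode' ); // → 'BarcodeDetector'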

There is also an object containing options for every type of detector (5). Both the barcode and text detectors will use the default options; for the face detector, however, there are two options:

  • fastMode – Switches on less accurate detection (it will recognize more faces but also increase the number of false positives).
  • maxDetectedFaces – Set to 1 to detect only one face.

After creating the shape detector, you can call its detect() method and await results (6). When the results arrive, return them (7).
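If you want to do something more visual with the results than logging them to the console, you could, for instance, outline the detected bounding boxes on a canvas. A rough sketch, reusing the image variable from the change listener above and assuming a canvas element that already shows the chosen photo at its natural size (neither is part of the demo):

const canvas = document.querySelector( 'canvas' );
const context = canvas.getContext( '2d' );

const detected = await detectShape( image, 'face' );

for ( const { boundingBox: { x, y, width, height } } of detected ) {
    // Outline every detected face on the canvas.
    context.strokeStyle = 'red';
    context.strokeRect( x, y, width, height );
}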

Running the application

Coding is complete; however, the application will not work correctly if you start it directly from the directory. This is caused mainly by the fact that the code uses ES modules, which are bound by CORS rules and will not load straight from the file system. There are two solutions to these issues:

  • Switch back to old, non-module JavaScript — you will not be cool anymore.
  • Use a local web server to serve the site — you will still be cool.

Fortunately, using a local web server is as simple as running the following command inside the directory with the application:

npx http-server ./

It will download and run the http-server npm package. You can then navigate to http://localhost:8080 (or to another address that will be displayed in your terminal) and test your own barcode, text and face detector application. Remember to use Chrome with the experimental Web platform features enabled!

And that's it! With the new Shape Detection API, it is fairly easy to detect certain shapes in an image, at least in Chrome. We will need to wait and see whether other browsers follow.

Source code and demo

The complete code of the application is available on GitHub. There is also a slightly enhanced and styled live text, barcode and face detection demo available for you to play with. Its source code is also available on GitHub. Unfortunately, at the time of writing this article, shape detection is not supported on Linux.

As for the next steps, one of the most important applications of face detection is facial recognition. This technology matches human faces detected in images or video frames against a database of faces. Like other biometric technologies, it can be used to authenticate users, interact with computers, smartphones or other robotic systems, automatically index images, or for video surveillance purposes.

Top comments (2)

Frederik 👨‍💻➡️🌐 Creemers

Hi Tomasz,

This looks like an awesome tutorial! I think the #showdev tag isn't relevant though. That tag is mostly meant for showing off your own projects.

Gokce Tosun

Updating it, thanks for the heads up!