Pascal Thormeier
Funny Hat Day! 👒🎩 How to do face detection with your webcam and JavaScript 📸🧠

(Cover image created with Dall-E mini and the caption "an AI wearing a funny top hat" - you know, because we're doing machine learning stuff today.)

It's been a while since my last post. I'm working on something rather large; you can expect some news soon!

But today, we'll have a look at you. Yes, you. Specifically, your beautiful faces. We'll make you wear hats. We'll use face-api.js and the Media Stream API for that.

Don't worry, though. Nothing will be processed in the cloud or anywhere outside your machine, you'll keep your images, and everything happens in your browser.

Let's get started!

Boilerplate

First, we'll need some HTML: a <video> element, a hat, two buttons for starting and stopping the video, and two <select> elements for selecting a hat and the device. You know, you might have two webcams.

<div class="container">
  <div id="hat">
    🎩
  </div>
  <!-- autoplay is important here, otherwise it doesn't immediately show the camera input. -->
  <video id="video" width="1280" height="720" autoplay></video>
</div>

<div>
  <label for="deviceSelector">
    Select device
  </label>
  <select id="deviceSelector"></select>
</div>

<div>
  <label for="hatSelector">
    Select hat
  </label>
  <select id="hatSelector"></select>
</div>

<button id="start">
  Start video
</button>

<button id="stop">
  Stop video
</button>

Next, some CSS for the positioning of the hat:

#hat {
  position: absolute;
  display: none;
  text-align: center;
}
#hat.visible {
  display: block;
}
.container {
  position: relative;
}

Awesome. Next, we install face-api.js with npm and create an index.js file for us to work in:

npm i face-api.js && touch index.js

And lastly for the boilerplate, we select all elements we need from the HTML:

// face-api.js was installed from npm, so we import it at the top of index.js
// (alternatively, it could be loaded via a script tag)
import * as faceapi from 'face-api.js'

/**
 * All of the necessary HTML elements
 */
const videoEl = document.querySelector('#video')
const startButtonEl = document.querySelector('#start')
const stopButtonEl = document.querySelector('#stop')
const deviceDropdownEl = document.querySelector('#deviceSelector')
const hatSelectorEl = document.querySelector('#hatSelector')
const hatEl = document.querySelector('#hat')

Awesome. Let's get to the fun part.

Accessing the webcam

To access a webcam, we will use the Media Stream API. This API allows us to access video and audio devices, but we're only interested in video devices. Also, we'll cache those devices in a global variable to not have to fetch them again. So let's have a look:

const listDevices = async () => {
  if (devices.length > 0) {
    return
  }

  devices = await navigator.mediaDevices.enumerateDevices()
  // ...
}

The mediaDevices object lets us access all devices, both video and audio. Each device is an object of either the class InputDeviceInfo or MediaDeviceInfo. These objects both roughly look like this:

{
  deviceId: "someHash",
  groupId: "someOtherHash",
  kind: "videoinput", // or "audioinput"
  label: "Some human readable name (some identifier)"
}

The kind is what's interesting to us. We can use that to filter for all videoinput devices, giving us a list of available webcams. We will also add these devices to the <select> we've added in the boilerplate and mark the first device we encounter as the selected one:

/**
 * List all available camera devices in the select
 */
let selectedDevice = null

let devices = []

const listDevices = async () => {
  if (devices.length > 0) {
    return
  }

  devices = (await navigator.mediaDevices.enumerateDevices())
    .filter(d => d.kind === 'videoinput')

  if (devices.length > 0) {
    deviceDropdownEl.innerHTML = devices.map(d => `
      <option value="${d.deviceId}">${d.label}</option>
    `).join('')

    // Select first device
    selectedDevice = devices[0].deviceId
  }
}

Now, we'll actually show the webcam input to the user. For that, the Media Stream API offers the getUserMedia method. It receives a constraints object as an argument that defines what exactly we want to access and how. We don't need any audio, but we do need a video stream from the selectedDevice. We can also tell the API our preferred video size. Finally, we assign the output of this method to the <video>, namely its srcObject:

const startVideo = async () => {
  // Some more face detection stuff later

  videoEl.srcObject = await navigator.mediaDevices.getUserMedia({
    video: {
      width: { ideal: 1280 },
      height: { ideal: 720 },
      deviceId: selectedDevice,
    },
    audio: false,
  })

  // More face detection stuff later
}

That should do the trick. Since the <video> has an autoplay attribute, it should immediately show what the cam sees. Unless we didn't allow the browser to access the cam, of course. But why wouldn't we, right? After all, we want to wear hats.
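If camera access is denied, getUserMedia rejects with a DOMException. Here's a minimal sketch of how we could surface that to the user - startVideoSafely and the alert text are purely illustrative and not part of the final code:

// Hedged sketch: handling a denied camera permission.
const startVideoSafely = async () => {
  try {
    videoEl.srcObject = await navigator.mediaDevices.getUserMedia({
      video: { deviceId: selectedDevice },
      audio: false,
    })
  } catch (e) {
    // "NotAllowedError" means the user (or a browser policy) blocked camera access
    if (e.name === 'NotAllowedError') {
      alert('Camera access was denied - no hats for you today.')
    } else {
      console.error(e)
    }
  }
}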

If the hat-wearing is getting a bit too spooky, we would also like to stop the video. We can do that by first stopping each source object's track individually and then clearing the srcObject itself.

const stopVideo = () => {
  // Some face detection stuff later on

  if (videoEl.srcObject) {
    videoEl.srcObject.getTracks().forEach(t => {
      t.stop()
    })
    videoEl.srcObject = null
  }
}

Now we can start and stop the video. Next up:

Doing the face recognition

Let's get the machine learning in. During the boilerplating, we installed face-api.js, which is a pretty fantastic lib for all kinds of ML tasks around face recognition, detection and interpretation. It can also detect moods, tell us where different parts of the face are (such as the jawline or the eyes), and is capable of using different model weights. And the best part: it doesn't need any remote service; we only need to provide the correct model weights! Granted, these can be rather large, but we only need to load them once and can do face recognition for the rest of the session.
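To illustrate those extra capabilities, here's a minimal sketch of how mood and landmark detection could look - detectMood is a hypothetical helper, and it assumes the face expression model weights have also been downloaded into the /models folder (they're not part of the list below):

// Hedged sketch: detect the most likely mood and grab the jawline points.
// Assumes the face expression weights are present in /models as well.
const detectMood = async () => {
  await faceapi.nets.tinyFaceDetector.loadFromUri('/models')
  await faceapi.nets.faceLandmark68Net.loadFromUri('/models')
  await faceapi.nets.faceExpressionNet.loadFromUri('/models')

  const result = await faceapi
    .detectSingleFace(videoEl, new faceapi.TinyFaceDetectorOptions())
    .withFaceLandmarks()
    .withFaceExpressions()

  if (result) {
    // result.expressions maps expression names to probabilities, e.g. { happy: 0.98, ... }
    const [mood] = Object.entries(result.expressions)
      .sort((a, b) => b[1] - a[1])[0]

    // result.landmarks exposes point lists for individual facial features
    const jawline = result.landmarks.getJawOutline()

    console.log(`You look ${mood}, and your jawline consists of ${jawline.length} points.`)
  }
}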

First, we need the models, though. The face-api.js repo has all the pre-trained models we need:

  • face_landmark_68_model-shard1
  • face_landmark_68_model-weights_manifest.json
  • ssd_mobilenetv1_model-shard1
  • ssd_mobilenetv1_model-shard2
  • ssd_mobilenetv1_model-weights_manifest.json
  • tiny_face_detector_model-shard1
  • tiny_face_detector_model-weights_manifest.json

We put those in a folder called models and make face-api load them:

let faceApiInitialized = false

const initFaceApi = async () => {
  if (!faceApiInitialized) {
    await faceapi.loadFaceLandmarkModel('/models')
    await faceapi.nets.tinyFaceDetector.loadFromUri('/models')

    faceApiInitialized = true
  }
}

The detection box is what we need: it comes with x and y coordinates, a width and a height value. We could use the facial landmarks (the exact positions of the eyes, the jawline and so on) for more precision, but for the sake of simplicity, we'll stick with the box.

With face-api.js, we can create an async function to detect a face in the stream of the video element. face-api.js does all the magic for us, and we only need to tell it in which element we want to look for faces and what model to use. We need to initialize the API first, though.

const detectFace = async () => {
  await initFaceApi()

  return await faceapi.detectSingleFace(videoEl, new faceapi.TinyFaceDetectorOptions())
}

This will return us a detection object with an attribute called _box. This box contains all kinds of information: coordinates for every corner, x and y coordinates of the top-left corner, and a width and height. To position the box that contains the hat, we need the top, left, width and height attributes. Since every hat emoji is shaped slightly differently, we cannot simply put them right over the face - they wouldn't fit.
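Roughly, the part of the box we care about looks like this - the numbers are made up, and the box also exposes further getters such as right, bottom and the corner coordinates:

// Example shape of the detection's _box (illustrative values only)
{
  top: 187.4,    // y coordinate of the top-left corner
  left: 412.9,   // x coordinate of the top-left corner
  width: 256.3,  // width of the detected face
  height: 256.3, // height of the detected face
}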

So, let's add the hats and some way to customize the hats' positioning:

/**
 * All of the available hats
 */
const hats = {
  tophat: {
    hat: '🎩',
    positioning: box => ({
      top: box.top - (box.height * 1.1),
      left: box.left,
      fontSize: box.height,
    }),
  },
  bowhat: {
    hat: '👒',
    positioning: box => ({
      top: box.top - box.height,
      left: box.left + box.width * 0.1,
      width: box.width,
      fontSize: box.height,
    }),
  },
  cap: {
    hat: '🧢',
    positioning: box => ({
      top: box.top - box.height * 0.8,
      left: box.left - box.width * 0.10,
      fontSize: box.height * 0.9,
    }),
  },
  graduationcap: {
    hat: '🎓',
    positioning: box => ({
      top: box.top - box.height,
      left: box.left,
      fontSize: box.height,
    }),
  },
  rescuehelmet: {
    hat: '⛑️',
    positioning: box => ({
      top: box.top - box.height * 0.75,
      left: box.left,
      fontSize: box.height * 0.9,
    }),
  },
}

The main reason for giving each hat its own positioning function is that every emoji is shaped differently: some need to sit a bit higher, some need a small horizontal offset, and some look better slightly smaller.

Since we haven't used the <select> for the hats just yet, let's add this next:

let selectedHat = 'tophat'

const listHats = () => {
  hatSelectorEl.innerHTML = Object.keys(hats).map(hatKey => {
    const hat = hats[hatKey]

    return `<option value="${hatKey}">${hat.hat}</option>`
  }).join('')
}

How to wear hats

Now we can start glueing things together. With the selectedHat variable and the box, we can now position the selected hat on the detected face:

/**
 * Positions the hat by a given box
 */
const positionHat = (box) => {
  const hatConfig = hats[selectedHat]
  const positioning = hatConfig.positioning(box)

  hatEl.classList.add('visible')
  hatEl.innerHTML = hatConfig.hat
  hatEl.setAttribute('style', `
    top: ${positioning.top}px; 
    left: ${positioning.left}px; 
    width: ${box.width}px; 
    height: ${box.height}px; 
    font-size: ${positioning.fontSize}px;
  `)
}

As you can see, we're using CSS for that. Of course, we could paint it with a canvas and whatnot, but CSS makes things more straightforward and less laggy.
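If you'd rather go the canvas route, face-api.js ships with drawing helpers. Here's a minimal sketch of how that could look - it assumes an extra <canvas id="overlay"> absolutely positioned on top of the video, which is not part of the markup above:

// Hedged sketch of the canvas alternative. Assumes a <canvas id="overlay">
// laid over the video element; not part of the boilerplate above.
const overlayEl = document.querySelector('#overlay')
const displaySize = { width: videoEl.width, height: videoEl.height }

// Make the canvas match the video dimensions
faceapi.matchDimensions(overlayEl, displaySize)

const drawDetectionBox = async () => {
  const detection = await detectFace()

  if (detection) {
    // Scale the detection to the displayed size and draw its bounding box
    const resized = faceapi.resizeResults(detection, displaySize)
    overlayEl.getContext('2d').clearRect(0, 0, overlayEl.width, overlayEl.height)
    faceapi.draw.drawDetections(overlayEl, resized)
  }
}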

Now we need to integrate the face detection into the startVideo and the stopVideo functions. I'll show the entire code of these functions here for completeness.

/**
 * Start and stop the video
 */
let faceDetectionInterval = null

const startVideo = async () => {
  listHats()
  await listDevices()

  stopVideo()

  try {
    videoEl.srcObject = await navigator.mediaDevices.getUserMedia({
      video: {
        width: { ideal: 1280 },
        height: { ideal: 720 },
        deviceId: selectedDevice,
      },
      audio: false
    })

    faceDetectionInterval = setInterval(async () => {
      const positioning = await detectFace()

      if (positioning) {
        positionHat(positioning._box)
      }
    }, 60)
  } catch(e) {
    console.error(e)
  }
}

const stopVideo = () => {
  clearInterval(faceDetectionInterval)
  hatEl.classList.remove('visible')

  if (videoEl.srcObject) {
    videoEl.srcObject.getTracks().forEach(t => {
      t.stop()
    })
    videoEl.srcObject = null
  }
}

As you can see, we're using an interval here to position everything. Due to the nature of face detection, it would be way too jiggly if we did it any more frequently. It already is quite jiggly, but around 60ms makes it at least bearable.
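If the jiggling bothers you, one option (not part of the setup above) is to smooth the box over a few frames with a simple exponential moving average before passing it to positionHat:

// Hedged sketch: exponential smoothing of the detection box to reduce jiggle.
// smoothedBox and SMOOTHING are additions, not part of the code above.
let smoothedBox = null
const SMOOTHING = 0.3 // 0 = frozen in place, 1 = no smoothing at all

const smoothBox = box => {
  if (!smoothedBox) {
    smoothedBox = { top: box.top, left: box.left, width: box.width, height: box.height }
  } else {
    for (const key of ['top', 'left', 'width', 'height']) {
      smoothedBox[key] += SMOOTHING * (box[key] - smoothedBox[key])
    }
  }

  return smoothedBox
}

// In the interval callback, we could then call:
// positionHat(smoothBox(positioning._box))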

Last, we add some event listeners, and we're good to go:

/**
 * Event listeners
 */
startButtonEl.addEventListener('click', startVideo)

stopButtonEl.addEventListener('click', stopVideo)

deviceDropdownEl.addEventListener('change', e => {
  selectedDevice = e.target.value
  startVideo()
})

hatSelectorEl.addEventListener('change', e => {
  selectedHat = e.target.value
})

The result

And here's the result:

Depending on your system, the hats may very well be off because every system renders emojis differently. Also, give it a moment to actually load the model weights; that takes a few seconds. For best results, view on a large screen and open the sandbox in a new tab. Obviously, the tab needs camera access.

If you'd like, how about sharing a screenshot of you wearing your favorite hat emoji in the comments?


I hope you enjoyed reading this article as much as I enjoyed writing it! If so, leave a ❤️ or a 🦄! I write tech articles in my free time and like to drink coffee every once in a while.

If you want to support my efforts, you can offer me a coffee ☕ or follow me on Twitter 🐦! You can also support me directly via Paypal!


Top comments (9)

orliesaurus
Love this Pascal, dropping you a follow!

Pascal Thormeier
Thank you so much! Followed back :)

Mike Talbot ⭐
Genius stuff. Love it.

Pascal Thormeier
Thank you so much! Glad you liked it 😊

Yongchang He
That's interesting!

Pascal Thormeier
Glad you liked it!

Nandan Tyagi
Good stuff! Thanks!

Pascal Thormeier
You're welcome! Glad you liked it!

Andrew Baisden
Wow im impressed.