Yuiko Koyanagi

Posted on Aug 3, 2021

Use Pose detection of TensorFlow with Next.js and TypeScript: Let's become pictograms with Pose detection #Tokyo2020

#tensorflow #machinelearning #nextjs #typescript

Hello guys,

Good news! Pictograms are now an official sport at #Tokyo2020.
So, I have developed an application with Pose Detection in TensorFlow.js that lets us become pictograms :D

In this article, I will explain how this application works.

DEMO→https://pictogram-san.com/
github→https://github.com/tommy19970714/pictogram-san

※ I developed this application with my friends, Tommy, Waserin, Nishikawa, and mikkame.

Why Pictograms?

Pictograms are now official at the Olympic Games.

The Tokyo Organising Committee of the Olympic and Paralympic Games (Tokyo 2020) today unveiled the official sport pictograms of the Olympic Games Tokyo 2020

Tokyo 2020 unveils Olympic Games sport pictograms

Want to try being a pictogram? Just try this application!

Usage

Visit https://pictogram-san.com/, then click Start Game or Take photo.

When you click Start Game, the music starts right away!
Strike the pose shown as the subject! lol

When you click Take photo, you can take a screenshot.

If you want to switch between the front camera and the rear camera, just click the following button! (This is only available on smartphones. On a PC, the button is disabled.)

Features

This application has the following functions.

Make a pictogram of the image acquired by the webcam.
Split the screen in two vertically, and play the pictogram on the top and the webcam video on the bottom.
By clicking the play button, the music automatically starts playing.
Switch between the front camera and the rear camera.
Take a screenshot of the screen when the music ends.

Here, we used the pose-detection package of TensorFlow.js (with the PoseNet model), and the application is built with Next.js and TypeScript.

I can't cover all the technical details, so I'll just write about some important points.

If you need more, please check our GitHub repository.
Any issues and PRs are really welcome!

Set up to use pose-detection with Next.js and TypeScript

$ yarn add @tensorflow-models/pose-detection @tensorflow/tfjs-core @tensorflow/tfjs-converter @tensorflow/tfjs-backend-webgl

※ I wrote in another article about why I don't just run yarn add @tensorflow/tfjs. Have a look if you need.

If the model you use runs on the wasm backend, you must install @tensorflow/tfjs-backend-wasm instead of @tensorflow/tfjs-backend-webgl.
Here, because we run the pose-detection model on the webgl backend, I installed the webgl package.

Then, load PoseNet.
If necessary, you can specify the architecture and other options.

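    import '@tensorflow/tfjs-backend-webgl' // registers the webgl backend
    import { SupportedModels, createDetector } from '@tensorflow-models/pose-detection'

    // resolution is assumed to be an object like { width, height }, defined elsewhere
    const modelName = SupportedModels.PoseNet
    const net = await createDetector(modelName, {
      quantBytes: 2,
      architecture: 'MobileNetV1',
      outputStride: 16,
      inputResolution: resolution,
    })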

There are two architectures, MobileNetV1 and ResNet50. ResNet50 gives higher accuracy, but it is quite heavy, especially on a smartphone, so we used MobileNetV1 here.

Once PoseNet has been loaded, pass the image or video data as the argument of estimatePoses.
In this case, we passed the webcam video element as is.

      const predictions = await net.estimatePoses(webcam, {
        maxPoses: 1,
        flipHorizontal: false,
      })

Then, you can use the detected predictions and draw them as you like.
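For example, one way to prepare the predictions for drawing is to keep only the confident keypoints and connect them into line segments. The following is just a minimal sketch, not the code of our application. The Keypoint and Pose types are declared locally to mirror the shape of the result, and LIMBS, MIN_SCORE, and toSegments are hypothetical names. I also assume the keypoints have names like left_shoulder, so please check the names your model actually returns.

```typescript
// Local types that mirror the shape of a pose result (declared here so the sketch runs on its own).
type Keypoint = { x: number; y: number; score?: number; name?: string }
type Pose = { keypoints: Keypoint[]; score?: number }
type Segment = { from: Keypoint; to: Keypoint }

// Hypothetical limb list: pairs of keypoint names to connect with a line.
const LIMBS: Array<[string, string]> = [
  ['left_shoulder', 'left_elbow'],
  ['left_elbow', 'left_wrist'],
  ['right_shoulder', 'right_elbow'],
  ['right_elbow', 'right_wrist'],
  ['left_hip', 'left_knee'],
  ['left_knee', 'left_ankle'],
  ['right_hip', 'right_knee'],
  ['right_knee', 'right_ankle'],
]

// Assumed confidence cut-off; tune it for your camera and lighting.
const MIN_SCORE = 0.5

function toSegments(pose: Pose, minScore: number = MIN_SCORE): Segment[] {
  // Index the confident, named keypoints by name.
  const byName: Record<string, Keypoint> = {}
  for (const kp of pose.keypoints) {
    const score = kp.score === undefined ? 0 : kp.score
    if (kp.name !== undefined && score >= minScore) {
      byName[kp.name] = kp
    }
  }
  // Keep a limb only when both of its end points were detected confidently.
  const segments: Segment[] = []
  for (const [a, b] of LIMBS) {
    const from: Keypoint | undefined = byName[a]
    const to: Keypoint | undefined = byName[b]
    if (from && to) segments.push({ from, to })
  }
  return segments
}
```

Each segment can then be drawn on the canvas with moveTo and lineTo, in whatever style you like.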

Start pose detection as soon as the webcam starts up

To use the information from the webcam, we need to wait until the video tag is loaded.
So, at first I wrote the following.

if (webcamRef.current.video.readyState === 4) {
 // Write the process of using TensorFlow.js
}

However, since readyState is not 4 when the page first loads, this block is skipped and never runs.

So, I wait until readyState becomes 4 with the following code.

  // Poll every 500 ms until the video has enough data to play (readyState 4)
  const handleLoadWaiting = () => {
    return new Promise<boolean>((resolve) => {
      const timer = setInterval(() => {
        if (webcamRef.current?.video?.readyState === 4) {
          clearInterval(timer)
          resolve(true)
        }
      }, 500)
    })
  }

  const handleStartDrawing = async () => {
    await handleLoadWaiting()
    // Here write something
  }

By the way, I used useEffect in a past application of mine that puts a mask on your face.
※ Since it has no loading indicator, you may be confused because the mask is not displayed at first, but if you wait a few seconds, the mask will appear.

  useEffect(() => {
    runFaceDetect();
  }, [webcamRef.current?.video?.readyState])

// https://github.com/yuikoito/mask-app/blob/master/src/App.tsx#L43-L46

For this application, it was better to wait until loading is complete, so I used the handleLoadWaiting function instead.
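One weak point of waiting with setInterval is that it waits forever if the camera never becomes ready (for example, when permission is denied). As a rough sketch, the same polling idea can be written as a generic helper with a timeout. waitUntil and its parameters are hypothetical names of mine, and the default values are only assumptions.

```typescript
// Poll isReady every intervalMs until it returns true or timeoutMs has passed.
// Resolves to true when ready, and to false when the timeout is reached.
function waitUntil(
  isReady: () => boolean,
  intervalMs: number = 500,
  timeoutMs: number = 10000
): Promise<boolean> {
  return new Promise<boolean>((resolve) => {
    // Check once right away so we don't wait a full interval for nothing.
    if (isReady()) {
      resolve(true)
      return
    }
    const startedAt = Date.now()
    const timer = setInterval(() => {
      if (isReady()) {
        clearInterval(timer)
        resolve(true)
      } else if (Date.now() - startedAt >= timeoutMs) {
        clearInterval(timer)
        resolve(false)
      }
    }, intervalMs)
  })
}

// In the component it could be used like this (assuming the same webcamRef as above):
//   const ready = await waitUntil(() => webcamRef.current?.video?.readyState === 4)
//   if (!ready) { /* show a message instead of starting detection */ }
```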

Then don't forget to specify the size of the input element and the size of the output element as well.

      const webcam = webcamRef.current.video as HTMLVideoElement
      const canvas = canvasRef.current
      webcam.width = webcam.videoWidth
      webcam.height = webcam.videoHeight
      canvas.width = webcam.videoWidth
      canvas.height = webcam.videoHeight * 2

At first, I completely forgot to specify the width of the input webcam element, so the input width was treated as 0, and I struggled with an error saying that the roi was 0. So, DON'T forget!

      webcam.width = webcam.videoWidth
      webcam.height = webcam.videoHeight
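To make this harder to forget, the size calculation can be put into a small pure function that refuses to work with a zero-sized video. This is only a sketch under my own assumptions: computeLayout and the Size type are hypothetical names, and the canvas is twice the video height because the pictogram goes on the top half and the webcam video on the bottom half.

```typescript
type Size = { width: number; height: number }
type Layout = { video: Size; canvas: Size; bottomHalfY: number }

// Compute the element sizes from the intrinsic video size.
// Returns null while the video has no size yet, so the caller can wait
// instead of passing a zero-sized input to the model.
function computeLayout(videoWidth: number, videoHeight: number): Layout | null {
  if (videoWidth <= 0 || videoHeight <= 0) return null
  return {
    // The video element gets the same size as the stream.
    video: { width: videoWidth, height: videoHeight },
    // The canvas holds two stacked frames: pictogram on top, video below.
    canvas: { width: videoWidth, height: videoHeight * 2 },
    // y coordinate where the bottom half (the webcam video) starts.
    bottomHalfY: videoHeight,
  }
}
```

With this, the assignments above become something like `webcam.width = layout.video.width` and `canvas.height = layout.canvas.height`, after checking that layout is not null.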

Switch a camera between in-camera and rear camera

To switch between the front camera and the rear camera, simply switch facingMode between 'user' and 'environment'.

  const [facingMode, setFacingMode] = useState<'user' | 'environment'>('user')
  const videoConstraints = {
    width: 640, // example value only; specify the width you want
    height: 480, // example value only; specify the height you want
    facingMode: facingMode,
  }

// webcam part
          <Webcam
            audio={false}
            mirrored={true}
            videoConstraints={videoConstraints}
            ref={webcamRef}
          />
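The switching itself is just flipping between the two values, so it can be written as a tiny helper and called from the button's click handler. This is a minimal sketch. toggleFacingMode and buildVideoConstraints are hypothetical names, not functions from react-webcam or from our repository.

```typescript
type FacingMode = 'user' | 'environment'

// 'user' is the front camera, 'environment' is the rear camera.
function toggleFacingMode(current: FacingMode): FacingMode {
  return current === 'user' ? 'environment' : 'user'
}

// Hypothetical helper that builds the constraints object passed to the webcam.
function buildVideoConstraints(facingMode: FacingMode, width: number, height: number) {
  return { width, height, facingMode }
}

// In the component, the click handler could then be something like:
//   const handleSwitchCamera = () => setFacingMode((prev) => toggleFacingMode(prev))
```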