
Miguel Ángel Cabrera Miñagorri


Playing a piano with your eyes - Gaze estimation

This article guides you through using Pipeless to play a virtual piano of 8 notes with your eyes, just by looking at the notes.

A step by step execution guide is available at: https://www.pipeless.ai/docs/v1/examples/gaze-piano

The code is available in the Pipeless GitHub repository: https://github.com/pipeless-ai/pipeless/tree/main/examples/gaze-piano

Splitting the steps into Pipeless hooks

Before diving deep into the logic, we have to think about how to compose this use case with Pipeless.

Remember that with Pipeless we just need to implement some hook functions and it takes care of everything else by invoking our functions at the proper time with each frame.

In this case, we will implement a custom processing hook. Our processing hook will take the frame and produce an output that consists of 2 points (P1, P2) representing the direction of the person's gaze: P1 is the center of the left eye and P2 is a second point along the gaze line.

We will also implement a post-processing hook, which takes the output of the processing hook, calculates the note to play based on the gaze direction, and draws our piano over the frame.

Finally, we will initialize a context with the face mesh MediaPipe module. This allows us to initialize the library just once and read that instance from all the invocations of our hooks.
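Putting these pieces together, a stage could look like the sketch below. The hook signatures, the frame fields, and the context structure here are assumptions based on Pipeless's Python stage convention, not the repository's exact code, and MediaPipe is replaced by a placeholder so the sketch stays self-contained; the linked example is the reference.

```python
# Hypothetical sketch of a Pipeless stage (signatures assumed).

# init.py -- runs once per stage; the returned dict is shared with every hook call
def init():
    # In the real example this would hold MediaPipe's FaceMesh instance;
    # a string placeholder keeps the sketch self-contained.
    return {"face_mesh": "placeholder-model"}

# process.py -- runs per frame; estimates the gaze line (P1, P2)
def hook(frame, context):
    model = context["face_mesh"]           # reused across invocations, never re-created
    p1, p2 = (100, 120), (140, 90)         # dummy gaze line for illustration
    frame["inference_output"] = (p1, p2)   # handed to the post-processing hook
    return frame
```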

Now, we are ready to dig into the actual code logic.

Detecting the eyes and gaze direction

To detect the eyes we use the Face Mesh module of Google's MediaPipe library. This module creates a mesh of points over the faces it recognizes. Our job here is to process the points of that mesh to estimate the point the person is looking at.

To get the gaze direction, we are interested in a point P2 that, together with the left eye center (P1), gives us the gaze direction (the line that joins both points).

To do that, we project the 2D pupil point into the 3D model, using an estimate of the camera matrix and of the eyeball center in the 3D model. For this task we use the OpenCV `estimateAffine3D` function, which gives us the transformation to apply to the center of the left eye to obtain the gaze point in the 3D world. Projecting that point back to 2D gives us P2.
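Under the hood, `cv2.estimateAffine3D` fits a 3x4 affine transform mapping one 3D point set onto another. As a hedged, NumPy-only illustration of that idea (a least-squares fit without OpenCV's RANSAC step, and not the example's actual code):

```python
import numpy as np

def fit_affine_3d(src, dst):
    """Least-squares 3x4 affine transform T such that dst ≈ T @ [src; 1],
    the same problem cv2.estimateAffine3D solves (minus its RANSAC step)."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    src_h = np.hstack([src, np.ones((len(src), 1))])  # homogeneous coordinates, N x 4
    X, *_ = np.linalg.lstsq(src_h, dst, rcond=None)   # 4 x 3 solution
    return X.T                                        # 3 x 4 affine transform

def apply_affine_3d(T, points):
    """Apply the fitted transform to a set of 3D points."""
    points = np.asarray(points, dtype=float)
    points_h = np.hstack([points, np.ones((len(points), 1))])
    return points_h @ T.T
```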

To keep the article clean I will not paste the whole code here. The function implementing this step can be found at: https://github.com/pipeless-ai/pipeless/blob/c4db6b740e655189be44d182f9a689c11af94211/examples/gaze-piano/process.py#L18

Setting up the piano

For the piano, we will create 8 sections, one per musical note. The sections are laid out radially, with the left eye at the center of the circle. In other words, we create a circle around the left eye split into 8 slices, where each slice corresponds to a different note.

Then, to know which note we must play, we calculate the angle of the gaze direction. As mentioned above, we use the center of the left eye as the origin (x=0, y=0) of our coordinates.

The calculation is fairly simple. From our previous step we have the eye center (P1) and the estimated gaze point (P2). Using basic trigonometry we compute the angle as `atan2(P2y - P1y, P2x - P1x)`, which returns a value between `-pi` and `pi`. When the angle is negative, we add `2*pi` to make it positive, so the final value lies in `[0, 2*pi)`.
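The angle step can be sketched as follows (the function name is mine, not the repository's; `atan2` covers the full circle directly, unlike plain `atan`):

```python
import math

def gaze_angle(p1, p2):
    """Angle of the gaze line from P1 toward P2, normalized to [0, 2*pi)."""
    angle = math.atan2(p2[1] - p1[1], p2[0] - p1[0])  # in (-pi, pi]
    if angle < 0:
        angle += 2 * math.pi  # shift negative angles into [0, 2*pi)
    return angle
```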
Then, we just need to compare the angle with the sections we defined and play the sound for that section:

``````# Frequency (Hz) of each note, used to generate the tone
notes = [261.63, 293.66, 329.63, 349.23, 392.00, 440.00, 493.88, 523.25]
if angle > 15*math.pi/8 or angle < math.pi/8:
    play_sound(notes[6], note_duration)
elif angle > math.pi/8 and angle < 3*math.pi/8:
    play_sound(notes[5], note_duration)
elif angle > 3*math.pi/8 and angle < 5*math.pi/8:
    play_sound(notes[4], note_duration)
elif angle > 5*math.pi/8 and angle < 7*math.pi/8:
    play_sound(notes[3], note_duration)
elif angle > 7*math.pi/8 and angle < 9*math.pi/8:
    play_sound(notes[2], note_duration)
elif angle > 9*math.pi/8 and angle < 11*math.pi/8:
    play_sound(notes[1], note_duration)
elif angle > 11*math.pi/8 and angle < 13*math.pi/8:
    play_sound(notes[0], note_duration)
elif angle > 13*math.pi/8 and angle < 15*math.pi/8:
    play_sound(notes[7], note_duration)
``````
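The comparison chain above can equivalently be collapsed into a single sector-index computation; this is an alternative sketch of mine, not the repository's code:

```python
import math

notes = [261.63, 293.66, 329.63, 349.23, 392.00, 440.00, 493.88, 523.25]
# Note index for each pi/4 slice, starting from the slice centered on angle 0,
# matching the order of the if/elif chain above.
sector_to_note = [6, 5, 4, 3, 2, 1, 0, 7]

def note_for_angle(angle):
    # Shift by pi/8 so slice boundaries align with the sector centers,
    # then divide the circle into 8 slices of pi/4 each.
    sector = int(((angle + math.pi / 8) % (2 * math.pi)) // (math.pi / 4))
    return notes[sector_to_note[sector]]
```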

Generating the sounds

To generate our sounds we use the Python `simpleaudio` package. The following is the whole function we use to play a note. Note that you will hear some clicks because we are not applying any kind of zero-crossing technique to smooth the transition between tones.

``````import numpy as np
import simpleaudio as sa

def play_sound(note, duration):
    sample_rate = 44100  # Hz
    # Generate a sine wave at the note's frequency, scaled to the 16-bit range
    samples = (32767 * 0.5 * np.sin(2.0 * np.pi * np.arange(sample_rate * duration) * note / sample_rate)).astype(np.int16)
    wave_obj = sa.WaveObject(samples, 1, 2, sample_rate)  # mono, 2 bytes per sample
    play_obj = wave_obj.play()
    play_obj.wait_done()
``````
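One simple way to soften those clicks is to apply a short linear fade-in/fade-out envelope to each tone before playback. The sketch below is an addition of mine, not part of the example:

```python
import numpy as np

def apply_fade(samples, sample_rate, fade_ms=10):
    """Linearly ramp the first and last fade_ms of a tone so it starts and
    ends at zero amplitude, avoiding audible clicks between notes."""
    samples = samples.astype(np.float64)
    n = min(int(sample_rate * fade_ms / 1000), len(samples) // 2)
    ramp = np.linspace(0.0, 1.0, n)
    samples[:n] *= ramp          # fade in
    samples[-n:] *= ramp[::-1]   # fade out
    return samples.astype(np.int16)
```

The faded tone can then be handed to `sa.WaveObject` exactly as before.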

Run the example

To fetch the whole code and run the example, simply execute the following commands:

• Install Pipeless:
``````curl https://raw.githubusercontent.com/pipeless-ai/pipeless/main/install.sh | bash
``````
• Create a project:
``````pipeless init my-project
cd my-project
``````
• Download the example:
``````wget -O - https://github.com/pipeless-ai/pipeless/archive/main.tar.gz | tar -xz --strip=2 "pipeless-main/examples/gaze-piano"
``````
• Start Pipeless:
``````pipeless start --stages-dir .
``````
• Provide a stream from your webcam:
``````pipeless add stream --input-uri "v4l2" --output-uri "screen" --frame-path "gaze-piano"
``````

You can provide streams from any source including `file://`, `https://`, `rtsp://`, `rtmp://`, etc.

Conclusions

As you can see, just by writing a couple of functions for Pipeless we built a computer vision application with real utility. You can apply what we did here to many other use cases involving gaze detection. Pipeless, on its side, lets you deploy the application to any device or cloud and makes it really simple to manage streams.

Pipeless

Easily create, deploy and run computer vision applications.

Pipeless is an open-source computer vision framework to create and deploy applications without the complexity of building and maintaining multimedia pipelines. It ships everything you need to create and deploy efficient computer vision applications that work in real-time in just minutes.

Pipeless is inspired by modern serverless technologies. It provides the development experience of serverless frameworks applied to computer vision. You provide some functions that are executed for new video frames and Pipeless takes care of everything else.

You can easily use industry-standard models, such as YOLO, or load your custom model in one of the supported inference runtimes. Pipeless ships some of the most popular inference runtimes, such as the ONNX Runtime, allowing you to run inference with high performance on CPU or GPU out-of-the-box.

You can deploy your Pipeless application to edge…