
Miguel Ángel Cabrera Miñagorri


Playing a piano with your eyes - Gaze estimation

This article guides you through using Pipeless to play a virtual piano of 8 notes with your eyes, just by looking at the notes.

A step by step execution guide is available at: https://www.pipeless.ai/docs/v1/examples/gaze-piano

The code is available in the Pipeless GitHub repository: https://github.com/pipeless-ai/pipeless/tree/main/examples/gaze-piano

Splitting the steps into Pipeless hooks

Before diving deep into the logic, we have to think about how to compose this use case with Pipeless.

Remember that with Pipeless we just need to implement some hook functions and it takes care of everything else by invoking our functions at the proper time with each frame.

In this case, we will implement a custom processing hook. Our processing hook will take the frame and produce an output that consists of 2 points (P1, P2) representing the direction of the person's gaze: P1 is the center of the left eye and P2 is a second point along the gaze line.

We will also implement a post-processing hook, which takes the output of the processing hook, calculates the note to play based on the gaze direction, and draws our piano over the frame.

Finally, we will initialize a context with the face mesh MediaPipe module. This allows us to initialize the library just once and read that instance from all the invocations of our hooks.
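Putting these pieces together, a stage could look like the sketch below. The hook signatures, the frame fields, and the context structure here are assumptions based on Pipeless's Python stage convention, not the repository's exact code, and MediaPipe is replaced by a placeholder so the sketch stays self-contained; the linked example is the reference.

```python
# Hypothetical sketch of a Pipeless stage (signatures assumed).

# init.py -- runs once per stage; the returned dict is shared with every hook call
def init():
    # In the real example this would hold MediaPipe's FaceMesh instance;
    # a string placeholder keeps the sketch self-contained.
    return {"face_mesh": "placeholder-model"}

# process.py -- runs per frame; estimates the gaze line (P1, P2)
def hook(frame, context):
    model = context["face_mesh"]           # reused across invocations, never re-created
    p1, p2 = (100, 120), (140, 90)         # dummy gaze line for illustration
    frame["inference_output"] = (p1, p2)   # handed to the post-processing hook
    return frame
```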

Now, we are ready to dig into the actual code logic.

Detecting the eyes and gaze direction

To detect the eyes we use the Face Mesh module of Google's MediaPipe library. This module creates a mesh of points over the faces it recognizes. Our job here is to process the points of that mesh to estimate the point the person is looking at.

To get the gaze direction, we are interested in a point P2 that, together with the left eye center (P1), gives us the gaze direction (the line that joins both points).

To do that, we project the 2D pupil point into the 3D model, using an estimate of the camera matrix and of the eyeball center in the 3D model. For this task we use the OpenCV `estimateAffine3D` function, which gives us the transformation to apply to the center of the left eye to obtain the gaze point in the 3D world. Projecting that point back to 2D gives us P2.
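Under the hood, `cv2.estimateAffine3D` fits a 3x4 affine transform mapping one 3D point set onto another. As a hedged, NumPy-only illustration of that idea (a least-squares fit without OpenCV's RANSAC step, and not the example's actual code):

```python
import numpy as np

def fit_affine_3d(src, dst):
    """Least-squares 3x4 affine transform T such that dst ≈ T @ [src; 1],
    the same problem cv2.estimateAffine3D solves (minus its RANSAC step)."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    src_h = np.hstack([src, np.ones((len(src), 1))])  # homogeneous coordinates, N x 4
    X, *_ = np.linalg.lstsq(src_h, dst, rcond=None)   # 4 x 3 solution
    return X.T                                        # 3 x 4 affine transform

def apply_affine_3d(T, points):
    """Apply the fitted transform to a set of 3D points."""
    points = np.asarray(points, dtype=float)
    points_h = np.hstack([points, np.ones((len(points), 1))])
    return points_h @ T.T
```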

To keep the article clean I will not paste the whole code here. The function implementing this step can be found at: https://github.com/pipeless-ai/pipeless/blob/c4db6b740e655189be44d182f9a689c11af94211/examples/gaze-piano/process.py#L18

Setting up the piano

For the piano, we will create 8 sections, one per musical note. The sections are laid out radially, with the left eye at the center of the circle. In other words, we create a circle around the left eye split into 8 slices, where each slice corresponds to a different note.

Then, to know which note we must play, we calculate the angle of the gaze direction. As mentioned above, we use the center of the left eye as the origin (x=0, y=0) of our coordinates.

The calculation is fairly simple. From our previous step we have the eye center (P1) and the estimated gaze point (P2). Using basic trigonometry we compute the angle as `atan2(P2y - P1y, P2x - P1x)`, which returns a value between `-pi` and `pi`. When the angle is negative, we add `2*pi` to make it positive, so the final value lies in `[0, 2*pi)`.
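The angle step can be sketched as follows (the function name is mine, not the repository's; `atan2` covers the full circle directly, unlike plain `atan`):

```python
import math

def gaze_angle(p1, p2):
    """Angle of the gaze line from P1 toward P2, normalized to [0, 2*pi)."""
    angle = math.atan2(p2[1] - p1[1], p2[0] - p1[0])  # in (-pi, pi]
    if angle < 0:
        angle += 2 * math.pi  # shift negative angles into [0, 2*pi)
    return angle
```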
Then, we just need to compare the angle with the sections we defined and play the sound for that section:

``````# Frequency (Hz) of each note, used to generate the tone
notes = [261.63, 293.66, 329.63, 349.23, 392.00, 440.00, 493.88, 523.25]
if angle > 15*math.pi/8 or angle < math.pi/8:
    play_sound(notes[6], note_duration)
elif angle > math.pi/8 and angle < 3*math.pi/8:
    play_sound(notes[5], note_duration)
elif angle > 3*math.pi/8 and angle < 5*math.pi/8:
    play_sound(notes[4], note_duration)
elif angle > 5*math.pi/8 and angle < 7*math.pi/8:
    play_sound(notes[3], note_duration)
elif angle > 7*math.pi/8 and angle < 9*math.pi/8:
    play_sound(notes[2], note_duration)
elif angle > 9*math.pi/8 and angle < 11*math.pi/8:
    play_sound(notes[1], note_duration)
elif angle > 11*math.pi/8 and angle < 13*math.pi/8:
    play_sound(notes[0], note_duration)
elif angle > 13*math.pi/8 and angle < 15*math.pi/8:
    play_sound(notes[7], note_duration)
``````
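The comparison chain above can equivalently be collapsed into a single sector-index computation; this is an alternative sketch of mine, not the repository's code:

```python
import math

notes = [261.63, 293.66, 329.63, 349.23, 392.00, 440.00, 493.88, 523.25]
# Note index for each pi/4 slice, starting from the slice centered on angle 0,
# matching the order of the if/elif chain above.
sector_to_note = [6, 5, 4, 3, 2, 1, 0, 7]

def note_for_angle(angle):
    # Shift by pi/8 so slice boundaries align with the sector centers,
    # then divide the circle into 8 slices of pi/4 each.
    sector = int(((angle + math.pi / 8) % (2 * math.pi)) // (math.pi / 4))
    return notes[sector_to_note[sector]]
```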

Generating the sounds

To generate our sounds we use the Python `simpleaudio` package. The following is the whole function we use to play a note. Note that you will hear some clicks because we are not applying any kind of zero-crossing technique to smooth the transition between tones.

``````import numpy as np
import simpleaudio as sa

def play_sound(note, duration):
    sample_rate = 44100  # Hz
    # Generate a sine wave at the note's frequency, scaled to the 16-bit range
    samples = (32767 * 0.5 * np.sin(2.0 * np.pi * np.arange(sample_rate * duration) * note / sample_rate)).astype(np.int16)
    wave_obj = sa.WaveObject(samples, 1, 2, sample_rate)  # mono, 2 bytes per sample
    play_obj = wave_obj.play()
    play_obj.wait_done()
``````
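One simple way to soften those clicks is to apply a short linear fade-in/fade-out envelope to each tone before playback. The sketch below is an addition of mine, not part of the example:

```python
import numpy as np

def apply_fade(samples, sample_rate, fade_ms=10):
    """Linearly ramp the first and last fade_ms of a tone so it starts and
    ends at zero amplitude, avoiding audible clicks between notes."""
    samples = samples.astype(np.float64)
    n = min(int(sample_rate * fade_ms / 1000), len(samples) // 2)
    ramp = np.linspace(0.0, 1.0, n)
    samples[:n] *= ramp          # fade in
    samples[-n:] *= ramp[::-1]   # fade out
    return samples.astype(np.int16)
```

The faded tone can then be handed to `sa.WaveObject` exactly as before.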

Run the example

To fetch the whole code and run the example, simply execute the following commands:

• Install Pipeless:
``````curl https://raw.githubusercontent.com/pipeless-ai/pipeless/main/install.sh | bash
``````
• Create a project:
``````pipeless init my-project
cd my-project
``````
• Download the example:
``````wget -O - https://github.com/pipeless-ai/pipeless/archive/main.tar.gz | tar -xz --strip=2 "pipeless-main/examples/gaze-piano"
``````
• Start Pipeless:
``````pipeless start --stages-dir .
``````
• Provide a stream from your webcam:
``````pipeless add stream --input-uri "v4l2" --output-uri "screen" --frame-path "gaze-piano"
``````

You can provide streams from any source including `file://`, `https://`, `rtsp://`, `rtmp://`, etc.

Conclusions

As you can see, just by writing a couple of functions for Pipeless we built a computer vision application with real utility. You can apply what we did here to many other use cases involving gaze detection. Pipeless, on its side, lets you deploy the application to any device or cloud and makes it really simple to manage streams.

Pipeless

Easily create, deploy and run computer vision applications.

Pipeless is an open-source computer vision framework to create and deploy applications without the complexity of building and maintaining multimedia pipelines. It ships everything you need to create and deploy efficient computer vision applications that work in real-time in just minutes.

Pipeless is inspired by modern serverless technologies. It provides the development experience of serverless frameworks applied to computer vision. You provide some functions that are executed for new video frames and Pipeless takes care of everything else.

You can easily use industry-standard models, such as YOLO, or load your custom model in one of the supported inference runtimes. Pipeless ships some of the most popular inference runtimes, such as the ONNX Runtime, allowing you to run inference with high performance on CPU or GPU out-of-the-box.

You can deploy your Pipeless application to edge…