cross-post from medium.com
Veremin is the brainchild of johncohnvt, from the MIT-IBM Watson AI Lab, who built the first rough prototype. I was then able to whip it into something that really worked!
The application attaches to the video stream from your web camera. PoseNet is used to capture the location of your hands within the video. The location then gets converted to music.
Thanks to the magic of TensorFlow.js, Veremin lives 100% in the browser and works on all modern browsers (Chrome, Safari, Firefox, IE) and platforms (OS X, iOS, Android, Windows).
Just point your browser to ibm.biz/veremin on your desktop, laptop, tablet, or phone. Allow the application to use the camera when prompted and make sure the volume is up.
Stand in front of your device's camera and adjust your position so your torso fits the screen. Center yourself on the vertical red line and position yourself so your waist is roughly even with the horizontal red line. You should see a blue stick-figure version of your form. Now, move both your hands above the horizontal red line. Move your right hand up and down to control the pitch and your left hand left and right to control the volume.
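The mapping works roughly like a theremin: vertical hand position picks a note, horizontal position sets loudness. Here's a minimal sketch of one way to do that mapping; the helper names, note range, and normalization are our own illustration, not Veremin's exact code:

```javascript
// Map a normalized vertical position (0 = the horizontal line, 1 = top of
// frame) onto a MIDI note number in a chosen range.
function handToNote(y, lowNote = 48, highNote = 84) {
  const clamped = Math.min(Math.max(y, 0), 1); // ignore positions below the line
  return Math.round(lowNote + clamped * (highNote - lowNote));
}

// Map a normalized horizontal position (0 = left edge, 1 = center line)
// onto a MIDI velocity (0-127).
function handToVelocity(x) {
  const clamped = Math.min(Math.max(x, 0), 1);
  return Math.round(clamped * 127);
}

console.log(handToNote(0.5));   // → 66 (midway through the default range)
console.log(handToVelocity(1)); // → 127 (full volume)
```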
Now just get jiggy with it! ┌(・⌣・)┘♪
If you’re interested, you can adjust some of the parameters by clicking the settings icon in the top right of the screen. You can read more about the various control options here.
If you’re feeling even more adventurous, Veremin can also be used as a MIDI controller. To do that, you must use a browser that supports MIDI output (e.g., Chrome).
Plug your MIDI device into your computer and launch Veremin in your browser. Then click the settings icon in the upper right of the screen and change the Output Device to point to your MIDI output device. You should now be able to control your MIDI device, which can be anything from a simple software synthesizer (e.g., SimpleSynth) to a MIDI-controlled Tesla coil (like John uses).
Let’s quickly review all the technologies we use.
While you can use TensorFlow.js to build and train models, the real fun comes from finding new and creative ways to interact with existing pre-trained machine learning models, like PoseNet.
The TensorFlow.js version of PoseNet allows for real-time human pose estimation in the browser. An image is passed to the model and it returns a prediction. The prediction contains a list of keypoints (i.e., right eye, left wrist, etc.) and their confidence scores. What you do with this information is left up to your imagination.
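For Veremin, the keypoints that matter are the wrists. The sketch below mirrors the shape of PoseNet's single-pose output (a `keypoints` array whose entries carry `part`, `score`, and `position`); the helper name and confidence threshold are our own:

```javascript
// Pull the wrist keypoints out of a PoseNet prediction, keeping only the
// ones the model is reasonably confident about.
function getWrists(pose, minScore = 0.5) {
  const wrists = {};
  for (const kp of pose.keypoints) {
    if ((kp.part === 'leftWrist' || kp.part === 'rightWrist') && kp.score >= minScore) {
      wrists[kp.part] = kp.position; // { x, y } in image coordinates
    }
  }
  return wrists;
}

// Example with a mock prediction:
const pose = {
  keypoints: [
    { part: 'rightWrist', score: 0.9, position: { x: 420, y: 180 } },
    { part: 'leftWrist',  score: 0.3, position: { x: 100, y: 300 } }, // too uncertain
  ],
};
console.log(getWrists(pose)); // only the confident right wrist survives
```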
The Web MIDI API allows connections to MIDI input and output devices from browsers. From the connected devices, MIDI messages can be sent or received. The MIDI message (e.g. [128, 72, 64]) is an array of three values corresponding to [command, note, velocity].
MIDI messages can be received only from input devices (e.g., a keyboard) and sent only to output devices (e.g., speakers). To request access to MIDI devices (and receive a list of connected inputs and outputs), a call must first be made to the requestMIDIAccess function.
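Putting those two pieces together, here is a sketch of sending a note through the Web MIDI API. The message-building helpers are our own; `requestMIDIAccess` and `MIDIOutput.send` are the real API, so the snippet is guarded to run only where Web MIDI exists:

```javascript
// Build the three-value [command, note, velocity] messages described above.
function noteOn(note, velocity, channel = 0) {
  return [0x90 + channel, note, velocity]; // 0x90 (144) = note-on, channel 1
}

function noteOff(note, channel = 0) {
  return [0x80 + channel, note, 0]; // 0x80 (128) = note-off, channel 1
}

// Only runs in a browser that supports Web MIDI (e.g., Chrome).
if (typeof navigator !== 'undefined' && navigator.requestMIDIAccess) {
  navigator.requestMIDIAccess().then((access) => {
    for (const output of access.outputs.values()) {
      output.send(noteOn(60, 100));                      // middle C, forte
      output.send(noteOff(60), performance.now() + 500); // release after 500 ms
    }
  });
}
```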
With the Web Audio API, browsers can create sounds or work with recorded sounds. It is a high-level API for processing and synthesizing audio in web applications.
All audio operations must occur within an AudioContext. Audio modules (i.e., AudioNodes) are created from the AudioContext and chained together to define the audio processing graph.
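A minimal processing graph looks like this: an oscillator node feeding a gain node feeding the speakers. The helper name is our own; the node types and `connect()` calls are the standard Web Audio API, and the snippet is guarded to run only where that API exists:

```javascript
// Chain AudioNodes into a simple graph: oscillator -> gain -> destination.
function buildGraph(ctx) {
  const osc = ctx.createOscillator();  // sound source
  const gain = ctx.createGain();       // volume control
  osc.connect(gain);
  gain.connect(ctx.destination);       // the speakers
  return { osc, gain };
}

// Only runs in a browser with the Web Audio API.
if (typeof AudioContext !== 'undefined') {
  const ctx = new AudioContext();
  const { osc, gain } = buildGraph(ctx);
  osc.frequency.value = 440; // A4
  gain.gain.value = 0.2;     // keep it quiet
  osc.start();
  osc.stop(ctx.currentTime + 1); // one second of tone
}
```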
Working with the Web Audio API can be tricky at times. To make it easier, check out Tone.js, a Web Audio framework for creating interactive music in the browser.
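With Tone.js loaded (e.g., via a script tag or npm), a couple of lines replace a hand-built node graph. A hedged sketch; `Tone.Synth` and `triggerAttackRelease` are real Tone.js API, and the note choice is arbitrary:

```javascript
// Only runs where Tone.js has been loaded.
const toneLoaded = typeof Tone !== 'undefined';
if (toneLoaded) {
  const synth = new Tone.Synth().toDestination(); // simple synth routed to the speakers
  synth.triggerAttackRelease('C4', '8n');         // play middle C for an eighth note
}
```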
If you’re interested in the nitty-gritty, head over to the Veremin GitHub repository to check out the full code and learn more. The README includes instructions for deploying your own Veremin. Or, to try it out without installing anything, visit ibm.biz/veremin.
We hope you enjoy Veremin. Please let us know what you think and share some of the beautiful music you make!
This article was written in collaboration with John Cohn (IBM Fellow at the MIT-IBM Watson AI Lab).