For the past year, I've been learning more about machine learning. I've built a few browser experiments, but lately I've been spending some time mixing machine learning with another passion of mine: hardware!
The following tutorial is about how I prototyped a gesture recognition system using an Arduino and TensorFlow.js.
This is only a first version and is still very experimental.
What follows covers the main steps I took to build this, with some code samples.
If you want to have a look at the whole code, you can check the GitHub repo, but know that I am going to change quite a bit of it in the next few months.
I believe that the most important thing to understand is the steps to take, rather than all the code needed.
Demo
This prototype is about training a machine learning model to recognise body movements like "punch" or "hadoken" to interact with a web version of Street Fighter.
The end result looks like this:
This project is inspired by a similar one by Minko Gechev that uses the webcam.
Material needed
To build this, we need some hardware. What I used for my prototype includes:
- Arduino MKR1000 (another model can work too)
- Accelerometer/gyroscope (MPU6050)
- Button
- Jumper wires
- Battery
I also used a breadboard to put everything together but if you decide to solder it, you'll probably need a protoboard instead.
In terms of tech stack, I used:
- Vanilla JavaScript
- TensorFlow.js
- Node.js
- Johnny-Five
- WebSockets
Step 1: Gathering data
If we start from scratch, all we have is our idea: playing Street Fighter with our body movements. Now, we need to think about how we're gonna make that happen...
To be able to build a classifier, we need data. This data is gonna come from the hardware we'll be holding in our hand.
To get the data, we need to start by assembling our components together.
It should look something like this:
The micro-controller I used is an Arduino MKR1000. I picked this model because I already had it at home and it has built-in wifi, which means I don't have to be tethered to my laptop to record gestures. You could also try an Arduino Uno, but you'd have to stay tethered to your laptop the whole time, which is not ideal for this particular prototype; it would still be useful to get started, though.
The second main part is the accelerometer/gyroscope. I used an MPU6050, which gives you acceleration and rotation data on the x, y and z axes, for a total of 6 points of data per reading.
Finally, I also used a button so I could record data only while performing a specific gesture, for example while I'm pressing the button and throwing a "punch".
Now that we have assembled our hardware, we need to write the code to get this data.
To do this, I used the Johnny-Five framework, which lets my computer communicate with the Arduino in JavaScript.
The code looks something like this:
const EtherPortClient = require("etherport-client").EtherPortClient;
const five = require("johnny-five");
const fs = require("fs");

const board = new five.Board({
  port: new EtherPortClient({
    host: "192.168.1.113", // your Arduino's IP address goes here
    port: 3030
  }),
  timeout: 1e5,
  repl: false
});

board.on("ready", function() {
  const button = new five.Button("A0");
  const stream = fs.createWriteStream("data/sample_punch_0.txt", { flags: "a" });
  const imu = new five.IMU({
    pins: [11, 12], // connect SDA to 11 and SCL to 12
    controller: "MPU6050"
  });

  let recording = false;
  button.on("press", () => recording = true);
  button.on("release", () => {
    recording = false;
    stream.end(); // stop writing to this particular file
  });

  // write one line of sensor data for every reading received while the button
  // is held down (registering the button listeners outside of this callback
  // avoids piling up a new listener on every reading)
  imu.on("data", function() {
    if (!recording) return;
    const data = `${this.accelerometer.x} ${this.accelerometer.y} ${this.accelerometer.z} ${this.gyro.x} ${this.gyro.y} ${this.gyro.z}`;
    stream.write(`${data} \r\n`);
  });
});
In the code sample above, we start by requiring the Node.js modules we need, and we set up our board with the IP address of our Arduino as well as the port it's going to communicate on. Then, when the board is ready, we set up our button and MPU6050 sensor, and we create a stream so we can write all our data to a file. While we hold the button down, every reading we get from the sensor is written to the file we declared above.
Finally, when we release the button, we close our stream, meaning we stop writing data to this particular file.
This code sample covers how to write data to a file for a single gesture sample. However, for each gesture we need to record multiple samples, so you'd have to modify this file to record punch sample 2, punch sample 3, 4, etc.
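One small, hypothetical tweak to avoid editing the file by hand every time (the argument handling is mine, not from the original repo) is to pass the gesture name and sample index as command-line arguments, so the same script can record every sample:

// run as e.g. `node record.js punch 2`
const [gestureName, sampleIndex] = process.argv.slice(2);
const stream = fs.createWriteStream(`data/sample_${gestureName}_${sampleIndex}.txt`, { flags: "a" });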
After this step of recording gesture data, we need a second step to be able to use it: data processing.
Step 2: Data processing
At the moment, all we have is a folder full of files with sensor data that should look something like this (each line contains the x, y and z acceleration values, followed by the x, y and z rotation values):
0.40205128205128204 0.019145299145299145 -4.384273504273504 0.06110144116383567 -0.27059209658270084 0.3578798696738946
-0.13401709401709402 -0.5743589743589743 -3.561025641025641 0.008728777309119381 -0.3578798696738946 0.6546582981839536
-1.3210256410256411 -0.47863247863247865 -3.1398290598290597 -0.22694821003710391 -0.026186331927358142 0.8117762897481025
-1.7230769230769232 -0.1723076923076923 -2.9675213675213676 -0.6895734074204312 0.183304323491507 0.20949065541886513
-1.3593162393162392 -0.4211965811965812 -3.024957264957265 -0.9252503947666544 0.21821943272798452 -0.28804965120093956
-1.4167521367521367 -0.5360683760683761 -2.7377777777777776 -0.9601655040031319 0.3229647604374171 -0.1396604369459101
-2.201709401709402 -0.22974358974358974 -2.3165811965811964 -1.0125381678578482 0.45389642007420783 0.1309316596367907
-3.1015384615384614 0.09572649572649572 -1.7996581196581196 -1.1958424913493553 0.6721158528021923 0.06110144116383567
-3.2164102564102564 0.6892307692307692 -1.435897435897436 -1.483892142550295 1.0125381678578482 -0.08728777309119382
-3.407863247863248 1.6464957264957265 -1.1678632478632478 -1.7195691298965181 1.187113714040236 -0.24440576465534267
-3.963076923076923 1.991111111111111 -0.7466666666666667 -1.8766871214606669 1.1347410501855195 -0.21821943272798452
-5.322393162393162 4.1928205128205125 1.1678632478632478 -2.2869396549892778 1.9290597853153832 0.39279497891037213
-5.264957264957265 6.337094017094017 1.9336752136752138 -2.609904415426695 2.3043972096075165 -0.07855899578207443
-4.843760683760684 7.275213675213675 2.508034188034188 -2.8455814027729183 2.356769873462233 -0.8554201762936994
-4.5948717948717945 7.102905982905983 3.063247863247863 -2.976513062409709 2.496430310408143 -1.1521986048037582
-2.1442735042735044 9.649230769230769 3.6184615384615384 -3.4478670371021556 3.1685461632103356 -0.6546582981839536
To be able to use this, we are going to have to read the data from these files and transform it so it can be used by TensorFlow.js.
1. Read data from files
I'm not going to go through this code in detail, as a lot of other blog posts have covered it before, but a minimal sketch is included below if you want a starting point.
The main goal is to go through each data file in our data folder, read it line by line, and transform our data from the format above to an array of objects.
What we want is for our data to look something like this:
{ features:
   [ -0.11487179487179487, 9.63008547008547, -4.345982905982906, -0.22694821003710391, 0.04364388654559691, 0.5586417477836404,
     -0.07658119658119658, 9.074871794871795, -4.7671794871794875, 0.11347410501855196, 0.08728777309119382, 0.8990640628392963,
     -0.7658119658119658, 9.744957264957264, -4.288547008547009, 0.052372663854716284, -0.1309316596367907, 0.7768611805116249,
     -1.3784615384615384, 9.610940170940172, -3.790769230769231, -0.017457554618238762, -0.2618633192735814, 0.34915109236477526,
     -2.4697435897435898, 9.725811965811966, -3.6567521367521367, -0.10474532770943257, -0.17457554618238763, -0.034915109236477525,
     -3.58017094017094, 9.898119658119658, -3.9056410256410254, -0.07855899578207443, -0.06983021847295505, -0.296778428510059,
     -4.7097435897435895, 9.993846153846153, -3.9247863247863246, -0.07855899578207443, -0.04364388654559691, -0.5411841931654017,
     -6.04991452991453, 10.08957264957265, -3.9439316239316238, -0.06110144116383567, 0.034915109236477525, -0.6459295208748342,
     ... 260 more items ],
  label: 1 }
What we're doing here is going from lines in a file called sample_punch_0.txt to something we can start working with.
The array of features represents our data for a single gesture sample, and our label represents the name of our gesture.
We don't want to be working with strings, so if we want to train 3 different gestures, we can have a gesture array of ['hadoken', 'punch', 'uppercut']. In this case, a label of 1 would map to 'punch'.
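Here's a minimal sketch of that step, assuming the files follow the sample_<gesture>_<index>.txt naming from above (the readGestureFile helper name is mine):

const fs = require('fs');

const gestureClasses = ['hadoken', 'punch', 'uppercut'];

// read one data file and turn it into a { features, label } object
const readGestureFile = (filename) => {
  const lines = fs.readFileSync(`data/${filename}`, 'utf8').split(/\r?\n/);
  // each line holds 6 space-separated values; flatten them all into a single array of numbers
  const features = lines
    .filter(line => line.trim().length)
    .flatMap(line => line.trim().split(/\s+/).map(Number));
  // derive the label from the gesture name in the filename, e.g. "sample_punch_0.txt" → 1
  const label = gestureClasses.indexOf(filename.split('_')[1]);
  return { features, label };
};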
We need to be doing this for all of our data files though, so in the end, we would have a big array of gesture objects, like this:
[
  { features:
     [ -0.11487179487179487, 9.63008547008547, -4.345982905982906, -0.22694821003710391, 0.04364388654559691, 0.5586417477836404,
       -0.07658119658119658, 9.074871794871795, -4.7671794871794875, 0.11347410501855196, 0.08728777309119382, 0.8990640628392963,
       ... 530 more items ],
    label: 1 },
  { features:
     [ -0.11487179487179487, 9.63008547008547, -4.345982905982906, -0.22694821003710391, 0.04364388654559691, 0.5586417477836404,
       -0.07658119658119658, 9.074871794871795, -4.7671794871794875, 0.11347410501855196, 0.08728777309119382, 0.8990640628392963,
       ... 530 more items ],
    label: 0 },
  { features:
     [ -0.11487179487179487, 9.63008547008547, -4.345982905982906, -0.22694821003710391, 0.04364388654559691, 0.5586417477836404,
       -0.07658119658119658, 9.074871794871795, -4.7671794871794875, 0.11347410501855196, 0.08728777309119382, 0.8990640628392963,
       ... 530 more items ],
    label: 2 },
  { features:
     [ -0.11487179487179487, 9.63008547008547, -4.345982905982906, -0.22694821003710391, 0.04364388654559691, 0.5586417477836404,
       -0.07658119658119658, 9.074871794871795, -4.7671794871794875, 0.11347410501855196, 0.08728777309119382, 0.8990640628392963,
       ... 530 more items ],
    label: 2 },
  ...
]
We've now transformed all our files into objects of labels and features.
However, this is not yet ready to be used with TensorFlow.js. We need to keep transforming our data into something that the framework can use.
2. Formatting the data
At this stage, we're going to start transforming our objects into 2 arrays. One for the labels, and one for the features.
What we aim for is something like:
// labels
[ [ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ],
[ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 ],
[ 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2 ] ]
// features
[
[
[ 5.686153846153847, ... 400 more items ], [ 9.285470085470086,... 200 more items ], ...
],
[
[ 5.686153846153847, ... 400 more items ], [ 9.285470085470086,... 200 more items ], ...
],
[
[ 5.686153846153847, ... 400 more items ], [ 9.285470085470086,... 200 more items ], ...
],
]
With the format above, we are separating labels and features but they are still mapped to each other. What I mean is that the 1st layer of the labels array represents all the gestures with a label of 0 ("hadoken" for example), and the 1st layer of the features array represents all the data for our hadoken gestures.
Again, I'm not showing the full code for this because, so far, it has nothing to do with TensorFlow.js specifically. All we did was transform data from lines in a file, to objects, to multidimensional arrays using JavaScript array methods. If it helps, here's one way that grouping could look.
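A minimal sketch, assuming the gestureData array of { features, label } objects from the previous step:

const labels = [];
const features = [];

gestureClasses.forEach((gesture, gestureIndex) => {
  // gather all the samples recorded for this gesture, so the 1st layer of
  // each array corresponds to a single gesture
  const samples = gestureData.filter(sample => sample.label === gestureIndex);
  labels.push(samples.map(sample => sample.label));
  features.push(samples.map(sample => sample.features));
});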
Now, we're really close to something TensorFlow.js can work with, except that the framework works with a special data structure called tensors.
3. Converting to tensors
This is where we start using TensorFlow.js-specific code. Using built-in methods, we're going to transform our arrays into tensors.
To do this, here's a code sample:
function convertToTensors(featuresData, labelData) {
  // we start by shuffling our data so our model doesn't learn anything from
  // the order in which we feed it samples
  const [shuffledFeatures, shuffledLabels] = shuffleData(featuresData, labelData);
  // numSamplesPerGesture is the number of samples recorded (e.g. we recorded the "punch" gesture 20 times)
  // totalNumDataPerFile is the number of data points we take into consideration per gesture:
  // if we only consider the first 50 lines of a data file, 50 lines * 6 values = 300
  const featuresTensor = tf.tensor2d(shuffledFeatures, [numSamplesPerGesture, totalNumDataPerFile]);
  // 1D tensor for the labels, converted from the set [0, 1, 2] into one-hot encoding,
  // e.g. hadoken at index 0 ⇒ [1, 0, 0], punch at index 1 ⇒ [0, 1, 0]
  const labelsTensor = tf.oneHot(tf.tensor1d(shuffledLabels).toInt(), numClasses);

  // the function continues in the next step, where we split these tensors
  // into a training set and a testing set
}
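Note that shuffleData is not a TensorFlow.js built-in but a helper from the project. A minimal sketch of one possible implementation, a Fisher-Yates shuffle that keeps each features row aligned with its label:

const shuffleData = (features, labels) => {
  const shuffledFeatures = [...features];
  const shuffledLabels = [...labels];
  for (let i = shuffledFeatures.length - 1; i > 0; i--) {
    // pick a random index between 0 and i, and swap both arrays the same way
    const j = Math.floor(Math.random() * (i + 1));
    [shuffledFeatures[i], shuffledFeatures[j]] = [shuffledFeatures[j], shuffledFeatures[i]];
    [shuffledLabels[i], shuffledLabels[j]] = [shuffledLabels[j], shuffledLabels[i]];
  }
  return [shuffledFeatures, shuffledLabels];
};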
Now we have a tensor for labels and one for features. We're almost ready to train our model! But first, there's one last step: splitting the data into a training set and a testing set.
4. Splitting
Continuing the convertToTensors function above, we need to split both the labels and features tensors into a training set and a testing set.
We do this because we want to use about 80% of our data to train the model, and the remaining 20% to validate its predictions.
const numTestExamples = Math.round(numSamplesPerGesture * 0.2); // 20%
const numTrainExamples = numSamplesPerGesture - numTestExamples; // 80%
// Split between training set and test set.
const trainingFeatures = featuresTensor.slice([0, 0], [numTrainExamples, totalNumDataPerFile]);
const testingFeatures = featuresTensor.slice([numTrainExamples, 0], [numTestExamples, totalNumDataPerFile]);
const trainingLabels = labelsTensor.slice([0, 0], [numTrainExamples, numClasses]);
const testingLabels = labelsTensor.slice([numTrainExamples, 0], [numTestExamples, numClasses]);
return [trainingFeatures, trainingLabels, testingFeatures, testingLabels];
Now that we have our training and testing tensors for both labels and features, we're ready to create our model.
Step 3: Training the model
Creating the model is a step that's a bit more experimental than the previous ones. Your model could be built in a lot of different ways, and you can play around with parameters: the learning rate, the number of layers in your neural network, the number of epochs (steps) you want to go through, etc.
There is no set way to create the right model. As you change parameters, you should see a change in the accuracy and predictions of your model, and you can stop tweaking once you reach a level of accuracy you're happy with.
My current model is created this way:
const tf = require('@tensorflow/tfjs-node');

const createModel = async (trainingFeatures, trainingLabels, testFeatures, testLabels) => {
  const params = { learningRate: 0.1, epochs: 40 };
  const model = tf.sequential();
  // hidden layer: 10 units, taking a flattened gesture sample as input
  model.add(tf.layers.dense({ units: 10, activation: 'sigmoid', inputShape: [trainingFeatures.shape[1]] }));
  // output layer: 3 units, one per gesture, with softmax to output probabilities
  model.add(tf.layers.dense({ units: 3, activation: 'softmax' }));
  const optimizer = tf.train.adam(params.learningRate);
  model.compile({
    optimizer: optimizer,
    loss: 'categoricalCrossentropy',
    metrics: ['accuracy'],
  });
  await model.fit(trainingFeatures, trainingLabels, {
    epochs: params.epochs,
    validationData: [testFeatures, testLabels],
  });
  await model.save('file://model');
}
The last line of this code sample saves the model as a file in your application. This way, you can use it for the last step: predicting new samples of data!
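To tie the previous steps together, the training script could look something like this (hypothetical glue code, assuming the features and labels arrays hold one row per recorded sample):

(async () => {
  // convertToTensors shuffles, converts and splits the data, as shown above
  const [trainingFeatures, trainingLabels, testingFeatures, testingLabels] = convertToTensors(features, labels);
  await createModel(trainingFeatures, trainingLabels, testingFeatures, testingLabels);
})();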
Step 4: Using the model for predictions
Now that our training process is done, our model is ready to be used to classify new samples between "hadoken", "punch" and "uppercut".
const tf = require('@tensorflow/tfjs-node');
let liveData = [];
let model;
const gestureClasses = ['hadoken', 'punch', 'uppercut'];
const init = async () => {
  // load the model we saved at the end of the training step
  model = await tf.loadLayersModel('file://model/model.json');
}
init();
// similar setup to the recording process
let recording = false;
button.on("press", () => recording = true);
button.on("release", () => {
  recording = false;
  predict(model, liveData);
  liveData = [];
});

imu.on("data", function() {
  // numValuesExpected is the number of values the model was trained on
  // (e.g. the first 50 readings * 6 values = 300)
  if (recording && liveData.length < numValuesExpected) {
    liveData.push(this.accelerometer.x, this.accelerometer.y, this.accelerometer.z, this.gyro.x, this.gyro.y, this.gyro.z);
  }
});
const predict = (model, newSampleData) => {
  tf.tidy(() => {
    // the live data we get is just an array of numbers; we need to transform
    // it into a tensor so the model can use it (300 = 50 readings * 6 values)
    const input = tf.tensor2d([newSampleData], [1, 300]);
    const prediction = model.predict(input);
    // the prediction comes back as probabilities; argMax gives us the index
    // of the most likely gesture, matching the labels in our data set
    const gesturePredicted = gestureClasses[prediction.argMax(-1).dataSync()[0]];
    console.log(gesturePredicted); // either "hadoken", "punch" or "uppercut"
  });
}
With the code sample above, we get live data while holding the button down and performing one of the gestures we trained. Once we release the button, we run our predict function with this new sample the model has never seen before. We get back an index we can use in our gestureClasses array to get the predicted gesture.
And we're done! 🎉
Extras
As I said at the beginning of this tutorial, the most important thing is to understand the steps you'd need to go through if you wanted to build something similar. If you don't understand the code entirely, it's totally ok!
A cool thing to know is that, to start with, you need a way to get data, but this doesn't have to involve an Arduino. I built a version of this project using a Daydream controller and also... a mobile phone!
Most modern phones have a built-in accelerometer and gyroscope you can use to gather data for this type of experiment. The code would have to change a little because, instead of Johnny-Five, you'd use the Generic Sensor API, which would look something like this:
let gyroscope = new Gyroscope({frequency: 60});
gyroscope.addEventListener('reading', e => {
// gyroscope.x;
// gyroscope.y;
// gyroscope.z;
});
gyroscope.start();
let accelerometer = new Accelerometer({frequency: 60});
accelerometer.addEventListener('reading', e => {
// accelerometer.x;
// accelerometer.y;
// accelerometer.z;
});
accelerometer.start();
If you want to try it out, a demo is available here and you can find the code in this repo.
I might write another post later as I improve the code little by little and eventually build other experiments :)
Thanks for reading! 💚