A 1080p version of the video is available on Cinnamon
This is a video taken from my weekly show "ML for Everyone" that broadcasts live on Twitch every Tuesday at 2pm UK time.
This is a recording of the first in a series of live coding sessions in which Si Metson and I attempt to train a machine learning model to recognize and identify the individual modules comprising a modular synthesizer. You can read all about why we are doing this and what we are attempting to do in the intro post.
In this session we were mostly collating the data and trying to get it into a format suitable for the machine learning model. We were largely following this tutorial on the TensorFlow Object Detection API:
https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/latest/
In order to generate training data for the model, we went through the following steps:
1. Scraped the site Modulargrid to collect the images of each module and its metadata (name, etc).
2. Wrote a script to generate synthetic synthesizer panels by concatenating N (in this case 5) random module images side by side. This way we know the actual co-ordinates of each module in the image. Each image is scaled to a common height before they are concatenated together.
3. Created TensorFlow tfrecord files that contained an efficient binary pack (based on protobufs) of the metadata and the images.
We ran out of time at the end of the stream trying to debug some logic around calculating the co-ordinates. We realised a much easier way to do it afterwards and refactored the code to use that approach.
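Since the co-ordinate logic was where we got stuck, here is a minimal sketch of the simpler idea: keep a running x offset as each scaled image is pasted onto the panel, then normalise by the final panel width. The Pillow calls are standard; names like `PANEL_HEIGHT` and `make_panel` are illustrative, not taken from our repo.

```python
import random
from PIL import Image

PANEL_HEIGHT = 380  # common height every module image is scaled to

def make_panel(image_paths, n=5):
    """Concatenate n random module images and record their bounding boxes."""
    chosen = random.sample(image_paths, n)
    scaled = []
    for path in chosen:
        img = Image.open(path)
        # scale to the common height, preserving aspect ratio
        w = int(img.width * PANEL_HEIGHT / img.height)
        scaled.append((path, img.resize((w, PANEL_HEIGHT))))

    total_width = sum(img.width for _, img in scaled)
    panel = Image.new("RGB", (total_width, PANEL_HEIGHT))

    boxes = []   # (path, xmin, xmax), normalised to the 0..1 range
    offset = 0
    for path, img in scaled:
        panel.paste(img, (offset, 0))
        boxes.append((path, offset / total_width, (offset + img.width) / total_width))
        offset += img.width
    # ymin/ymax are always 0 and 1 here, since every module spans the full height
    return panel, boxes
```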
The Object Detection API expects data to be in a certain format, encoded in a tfrecord
instance. This format is used because it is extremely efficient to process: when you need to ingest large amounts of information into a machine learning pipeline, the data needs to be as fast to load as possible.
tf_example = tf.train.Example(features=tf.train.Features(feature={
    'image/height': dataset_util.int64_feature(image.height),
    'image/width': dataset_util.int64_feature(image.width),
    'image/filename': dataset_util.bytes_feature(filename.encode('utf-8')),
    'image/source_id': dataset_util.bytes_feature(filename.encode('utf-8')),
    'image/encoded': dataset_util.bytes_feature(open(filename, "rb").read()),
    'image/format': dataset_util.bytes_feature(b'jpg'),
    # bounding boxes, normalised to the 0..1 range
    'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
    'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
    'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
    'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
    # one class name and one integer label per box
    'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
    'image/object/class/label': dataset_util.int64_list_feature(classes),
}))
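Each of these examples then gets serialised and written out to the record file. As a short sketch of that step: the `tf.io.TFRecordWriter` API is standard TensorFlow, while the `examples` iterable and `write_tfrecord` helper stand in for the `tf.train.Example` objects built as above.

```python
import tensorflow as tf

def write_tfrecord(examples, path):
    """Serialise each tf.train.Example and append it to a single record file."""
    with tf.io.TFRecordWriter(path) as writer:
        for tf_example in examples:
            writer.write(tf_example.SerializeToString())

# e.g. write_tfrecord(train_examples, "train.record")
```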
The metadata we scrape from the website looks like this:
{
  "id": "29365",
  "image": "https://www.modulargrid.net/img/modcache/29365.f.jpg",
  "info": [
    "CV Modulation",
    "Drum",
    "Envelope Generator",
    "Utility"
  ],
  "size": "10 HP",
  "name": "Dual Trig Conditioner",
  "manufacturer": "Analog Ordnance",
  "description": "Gate to trig + decay env"
},
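The `classes_text` and `classes` fields in the tfrecord above come from this metadata: each module name needs a stable integer label (the Object Detection API expects ids starting at 1, and also wants the same mapping as a label map file). As a minimal sketch, assuming the scraped records have been saved to a `modules.json` list, with `build_label_map` as a hypothetical helper name:

```python
import json

def build_label_map(metadata_path):
    """Map each module name to an integer label (ids start at 1)."""
    with open(metadata_path) as f:
        modules = json.load(f)
    names = sorted({m["name"] for m in modules})
    return {name: i + 1 for i, name in enumerate(names)}

label_map = build_label_map("modules.json")
# e.g. classes_text = [name.encode("utf-8")] and classes = [label_map[name]]
```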
And the composite 'fake' images we produce look like this:
By running the script we can generate a number of records to use for our training data, and a number to use for testing the accuracy of the trained model.
(venv) matt@Matts-MBP modulair % python randomise.py
100%|████████████████████████████████████████| 1000/1000 [01:47<00:00, 9.34it/s]
Successfully created the TFRecord file with 1000 records: train.record
100%|██████████████████████████████████████████| 100/100 [00:10<00:00, 9.10it/s]
Successfully created the TFRecord file with 100 records: test.record
(venv) matt@Matts-MBP modulair %
All the code for this session can be found at:
modulair
These are scripts to fetch images of modules from modular synthesizers and format them for training an object detection model.
You can read all about this mini-project here:
https://dev.to/hammertoe/using-machine-learning-to-catalog-modular-synthesizers-co2
Next session we'll be configuring a training pipeline and actually training the model.
I hope you enjoyed the video. If you want to catch these sessions live, I generally stream every Tuesday at 2pm UK time on the IBM Developer Twitch channel.