In this blog post, you will learn the basic principles of machine learning. I won’t focus on specific ML model types or complicated math. Instead, this blog is for people new to machine learning and covers high-level information on supervised learning, training, inference, and a concrete application of machine learning.
If you’re reading tech articles, chances are you’ve heard the words “Machine Learning” a lot lately. Having used machine learning myself for my master’s thesis and professionally, I decided to write this post and share some high-level knowledge I gained during that time. Note that this article will give you an introduction to the topic and cover the most important concepts, but if you are looking for in-depth information, you should look for a more advanced post.
So what’s behind the two words “Machine Learning”? Well – it’s exactly what the words suggest: a machine that learns. This is a very powerful concept! A human defines a model that solves a problem, but that model contains many unknown “knobs” that are tuned by the machine. For that purpose, the machine is given many examples from which it can learn to master the task at hand.
Several years ago, machine learning was mostly an academic topic, but things have changed significantly. Companies like Google use the power of Machine Learning (ML) in many of their products: the speech recognition in Google Maps, for example. “Ok Google, navigate to the next Starbucks” … and you’re already on your way to a delicious cappuccino☕. In fact, Google announced a year ago that their research efforts have made it possible to run English speech recognition directly on Pixel smartphones, so recognition even works offline. Another way Google uses ML is in the Android photo gallery: you can search your pictures by keywords like “beach” or “mountain” 🏖️⛰️. Moreover, Google uses ML in places customers can’t even see, namely behind the scenes in its data centers: using neural networks, the company has managed to reduce its power consumption for cooling by 40%.
Great, so ML can solve problems in many areas, but how does it work at a high level? For the sake of discussion, let’s focus on the most prominent area of ML: supervised learning. According to Wikipedia,
“Supervised learning is the machine learning task of learning a function that maps an input to an output based on example input-output pairs”.
While this may sound a bit abstract and hard to grasp at first, things become much clearer when we look at a concrete example. Let’s assume your goal is to automate the classification of handwritten digits. Let’s also assume you have a lot of examples from which your machine learning model can learn:
Now you have exactly the situation described in the Wikipedia quote above: you have input-output pairs and need a function (your ML model) that maps one to the other. Your input consists of the images of handwritten digits. Your output is the “ground truth”: the actual digit shown in each image.
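To make the “input-output pairs” idea concrete, here is a tiny sketch in Python. The pixel values and labels are entirely made up for illustration (real MNIST images are 28×28 grayscale), but the structure – an input paired with its ground-truth label – is exactly what supervised learning needs:

```python
# Illustrative only: each training example pairs an input (here a tiny
# 2x2 "image" flattened to four pixel intensities) with its ground-truth
# label (the digit the image supposedly shows). Values are invented.
training_pairs = [
    ([0.1, 0.9, 0.9, 0.1], 0),
    ([0.9, 0.1, 0.9, 0.1], 1),
    ([0.1, 0.1, 0.9, 0.9], 2),
]

# The model's job will be to learn a function f(pixels) -> digit
# that agrees with these pairs.
for pixels, label in training_pairs:
    print(f"input={pixels} -> ground truth: {label}")
```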
To get this function using ML, you would first choose a model. ML offers many different types of models; let’s assume you choose a neural network. Next, you need to choose the hyperparameters of your neural network, such as the number of layers and the number of neurons per layer – which together determine the number of parameters (generally, the more complex the task, the more parameters your model should have). The rest of the work is done by the machine: it will learn how to map the input to the correct output using the training data you feed it.
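Here is a rough sketch of that separation between hand-chosen hyperparameters and machine-tuned parameters, assuming a tiny two-layer network written from scratch with NumPy. The layer sizes and the toy input are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hyperparameters: chosen by the human.
n_inputs = 4      # pixels per image in our toy example
n_hidden = 8      # width of the hidden layer
n_outputs = 10    # one score per digit 0-9

# Parameters: the "knobs" the machine tunes during training.
# Before training, they are simply random:
W1 = rng.normal(0.0, 0.1, (n_inputs, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(0.0, 0.1, (n_hidden, n_outputs))
b2 = np.zeros(n_outputs)

def predict(x):
    """Forward pass of a tiny two-layer neural network."""
    h = np.maximum(0.0, x @ W1 + b1)  # hidden layer with ReLU activation
    return h @ W2 + b2                # one raw score per digit

scores = predict(np.array([0.1, 0.9, 0.9, 0.1]))
print(scores.shape)  # (10,)
```

Training would then adjust W1, b1, W2, and b2 so the highest score lands on the correct digit.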
Note that your network – if architected correctly – will not simply memorize the training data; instead, it will learn how to distinguish between different digits in general.
This data set of handwritten digits is actually a very prominent ML challenge called MNIST. If you’re interested in solving this ML problem yourself, you can easily do so using TensorFlow. For more information on how to install TensorFlow, check out the first blog post of this series. The TensorFlow team has created a great tutorial that walks you through all the steps: setting up your training data, creating your neural network, training it, and then testing the network on unseen data.
So far, we’ve covered the concept of training your model so that it accurately represents the mapping from your training data to the ground truth. Remember, though: the ultimate goal is to leverage the intelligence of your model to automate a challenging task. As a case in point, imagine you have your own shipping startup, where people drop off their packages at your store and your drivers deliver them to the correct address. Say you get a lot of packages and want to minimize each driver’s route, so you need a smart algorithm that picks the optimal route. But first you need to know all the addresses, and that’s something you can automate: you scan each destination address and send the picture to your neural network, which tells you the zip code (and thus the approximate address) this particular package has to go to. Sure, doing this manually wouldn’t take an infinite amount of time, but when you have hundreds of packages, automating it will save you a headache😉
As you can see, there are two major phases: first you train your model, then you use it to recognize images. These two phases are called training and inference. In the training phase, you first initialize the parameters randomly, then you repeatedly pass the data through the model, compare the output to the ground truth, and slowly adjust the parameters to better match the desired output. You continue this so-called stochastic gradient descent until your model doesn’t improve anymore, then you freeze the parameters.
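That training loop can be sketched with a deliberately tiny model – a single parameter w in y = w · x – so the gradient can be written out by hand. The data points and learning rate are invented for illustration; a real network would have millions of such parameters updated the same way:

```python
# Made-up training pairs that follow y = 2 * x, so the "right" answer
# for our single parameter is w = 2.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

w = 0.5               # 1. initialize the parameter (here: an arbitrary start)
learning_rate = 0.05

for epoch in range(200):           # 2. repeatedly pass the data through
    for x, y_true in data:
        y_pred = w * x             #    model output
        error = y_pred - y_true    # 3. compare to the ground truth
        grad = 2.0 * error * x     #    gradient of the squared error w.r.t. w
        w -= learning_rate * grad  # 4. nudge the parameter to reduce the error

print(round(w, 3))  # converges to 2.0
```

After training, w is "frozen" and used unchanged for inference.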
Inference, on the other hand, refers to making predictions on unseen data using your trained model. If you’re running your model on an edge device like a smartphone, you might also compress it so it takes up fewer resources.
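Inference itself is simple by comparison: no gradients, no updates – just a forward pass with frozen parameters. A minimal sketch, with a made-up one-parameter model standing in for a trained network:

```python
# Pretend this value came out of the training phase; it is now frozen.
frozen_w = 2.0

def infer(x):
    """Prediction only: no learning happens here."""
    return frozen_w * x

print(infer(5.0))  # 10.0
```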
Hint: for more details on training for neural networks, watch out for my future blog post…
When training an ML model, you typically want to know how accurate it is in the end. But you don’t want to test your model on data it saw during training – you want to use unseen data. For this purpose, you should use two separate data sets: the training data is used during training, and the test data is used at the very end to score your model. Additionally, you should monitor the progress of your model during training, again using unseen data. For this you use a third data set, called validation data, which you apply to the model regularly. Once the score on the validation data isn’t improving anymore, you know it’s time to stop training.
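A minimal sketch of such a three-way split, using integers as stand-ins for labeled examples (the 80/10/10 ratio is a common convention, not a rule):

```python
import random

random.seed(0)
examples = list(range(100))  # stand-ins for 100 labeled examples
random.shuffle(examples)     # shuffle so each split is representative

# Split: 80% training, 10% validation, 10% test.
train_data = examples[:80]   # used to tune the parameters
val_data = examples[80:90]   # checked regularly to decide when to stop
test_data = examples[90:]    # touched only once, for the final score

print(len(train_data), len(val_data), len(test_data))  # 80 10 10
```

The key property is that the three sets never overlap – otherwise your score would be flattered by examples the model has already seen.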
Alternatively, some people use the validation data to compare different model configurations (should I use more parameters in my model or fewer? Should I use a neural network, an SVM, or is a simple regression enough?).
I hope this post helped you understand the principles of machine learning. Please remember to heart❤️ this article if it was helpful to you and leave a comment if you have any questions!