Philipp Gysel


# Introduction to Machine Learning

In this blog post, you will learn the basic principles of machine learning. I won’t focus on specific ML model types or complicated math. Instead, this blog is for people new to machine learning and covers high-level information on supervised learning, training, inference, and a specific application of machine learning.

# Why Machine Learning

If you’re reading tech articles, chances are you’ve heard the words “Machine Learning” a lot lately. Having used machine learning myself for my master’s thesis and professionally, I decided to write this post and share some high-level knowledge I gained during that time. Note that this article gives you an introduction to the topic and covers the most important concepts; if you are looking for in-depth information, you should consult a more advanced post.

So what stands behind the two words “Machine Learning”? Well – it’s exactly what the words suggest: a machine that learns. This is a very powerful concept! A human defines a model that solves a problem, but that model contains many unknown “knobs” that will be tuned by the machine. For that purpose, the machine is given many examples from which it can learn to master the task at hand.

# The concept behind Machine Learning

Great, so ML can solve problems in many areas, but how does ML work at a high level? For the sake of discussion, let’s just focus on the most prominent area of ML, which is supervised learning. According to Wikipedia,

“Supervised learning is the machine learning task of learning a function that maps an input to an output based on example input-output pairs”.

While this may sound a bit abstract and hard to grasp at first, things become much clearer when we look at a concrete example. Let’s assume your goal is to automate the classification of handwritten digits. Let’s also assume you have a lot of examples from which your machine learning model can learn:

(Figure: sample images of handwritten digits from the MNIST data set. Credit: Wikipedia)

Now you have exactly the situation described in the Wikipedia quote above: you have input-output pairs and need a function (your ML model) that maps one to the other. Your input consists of the images of handwritten digits. Your output is the “ground truth”: the actual digit.

In order to get this function using ML, you would first choose a model type. ML offers many different types of models; let’s assume you choose neural networks. Next, you need to choose the hyperparameters of your neural network, such as the number of layers and neurons, which determine the total number of parameters (generally, the more complex the task, the more parameters your model should have). The rest of the work is done by the machine: it learns how to map the input to the correct output using the training data you feed it.
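To make the distinction between hyperparameters and learned parameters concrete, here is a minimal sketch of a tiny neural network in plain NumPy. The specific sizes (a 784-value input for a 28x28 image, 32 hidden units, 10 digit classes) are illustrative choices, not prescribed by the post:

```python
import numpy as np

# Hyperparameters: chosen by a human, not learned by the machine.
input_size = 784    # a 28x28 pixel image, flattened
hidden_size = 32    # number of hidden units
num_classes = 10    # digits 0 through 9

rng = np.random.default_rng(seed=0)

# Learned parameters (the "knobs"): start out random; training tunes them.
w1 = rng.normal(scale=0.01, size=(input_size, hidden_size))
b1 = np.zeros(hidden_size)
w2 = rng.normal(scale=0.01, size=(hidden_size, num_classes))
b2 = np.zeros(num_classes)

def forward(x):
    """Map a flattened input image to one score per digit class."""
    h = np.maximum(0, x @ w1 + b1)  # hidden layer with ReLU activation
    return h @ w2 + b2              # output layer: 10 class scores

scores = forward(rng.normal(size=input_size))
print(scores.shape)  # (10,)
```

The hyperparameters fix the shape of the parameter arrays; everything inside those arrays is what training adjusts.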

Note that your network – if architected correctly – will not memorize the training data, but instead it will learn how to generally distinguish between different digits.

This data set of handwritten digits is actually a very prominent ML challenge called MNIST. If you’re interested in solving this ML problem yourself, you can easily do so using TensorFlow. For more information on how to install TensorFlow, check out the first blog post of this series. The TensorFlow team has created a great tutorial that walks you through all the steps: setting up your training data, creating your neural network, training it, and testing the network on unseen data.

# Training vs. Inference

As you can see, there are two major phases: first you train your model, then you use it to recognize images. These two phases are called training and inference. In the training phase, you first initialize the parameters randomly, then you repeatedly pass the data through the model, compare the output to the ground truth, and slowly adjust the parameters to better match the desired output. You continue this so-called stochastic gradient descent until your model no longer improves, then you freeze the parameters.
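The training loop above can be sketched with a toy example. Everything here is illustrative rather than taken from the post: a one-parameter linear model stands in for a neural network, and the learning rate and data are made up. The steps are the same, though: random initialization, repeated passes over the data, comparison to the ground truth, and gradual parameter updates.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Toy input-output pairs: the true relation is y = 3*x plus a little noise.
x = rng.normal(size=100)
y = 3.0 * x + rng.normal(scale=0.1, size=100)

w = rng.normal()         # 1. initialize the parameter randomly
learning_rate = 0.1

for epoch in range(50):  # 2. repeatedly pass the data through the model
    pred = w * x                   # model output
    error = pred - y               # compare to the ground truth
    grad = 2 * np.mean(error * x)  # gradient of the mean squared error
    w -= learning_rate * grad      # 3. slowly adjust the parameter

# 4. training done: freeze w, which should now be close to 3
print(w)
```

After the loop, the "machine" has learned the relation on its own; all the human supplied was the model form, the data, and the learning rate.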

Inference, on the other hand, refers to predicting on unseen data using your trained model. If you’re deploying your model on an edge device like a smartphone, you might also compress it so that it consumes fewer resources.

Hint: for more details on training for neural networks, watch out for my future blog post…

# The data: training vs. validation vs. test

When training an ML model, you typically want to know how accurate your model is in the end. But you don’t want to test your model on data you used during training; you want to use unseen data instead. For this purpose, you should use two separate data sets: the training data is used during training, and the test data is used at the very end to score your model. Additionally, you should monitor the progress of your model during training, again using unseen data. For this you use a third data set, called validation data, which you apply to the model regularly. Once the score on the validation data stops improving, you know the training process has finished.
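A three-way split is easy to do by hand. The 80/10/10 proportions below are a common convention, not a rule from the post; the important parts are shuffling first and keeping the three sets disjoint:

```python
import numpy as np

rng = np.random.default_rng(seed=0)
num_examples = 1000
indices = rng.permutation(num_examples)  # shuffle before splitting

# A common (though not universal) split: 80% train, 10% validation, 10% test.
train_idx = indices[:800]      # used to tune the parameters
val_idx = indices[800:900]     # monitored during training
test_idx = indices[900:]       # used once, at the very end, to score the model

print(len(train_idx), len(val_idx), len(test_idx))  # 800 100 100
```

Because the test set is touched only once, the final score is an honest estimate of how the model will behave on truly unseen data.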

Alternatively, some people use the validation data to compare different model configurations (should I use more parameters in my model or fewer? Should I use a neural network, an SVM or is a simple regression enough?).