I've been doing machine learning for a few years now and find it pretty cool. So I want to write a short post for people with zero background in machine learning, to see if I can get a few more people interested. Little prior maths knowledge is required here; I'll do my best to explain things, with some Python code to try out.
Machine learning allows us to write programs to do clever things by using data rather than explicit instructions.
If I was to tell you "I just saw a 10ft person" would you believe me? You probably wouldn't. This is because over our lifetimes we have been exposed to a lot of data and we have learned. But what have we learned? Well, we know intuitively what an average person's height is (roughly at least), and how much we can expect a person's height to deviate from that average. With this intuition, we can already start building a machine learning program.
An intuition like this can be modeled with what is known as a Gaussian distribution. A Gaussian distribution is described by just two parameters, the mean (that's our average height) and the standard deviation (how much we expect it to deviate). Let's get into some numbers... Firstly let's have a look at the formula for this Gaussian distribution:

f(x) = (1 / (σ√(2π))) · exp(−(x − μ)² / (2σ²))
Eeek... that looks a bit scary, let's break it down a bit. Firstly, the μ is our mean value, let's say we think the average person is around 150cm tall, and σ is our standard deviation, let's say we expect the height to vary by 30cm. If we plug those two numbers into the formula we get this new equation:

f(x) = (1 / (30√(2π))) · exp(−(x − 150)² / 1800)
This is still a bit scary but it's getting better... Now that the equation only has one variable, let's plot this and have a look at the shape of the function. Below we can see what is often called a bell curve, let's see if we can make some sense of it.
The x-axis represents the height of a random person and the y-axis is the probability density at that height under our distribution. Firstly, let's look at the peak of the curve: the density is highest when the height is 150. Well, that's our mean value, so it makes sense for it to be the most likely height. Can you see what σ represents? If you're unsure, copy the code below and change the value of the "std" variable. Remember my initial statement, "I just saw a 10ft person". 10ft is roughly 300cm, so let's see what the model thinks of that statement... f(300) ≈ 0.00000005, woah, that's very small! Our model agrees it's not very likely.
# Very popular maths library
import numpy as np
# Library for making plots in python
import matplotlib.pyplot as plt

def gaussian(x, mean, std):
    """
    x: Our "random variable"
    mean: The expected value of the distribution
    std: How much we expect the value to deviate from the expected
    """
    return (1 / (std * np.sqrt(2 * np.pi))) * np.exp(-(x - mean)**2 / (2 * std**2))

# Create an array of points from 0 to 299
x_values = np.arange(0, 300)

# gaussian works on whole numpy arrays, so we can call it on x_values directly
probabilities = gaussian(x_values, 150, 30)

plt.plot(x_values, probabilities)
plt.title('Probability density function')
plt.xlabel('X')
plt.ylabel('Probability density')
plt.show()
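As a quick sanity check on the 10ft claim, we can evaluate the density at 300cm directly. This is a minimal standalone snippet (it redefines the gaussian function so it runs on its own):

```python
import numpy as np

def gaussian(x, mean, std):
    """Gaussian probability density with the given mean and standard deviation."""
    return (1 / (std * np.sqrt(2 * np.pi))) * np.exp(-(x - mean)**2 / (2 * std**2))

# 10ft is roughly 300cm -- evaluate the density of our model there
density = gaussian(300, 150, 30)
print(density)  # roughly 5e-08, i.e. extremely unlikely
```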
Okay, so we have this model, now what? Well, we can now make a decision based on data alone. Let's say we are given an unknown height and we want to classify whether the person is an adult or a child. This can be done by creating two models and then choosing the model which is the most likely.
Firstly, we will fit a model for a child. This requires gathering height data for children, calculating the mean and standard deviation, and plugging them into our Gaussian distribution formula. We then do the same for adults: gather data and calculate the statistics. Let's see what the distributions look like with some made-up data. Looking at these distributions, we can give an unknown height a class (adult or child). For example, if we have the height of 160, the model for an adult returns the higher density, therefore the class is adult. If we have the height 110, the class will be child.
children = [100, 110, 103, 110, 144]
adults = [180, 190, 200, 150, 170]

child_mean = np.mean(children)
child_std = np.std(children)
adult_mean = np.mean(adults)
adult_std = np.std(adults)

# Create an array of points from 0 to 299
x_values = np.arange(0, 300)

# Evaluate each model's density over the whole range of heights
adult_probabilities = gaussian(x_values, adult_mean, adult_std)
child_probabilities = gaussian(x_values, child_mean, child_std)

plt.plot(x_values, adult_probabilities, label='Adult')
plt.plot(x_values, child_probabilities, label='Child')
plt.legend()
plt.title('Probability density function')
plt.xlabel('X')
plt.ylabel('Probability density')
plt.show()
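We can turn "choose the model with the higher density" into an actual classifier. The classify helper below is my own sketch (it's not part of the plotting code above), using the same made-up data:

```python
import numpy as np

def gaussian(x, mean, std):
    """Gaussian probability density with the given mean and standard deviation."""
    return (1 / (std * np.sqrt(2 * np.pi))) * np.exp(-(x - mean)**2 / (2 * std**2))

# The same made-up training data as above
children = [100, 110, 103, 110, 144]
adults = [180, 190, 200, 150, 170]

def classify(height):
    """Return the class whose fitted Gaussian gives the higher density."""
    p_child = gaussian(height, np.mean(children), np.std(children))
    p_adult = gaussian(height, np.mean(adults), np.std(adults))
    return 'adult' if p_adult > p_child else 'child'

print(classify(160))  # adult
print(classify(110))  # child
```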
You might be thinking "Well, this is pretty cool, but I could just do this with some 'if statements'". Yeah, you could, but let's say you now want your model to work for people in a different country, where heights are different. With a machine learning model, we simply use different data to train the model and don't change a single line of code. For a simple task like this it probably doesn't seem that cool, but imagine creating a machine learning program to recognise pictures of cats. A new model to recognise pictures of dogs can be learned simply by changing the data!
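To make the "change the data, not the code" point concrete, here's a minimal sketch. "Training" our model is nothing more than computing two statistics from data; the country samples below are made-up numbers for illustration:

```python
import numpy as np

def fit(heights):
    """'Train' a Gaussian model: just compute its two parameters from data."""
    return np.mean(heights), np.std(heights)

# Hypothetical height samples (in cm) from two different countries
country_a = [150, 160, 155, 170, 165]
country_b = [165, 175, 170, 185, 180]

# Same code, different data -> different model parameters
print(fit(country_a))
print(fit(country_b))
```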
Congratulations, if you've followed along you have just made your first machine learning program! This is only the start, and there's much more to learn: how can we use more than one variable, e.g., height and weight? What if our data doesn't fit a Gaussian distribution nicely, and how do we evaluate whether our model is any good? And what about tasks that aren't classification, like predicting the stock market?
Thank you for reading!