Most learning materials on machine learning are, in my opinion, overly complex and use too much math from the start. In this post, I will try to explain simply what simple linear regression is. This will require a few mathematical terms, but I will keep it to just one simple formula. I will explain the mathematical details behind it in another post. So, let’s start with the definition.
Definition: Linear regression is a model in which we assume a linear relationship between the input variables and the output.
Unless you have taken at least some college-level linear algebra, that definition might be confusing. So, let’s try to simplify it, using simple linear regression as the example.
In simple linear regression, we predict a value based on a single input, for example, the price of a house based on its size. Because it is a linear regression, we expect to get the following equation, which we can use for our prediction:
y = A + B * x
In this equation, y is the price we are trying to predict, x is the size of the house we use as the input value, and A and B are coefficients that define our model. There is a mathematical process for finding the values of those coefficients, and there are libraries that do it for us, but once again, that is not the goal of this post and will be explained in the next one.
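To make this concrete, here is a minimal Python sketch of a prediction. It assumes the equation has the form y = A + B * x, with A as the starting price and B as the price added per unit of size. The coefficient values and the function name predict_price are made up for illustration and are not fitted to any real data.

```python
# Hypothetical coefficients, chosen only for illustration (not fitted to real data).
A = 50_000.0  # assumed intercept: the price of a house of size zero
B = 1_200.0   # assumed slope: price added per square metre (unit is an assumption)


def predict_price(size: float) -> float:
    """Return the predicted price for a house of the given size, using y = A + B * x."""
    return A + B * size


# 50000 + 1200 * 100 = 170000.0
print(predict_price(100))
```

Once A and B are known, predicting is nothing more than plugging the size into the formula.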
Now, for the reason why it is called linear regression: if we generated a chart visualizing this function, we would get a straight line, like in the chart below.
In the chart above, the orange points are actual values, while the blue ones are predicted values. As we can see, some fit better than others, but that is fine. We are trying to get a model that fits our data as well as possible, not perfectly. If we tried to fit every point, the function would be much more complex than a straight line, and it would perform poorly on new, unseen data.
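The point about fitting every training point can be tried out with a small sketch. The data below is invented, the line uses hand-picked coefficients (not fitted ones), and the "perfect" model is a polynomial built with Lagrange interpolation, which is just one possible way to force a curve through every point. The names line_model and wiggly_model are hypothetical.

```python
# Hypothetical training data: house sizes and prices, invented for illustration.
sizes = [50, 80, 100, 120, 150]
prices = [110_000, 150_000, 165_000, 200_000, 230_000]


def line_model(x):
    """A straight line with hand-picked (not fitted) coefficients."""
    return 50_000 + 1_200 * x


def wiggly_model(x):
    """Degree-4 polynomial that passes exactly through all five training points
    (Lagrange interpolation)."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(sizes, prices)):
        weight = 1.0
        for j, xj in enumerate(sizes):
            if j != i:
                weight *= (x - xj) / (xi - xj)
        total += yi * weight
    return total


# On the training data the polynomial hits every price, the line does not.
for x, y in zip(sizes, prices):
    print(x, y, line_model(x), round(wiggly_model(x)))

# A new, unseen house (hypothetical): size 170, actual price 255,000.
new_size, new_price = 170, 255_000
line_error = abs(line_model(new_size) - new_price)
wiggly_error = abs(wiggly_model(new_size) - new_price)

# The line is off by 1,000, the polynomial by roughly 129,000.
print(line_error, round(wiggly_error))
```

With this made-up data, the curve that is perfect on the training points is the one that fails badly on the new house, while the simple line stays close.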
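The question is which line describes the data best. There actually is a definition for that. For every value in the training data that we used to generate the model, we take the difference between the predicted and the actual value, square it, and sum the results. Squaring keeps positive and negative differences from cancelling each other out. The line for which this sum is smallest gives the best model.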
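Here is a small sketch of how a few candidate lines could be compared on this kind of score. The data and the three candidates are invented for illustration, and the differences are squared before summing so that positive and negative errors cannot cancel each other out. This only picks the best of the listed candidates; it does not search for the best possible line.

```python
# Hypothetical training data, invented for illustration.
sizes = [50, 80, 100, 120, 150]
prices = [110_000, 150_000, 165_000, 200_000, 230_000]


def sum_of_differences(a, b):
    """Plain sum of (predicted - actual); positive and negative errors can cancel."""
    return sum((a + b * x) - y for x, y in zip(sizes, prices))


def sum_of_squared_differences(a, b):
    """Sum of squared (predicted - actual); every miss adds to the total."""
    return sum(((a + b * x) - y) ** 2 for x, y in zip(sizes, prices))


# Candidate lines as (A, B) pairs for y = A + B * x, all hand-picked.
candidates = {
    "line 1": (50_000, 1_200),
    "line 2": (80_000, 1_000),
    "flat line": (171_000, 0),  # ignores size and always predicts the average price
}

for name, (a, b) in candidates.items():
    print(name, sum_of_differences(a, b), sum_of_squared_differences(a, b))

# Pick the candidate with the smallest sum of squared differences.
best = min(candidates, key=lambda name: sum_of_squared_differences(*candidates[name]))
print(best)  # line 1
```

Note that the flat line has a plain sum of differences of exactly zero even though it misses most points by a lot, which is why the squared version is the one used for comparison.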
Hopefully, this gives a simple view of what simple linear regression is. It is a simple model that can be used wherever the data can be approximated with a line. In the next post, I will explain how to use Python and different libraries to make predictions, as well as the mathematical background behind generating models.