In the Elements of Statistical Learning book, the chapter on Supervised Learning describes the equation for a linear model in a vector form (Eq 2.1) and a matrix form (Eq 2.2). Both are equivalent.
A vector is just another word for a one-dimensional array.
We can write one horizontally with parentheses, like

$$
(x_1, x_2, \ldots, x_n)
$$
but we can also write it vertically with square brackets, in a more matrix-like form:

$$
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
$$
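To make that concrete, here is a minimal NumPy sketch (NumPy and the sample values are my additions, not something from the book) showing the same numbers as a flat one-dimensional array and as a column:

```python
import numpy as np

# A vector is just a one-dimensional array of values.
x = np.array([1.0, 2.0, 3.0])   # horizontal form, shape (3,)
x_column = x.reshape(-1, 1)     # vertical column form, shape (3, 1)

print(x)         # [1. 2. 3.]
print(x_column)  # [[1.]
                 #  [2.]
                 #  [3.]]
```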
Now we reach Equation 2.1, shown below.

$$
\hat{Y} = \hat{\beta}_0 + \sum_{j=1}^{p} X_j \hat{\beta}_j
\tag{2.1}
$$
This equation is a fancy way of writing out an equation you might see for linear regression, like this:

$$
\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_p x_p
$$
Now, going back to our Equation 2.1, what do these variables mean? Here's the equation again:

$$
\hat{Y} = \hat{\beta}_0 + \sum_{j=1}^{p} X_j \hat{\beta}_j
$$
First, $\hat{Y}$ is the dependent variable, or output vector. It will contain the values for what you want to predict. The caret symbol ^ over the $Y$ is called a "hat". It marks a value we get from our model's prediction rather than one we observed, so we read $\hat{Y}$ as "Y hat".
The next variable we run into is $\hat{\beta}_0$. This is "the intercept, also known as the bias in machine learning."
On an X-Y plot like the ones you have seen since grade school, this is where the line crosses the vertical y-axis. Another way to think about the intercept is as the baseline value of your dependent variable when your independent variable predictors are all zero.
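You can see this directly from Equation 2.1: set every $X_j$ to zero, the sum disappears, and only the intercept is left.

$$
\hat{Y} = \hat{\beta}_0 + \sum_{j=1}^{p} 0 \cdot \hat{\beta}_j = \hat{\beta}_0
$$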
Now, let's shift gears and talk about some notation. There is a large "E"-looking symbol with some numbers and letters attached, $\sum_{j=1}^{p}$. We call the "E" symbol "sigma", and it is a fancy way of saying "add these things up".
What things are we adding up? And how? The sum $\sum_{j=1}^{p} X_j \hat{\beta}_j$ is really a matrix operation: each row of the matrix $X$ below holds one set of predictor values $X_1, \ldots, X_p$, and each $\hat{\beta}_j$ is a single value in the column vector $\hat{\beta}$.

$$
X = \begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1p} \\
x_{21} & x_{22} & \cdots & x_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{np}
\end{bmatrix}
$$
Similarly, the $\hat{\beta}$ is a column vector:

$$
\hat{\beta} = \begin{bmatrix}
\hat{\beta}_1 \\ \hat{\beta}_2 \\ \vdots \\ \hat{\beta}_p
\end{bmatrix}
$$
Multiplying everything together will produce a single column vector, $\hat{Y}$.
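Here is a small NumPy sketch of that multiplication. The feature values and coefficients below are made up purely for illustration; the point is that each row of $X$ times the column $\hat{\beta}$, plus the intercept, gives one entry of $\hat{Y}$.

```python
import numpy as np

beta_0 = 0.5                            # the intercept / bias
beta_hat = np.array([1.0, -2.0, 0.25])  # beta_hat_1 .. beta_hat_p, here p = 3

# Each row of X is one set of predictor values X_1 .. X_p.
X = np.array([
    [3.0, 1.0, 4.0],
    [0.0, 0.0, 0.0],   # all-zero predictors -> prediction is just the intercept
    [1.0, 2.0, 8.0],
])

# Rows of X times the column vector beta_hat, plus the intercept,
# gives the column vector Y hat.
Y_hat = beta_0 + X @ beta_hat
print(Y_hat)  # [ 2.5  0.5 -0.5]
```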
We do all of this so that we can run an equation like

$$
\hat{Y} = \hat{\beta}_0 + \sum_{j=1}^{p} X_j \hat{\beta}_j
$$

over and over again across multiple sets of values, which are encoded as rows in the matrix $X$ above.
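That "over and over again" is exactly what the matrix multiplication hides. Written out as an explicit loop, Equation 2.1's sum runs once per row and lands on the same $\hat{Y}$ as the single matrix multiplication (this sketch reuses the same made-up numbers as above):

```python
import numpy as np

beta_0 = 0.5
beta_hat = np.array([1.0, -2.0, 0.25])
X = np.array([[3.0, 1.0, 4.0],
              [0.0, 0.0, 0.0],
              [1.0, 2.0, 8.0]])

# Equation 2.1 applied row by row: the sigma runs once for every set of values.
Y_hat_loop = np.array([
    beta_0 + sum(X[i, j] * beta_hat[j] for j in range(len(beta_hat)))
    for i in range(X.shape[0])
])

# Same answer as the single matrix multiplication.
print(np.allclose(Y_hat_loop, beta_0 + X @ beta_hat))  # True
```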