Supervised learning - Linear regression model

Linear Regression Model

Concept map:
Linear regression model → SSE → Gradient descent → Training the model

1. Linear Regression Model:
Linear regression is a type of supervised learning algorithm used to predict a continuous target variable based on one or more input features. Other common supervised learning algorithms include logistic regression, decision trees, and neural networks.
The goal of linear regression is to find the best linear relationship between the input features and the target variable. The model takes the form of a linear equation:

y = b_0 + b_1 x_1 + b_2 x_2 + ... + b_n x_n

where:
y is the target variable
x_1, x_2, ..., x_n are the input features
b_0 is the y-intercept (also known as the bias)
b_1, b_2, ..., b_n are the coefficients (also known as weights) of the input features

The objective of linear regression is to find the values of b_0, b_1, b_2, ..., b_n that minimize the difference between the predicted and actual values of y. This is typically done by minimizing the sum of squared errors (SSE) between the predicted and actual values.
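To make the prediction step concrete, here is a minimal NumPy sketch of the linear equation above; the feature matrix, weights, and bias values are made up purely for illustration:

```python
import numpy as np

def predict(X, weights, bias):
    """y_hat = b_0 + b_1*x_1 + ... + b_n*x_n, computed for every row of X."""
    return X @ weights + bias

# Illustrative values only: 3 data points, 2 features
X = np.array([[1.0, 2.0],
              [2.0, 0.5],
              [3.0, 1.5]])
weights = np.array([0.4, -0.2])  # b_1, b_2
bias = 1.0                       # b_0
print(predict(X, weights, bias))  # [1.0, 1.7, 1.9]
```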

2. Sum of Squared Errors (SSE):
By definition:

SSE = \sum_{i=1}^n (y_i - \hat{y_i})^2

where:
y_i is the actual value of the target variable for the i^{th} data point
\hat{y_i} is the predicted value of the target variable for the i^{th} data point
The minimization of SSE is typically achieved using gradient descent, an optimization algorithm that iteratively adjusts the values of b_0, b_1, b_2, ..., b_n to find the values that minimize SSE.
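The definition above translates directly into code. A small NumPy sketch, with made-up actual and predicted values for illustration:

```python
import numpy as np

def sse(y_true, y_pred):
    """Sum of squared errors between actual and predicted target values."""
    residuals = y_true - y_pred
    return np.sum(residuals ** 2)

# Illustrative values only
y_true = np.array([3.0, 1.5, 4.0])
y_pred = np.array([2.8, 1.7, 3.5])
print(sse(y_true, y_pred))  # 0.2^2 + 0.2^2 + 0.5^2 = 0.33
```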


3. Gradient Descent:
Gradient descent is an iterative optimization algorithm that is used to find the optimal values of the model parameters (weights and biases) that minimize the cost function (SSE in this case). It works by updating the parameters in the opposite direction of the gradient of the cost function with respect to the parameters.

The update rule for the weights (the coefficients b_1, ..., b_n, written here as w_i) in gradient descent is:

w_i = w_i - \alpha \frac{\partial Cost}{\partial w_i}

where:
w_i is the i^{th} weight
\alpha is the learning rate, i.e. the step size for each iteration of gradient descent
\frac{\partial Cost}{\partial w_i} is the partial derivative of the cost function with respect to the i^{th} weight
The update rule for the bias term is similar:
b_0 = b_0 - \alpha \frac{\partial Cost}{\partial b_0}

where:
b_0 is the bias term
\frac{\partial Cost}{\partial b_0} is the partial derivative of the cost function with respect to the bias term

The partial derivatives of the cost function with respect to the weights and bias can be calculated using calculus.
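For the SSE cost defined above, these derivatives have a standard closed form. Writing x_{j,i} for the value of the i^{th} feature at the j^{th} data point (an index introduced here only to keep data points and features distinct), they work out to:

\frac{\partial SSE}{\partial w_i} = -2 \sum_{j=1}^{n} (y_j - \hat{y_j}) \, x_{j,i}

\frac{\partial SSE}{\partial b_0} = -2 \sum_{j=1}^{n} (y_j - \hat{y_j})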

4. Training the Model:
To train the model, we first initialize the weights and bias to some random values. We then iteratively update the weights and bias using the gradient descent algorithm until the cost function reaches a minimum or a predefined stopping criterion is met (e.g., maximum number of iterations reached).

Once the model is trained, we can use it to make predictions on new data by plugging in the values of the input features into the linear equation and calculating the predicted value of the target variable.
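Here is a minimal end-to-end NumPy sketch of this training-then-prediction procedure, assuming an illustrative function name (train_linear_regression), a synthetic one-feature dataset, and a learning rate of 0.05, none of which come from the post itself. The gradients divide SSE by the number of data points (i.e. they minimize the mean squared error), which has the same minimizer as SSE but is less sensitive to dataset size:

```python
import numpy as np

def train_linear_regression(X, y, learning_rate=0.05, n_iters=2000):
    """Fit weights and bias with batch gradient descent on the squared-error cost."""
    m, n_features = X.shape
    weights = np.zeros(n_features)  # could also start from small random values
    bias = 0.0
    for _ in range(n_iters):
        y_pred = X @ weights + bias        # current predictions
        error = y - y_pred                 # residuals (y_i - y_hat_i)
        # Gradients of SSE / m (mean squared error) w.r.t. weights and bias
        grad_w = -2.0 / m * (X.T @ error)
        grad_b = -2.0 / m * np.sum(error)
        weights -= learning_rate * grad_w  # step against the gradient
        bias -= learning_rate * grad_b
    return weights, bias

# Illustrative data: y ≈ 2*x + 1 plus a little noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 5, size=(100, 1))
y = 2 * X[:, 0] + 1 + rng.normal(0, 0.1, size=100)

w, b = train_linear_regression(X, y)
print(w, b)  # should land close to [2.0] and 1.0

# Prediction on new data: plug the features into the learned linear equation
X_new = np.array([[1.5], [4.0]])
print(X_new @ w + b)  # roughly [4.0, 9.0]
```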
