Gradient Descent: The Mother of All Algorithms?

#machinelearning #datascience #python #programming

Introduction to Gradient Descent

Gradient descent is an optimization algorithm that minimizes a function by repeatedly stepping in the direction of steepest descent, that is, opposite to the gradient, until it reaches a minimum. It is the backbone of many machine-learning algorithms. The method was originally proposed by Cauchy in 1847, and it remains one of the most important and widely used algorithms in machine learning.
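
To make the idea of iterative steps concrete, here is a minimal sketch on a one-variable function. The function f(x) = (x - 3)^2, the starting point, the learning rate, and the step count are all illustrative assumptions, not values from this post.

```python
def f(x):
    # Illustrative function with its minimum at x = 3
    return (x - 3) ** 2

def df(x):
    # Derivative of f with respect to x
    return 2 * (x - 3)

def descend(x0, learning_rate=0.1, steps=100):
    # Start from an initial guess and repeatedly step against the derivative
    x = x0
    for _ in range(steps):
        x = x - learning_rate * df(x)
    return x

x_min = descend(x0=0.0)
print(x_min)  # very close to 3, where f is smallest
```

Each step moves x a little way downhill, and the steps shrink as the derivative approaches zero near the minimum.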

When a function depends on two or more variables, it has one partial derivative for each of them. The vector that collects these partial derivatives is called the gradient.
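
As a rough illustration of a gradient, the sketch below estimates both partial derivatives of a two-variable function with central finite differences. The function f(x, y) = x^2 + 3y and the helper name `numerical_gradient` are hypothetical choices made for this example.

```python
def f(x, y):
    # Illustrative function of two variables
    return x ** 2 + 3 * y

def numerical_gradient(func, x, y, h=1e-6):
    # Central differences: nudge one variable at a time, keep the other fixed
    df_dx = (func(x + h, y) - func(x - h, y)) / (2 * h)
    df_dy = (func(x, y + h) - func(x, y - h)) / (2 * h)
    return df_dx, df_dy

grad = numerical_gradient(f, 2.0, 1.0)
print(grad)  # close to (4.0, 3.0), the analytic gradient (2x, 3) at x = 2
```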

Loss/Error Function

In simple linear regression we have a single input, and we have to find the line that best fits the data. The best-fit line is the one for which the total prediction error over all data points is as small as possible. The error for a point is the distance between that point and the regression line, measured along the y-axis.

E.g.: the relationship between hours of study and marks obtained. The goal is to find how the marks a student obtains depend on the hours the student studies. To do that, we have to find a linear equation relating these two variables.

y_predict = b_0 + b_1 * x

The error is the difference between the predicted and the actual value. To reduce the error and find the best-fit line, we have to find values for b_0 and b_1.

loss function = (y_p - y_a)^2, where y_p is the predicted value and y_a is the actual value

For the best-fit line, b_0 and b_1 must take the values that minimize this squared error, summed over all data points.
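
The sketch below shows how the squared error can be totalled for a candidate line. The hours and marks are made-up numbers for illustration only, and `total_squared_error` is a hypothetical helper name.

```python
# Hypothetical data: hours studied and marks obtained
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
marks = [52.0, 58.0, 61.0, 70.0, 74.0]

def total_squared_error(b_0, b_1, xs, ys):
    # Sum of (y_p - y_a)^2 over all data points
    total = 0.0
    for x, y_a in zip(xs, ys):
        y_p = b_0 + b_1 * x
        total += (y_p - y_a) ** 2
    return total

# Compare a flat guess with a sloped guess
loss_flat = total_squared_error(60.0, 0.0, hours, marks)
loss_sloped = total_squared_error(46.0, 5.6, hours, marks)
print(loss_flat, loss_sloped)
```

The sloped line gives a much smaller total than the flat one, which is what "better fit" means here.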

Once we take the derivative of the loss function with respect to each parameter, gradient descent uses those derivatives to find where the sum of squared errors is lowest. The process of finding the optimal values for m (the coefficient, b_1 above) and b (the intercept, b_0 above) that reduce the error/loss function is called gradient descent. It finds the minimum by taking steps from an initial guess, each one against the gradient and scaled by a learning rate, until it reaches the best value.
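
For the mean squared error of the line y = m * x + b, the two partial derivatives can be written out by hand. The sketch below states them in the comments and checks them against finite differences. The data points and the names `mse` and `mse_gradient` are assumptions made for this example.

```python
def mse(m, b, xs, ys):
    # Mean squared error of the line y = m * x + b
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys)) / len(ys)

def mse_gradient(m, b, xs, ys):
    # dMSE/dm = (-2/N) * sum(x * (y - (m*x + b)))
    # dMSE/db = (-2/N) * sum(y - (m*x + b))
    n = len(ys)
    residuals = [y - (m * x + b) for x, y in zip(xs, ys)]
    dm = (-2 / n) * sum(x * r for x, r in zip(xs, residuals))
    db = (-2 / n) * sum(residuals)
    return dm, db

# Hypothetical points lying exactly on y = 2x + 1
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

# Analytic gradient at the guess m = 0.5, b = 0.0
dm, db = mse_gradient(0.5, 0.0, xs, ys)

# Finite-difference estimates of the same two derivatives
h = 1e-6
dm_numeric = (mse(0.5 + h, 0.0, xs, ys) - mse(0.5 - h, 0.0, xs, ys)) / (2 * h)
db_numeric = (mse(0.5, h, xs, ys) - mse(0.5, -h, xs, ys)) / (2 * h)
print(dm, dm_numeric)
print(db, db_numeric)
```

Both derivatives are negative at this guess, so a gradient descent step would increase m and b, moving them toward the true values 2 and 1.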

Gradient Descent Implementation in Python


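import numpy as np
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=100, n_features=1, noise=0.1)
X = X.ravel()  # make_regression returns X with shape (100, 1); flatten it to match y

def LinearRegression(X, y, m_current=0, b_current=0, epochs=2000, learning_rate=0.001):
    N = float(len(y))

    for i in range(epochs):
        # Predictions with the current slope and intercept
        y_current = m_current * X + b_current
        # Partial derivatives of the mean squared error with respect to m and b
        m_gradient = (-2/N) * np.sum(X * (y - y_current))
        b_gradient = (-2/N) * np.sum(y - y_current)
        # Step against the gradient, scaled by the learning rate
        m_current = m_current - (learning_rate * m_gradient)
        b_current = b_current - (learning_rate * b_gradient)
    return m_current, b_current

m, b = LinearRegression(X, y)
print(m, b)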
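
One way to sanity-check a gradient descent fit is to compare it with the closed-form least-squares solution for a single input. The sketch below is standalone and makes several assumptions: it uses only the standard library, synthetic data drawn around the line y = 3x + 2, a learning rate of 0.1, and 1000 epochs. None of these values come from the code above, and the function names are hypothetical.

```python
import random

# Synthetic data (an assumption for this sketch): y = 3x + 2 plus small noise
random.seed(0)
xs = [random.gauss(0.0, 1.0) for _ in range(100)]
ys = [3.0 * x + 2.0 + random.gauss(0.0, 0.1) for x in xs]

def fit_gradient_descent(xs, ys, epochs=1000, learning_rate=0.1):
    n = len(ys)
    m, b = 0.0, 0.0
    for _ in range(epochs):
        residuals = [y - (m * x + b) for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error
        m_gradient = (-2 / n) * sum(x * r for x, r in zip(xs, residuals))
        b_gradient = (-2 / n) * sum(residuals)
        m -= learning_rate * m_gradient
        b -= learning_rate * b_gradient
    return m, b

def fit_closed_form(xs, ys):
    # Ordinary least squares for one input: slope = S_xy / S_xx
    n = len(ys)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    m = s_xy / s_xx
    return m, y_mean - m * x_mean

m_gd, b_gd = fit_gradient_descent(xs, ys)
m_exact, b_exact = fit_closed_form(xs, ys)
print(m_gd, b_gd)
print(m_exact, b_exact)
```

If the two pairs of numbers disagree noticeably, the usual suspects are a learning rate that is too small for the number of epochs, or one that is too large and makes the updates diverge.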