Terra

Posted on Jul 14, 2024 • Originally published at pourterra.com

Mathematics for Machine Learning - Day 7

#learning #machinelearning #beginners #tutorial

A weekly review

Today I'm just going to summarize the matrix properties and the particular solution from the previous days and translate some of them into code. There isn't anything new today, nor will there be much mathematical notation, aside from the last section.

A fun fact about today: remember the first ever equation regarding the particular solution? I used gradient descent and found another way to solve it :D so fun.

Matrices

Matrix comparison function

The reason I'm using this function instead of a built-in function is for you (the readers) to know how the comparison is made (because honestly, I don't know how np.array_equal works).

import numpy as np

def compare_two_matrices(matrixA:np.ndarray, matrixB:np.ndarray)->str:
    # Ensuring matrix A and B have the same shape
    if matrixA.shape != matrixB.shape:
        return "Matrix A and B should have the same shape"

    # Comparing each index of the matrices and returning a list
    result = [i for j in matrixA==matrixB for i in j]

    #Return text if all inside the list is true
    if all(result):
        return "Both matrices are exactly the same"

    #Return text if not all inside the list is true
    return "Matrices are not the same"
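For comparison, NumPy's built-ins do roughly the same job. Here's a minimal sketch, assuming exact equality is what you want for integer matrices and a tolerance-based check is what you want for floats (the small arrays are made up just for illustration):

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[1, 2], [3, 4]])
C = np.array([[1, 2], [3, 5]])

# np.array_equal: True only if the shapes match and every element is equal
same_ab = np.array_equal(A, B)
same_ac = np.array_equal(A, C)

# np.allclose: element-wise comparison within a tolerance, handy for floats
close_floats = np.allclose(np.array([0.1 + 0.2]), np.array([0.3]))
exact_floats = np.array_equal(np.array([0.1 + 0.2]), np.array([0.3]))

print(same_ab, same_ac, close_floats, exact_floats)
# True False True False
```

The last two values differ because 0.1 + 0.2 isn't stored as exactly 0.3, which is why an exact comparison can be too strict once floats are involved.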

Addition and subtraction

m = 5
n = 3

Amn = np.random.randint(low=0,high=100,size=(m,n))
Bmn = np.random.randint(low=0,high=100,size=(m,n))
Cmn = Amn+Bmn
Dmn = Amn-Bmn

print(Amn.shape,Bmn.shape, Cmn.shape, Dmn.shape)
# (5, 3) (5, 3) (5, 3) (5, 3)

Multiplication

m = 5
n = 3
k = 9

Amn = np.random.randint(low=0,high=100,size=(m,n))
Bnk = np.random.randint(low=0,high=100,size=(n,k))
Cmk = np.dot(Amn,Bnk)

print(Amn.shape,Bnk.shape, Cmk.shape)
# (5, 3) (3, 9) (5, 9)
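The shapes above only work because the inner dimensions match (n = 3 on both sides). A small sketch of what happens when the order is swapped, using fixed arrays instead of random ones so the result is repeatable (the `@` operator is NumPy's shorthand for matrix multiplication on 2-D arrays):

```python
import numpy as np

A = np.arange(15).reshape(5, 3)   # shape (5, 3)
B = np.arange(27).reshape(3, 9)   # shape (3, 9)

# For 2-D arrays, A @ B gives the same result as np.dot(A, B)
product = A @ B
print(product.shape, np.array_equal(product, np.dot(A, B)))
# (5, 9) True

# Swapping the order fails because the inner dimensions (9 and 5) differ
try:
    B @ A
    order_swapped_ok = True
except ValueError:
    order_swapped_ok = False

print(order_swapped_ok)
# False
```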

Associativity

m = 9
n = 3
k = 5
l = 7

Amn = np.random.randint(low=0,high=100,size=(m,n))
Bnk = np.random.randint(low=0,high=100,size=(n,k))
Ckl = np.random.randint(low=0,high=100,size=(k,l))

Left_section = np.dot(np.dot(Amn,Bnk), Ckl)
Right_section = np.dot(Amn, np.dot(Bnk,Ckl))

compare_two_matrices(Left_section, Right_section)

# 'Both matrices are exactly the same'

Distributivity

Test 1

m = 9
n = 3
k = 5

Amn = np.random.randint(low=0,high=100,size=(m,n))
Bmn = np.random.randint(low=0,high=100,size=(m,n))
Cnk = np.random.randint(low=0,high=100,size=(n,k))

Left_section = np.dot((Amn+Bmn),Cnk)
Right_section = np.dot(Amn, Cnk) + np.dot(Bmn, Cnk)

compare_two_matrices(Left_section, Right_section)

# 'Both matrices are exactly the same'

Test 2

m = 9
n = 3
k = 5

Amn = np.random.randint(low=0,high=100,size=(m,n))
Bnk = np.random.randint(low=0,high=100,size=(n,k))
Cnk = np.random.randint(low=0,high=100,size=(n,k))

Left_section =  np.dot(Amn, (Bnk + Cnk))
Right_section = np.dot(Amn, Bnk) + np.dot(Amn, Cnk)

compare_two_matrices(Left_section, Right_section)

# 'Both matrices are exactly the same'

Inverse

identity_matrix = np.identity(2)
wrong_matrix = np.array([[4,8],[1,2]])
right_matrix = np.array([[4,8],[0.5,2]])

I hope you remember why the wrong matrix can't be inverted while the right matrix works just fine, even though only one value is different!

Inverse Function

def create_inverse(matrix:np.ndarray)->np.ndarray:
    # Confirming square matrix
    if matrix.shape[0] != matrix.shape[1]:
        return "Matrix needs to be square to be inverted."

    # Confirming shape
    if matrix.shape != (2,2):
        return "I'm not smart enough to code more complex matrices and don't ask for an inverse of 1x1."

    # Creating the adjugate (classical adjoint) matrix as an array so it can be divided
    adj_matrix = np.array([[matrix[-1][-1],-matrix[0][-1]],
            [-matrix[-1][0],matrix[0][0]]])

    # Calculating the determinant of the matrix (ad - bc)
    det_matrix = matrix[0][0]*matrix[-1][-1]-matrix[0][-1]*matrix[-1][0]

    # Calculating the inverse of the matrix (this divides by zero if the determinant is 0)
    inverse_matrix = adj_matrix/det_matrix

    return inverse_matrix


create_inverse(wrong_matrix)

# RuntimeWarning: divide by zero encountered in divide (inverse_matrix = adj_matrix/det_matrix)

Damn... my function spoiled the fun. So always remember: not every matrix can be inverted. Aside from the rule that it must be a square matrix, it also needs to have a non-zero determinant.

create_inverse(right_matrix)

"""
array([[ 0.5  , -2.   ],
       [-0.125,  1.   ]])
"""

multiply_with_inverse = np.dot(create_inverse(right_matrix), right_matrix)

compare_two_matrices(multiply_with_inverse, identity_matrix)

# 'Both matrices are exactly the same'

This also confirms, at least for this example, that if a matrix is multiplied by its inverse, the result is the identity matrix!
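If you'd rather not write the 2x2 formula by hand, NumPy has `np.linalg.det` and `np.linalg.inv`. Below is a rough sketch that checks the determinant first; `safe_inverse` is a hypothetical helper name I made up for this example, not something from NumPy:

```python
import numpy as np

def safe_inverse(matrix):
    # Hypothetical helper: returns None instead of dividing by zero
    determinant = np.linalg.det(matrix)
    if np.isclose(determinant, 0.0):
        return None
    return np.linalg.inv(matrix)

wrong_matrix = np.array([[4, 8], [1, 2]])
right_matrix = np.array([[4, 8], [0.5, 2]])

wrong_inverse = safe_inverse(wrong_matrix)   # determinant is 4*2 - 8*1 = 0
right_inverse = safe_inverse(right_matrix)   # determinant is 4*2 - 8*0.5 = 4

print(wrong_inverse)
print(right_inverse)
print(np.allclose(right_inverse @ right_matrix, np.identity(2)))
```

The inverse of the right matrix should match the result of `create_inverse` above, and the singular matrix is reported as `None` rather than a warning full of `inf` values.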

Particular Solution

This is where it gets fun. If you've read the previous days, you know that, aside from spotting a sort of identity matrix inside the matrix, the method I used to find the particular solution is more like guessing, or iterating over values, until the answer fits.

So what?

That means, when translating it into code, I also need to make it iterative and change the value of x until it matches the value we know as the result.

So what?

That means I need to skip a few chapters. (Gradient Descent)

\theta := \theta - \alpha \left( \frac{2}{n} X^T (X\theta - y) \right)

A name so famous and so badass that I couldn't help learning it early. This is what we'll use to determine x. Bear in mind, today is more coding than mathematics, so I won't go into too much detail about why they use theta or why the gradient is usually written with a nabla.

Today, I'll just explain that what we did in the previous days was basically gradient descent, just in our heads, so this is how I'm translating it into code.

Mean Squared Error (MSE)

\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2

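I'm going to be using Mean Squared Error (MSE). Since some of the differences will be negative, squaring them ensures that what's being measured is the size of the difference, regardless of whether it's negative or positive. One caveat: the function below keeps the sign and doesn't actually square the differences, because the gradient step needs the signed difference (prediction minus expected). Squaring it there would break the update.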

def MSE_function(predicted:np.ndarray, expected:np.ndarray)->np.ndarray: # Signed error scaled by 1/n (not squared)
    if len(predicted) != len(expected):
        return "Predicted output and expected output should be the same length"

    # The amount of values in prediction and expected values
    n = len(predicted)

    # Calculating the signed difference between the predicted and expected values
    # (deliberately not squared: the gradient step needs the sign)
    difference = np.array([(predicted[i]-expected[i]) for i in range(n)])

    # Dividing each difference by n
    mse_value = difference/n

    return mse_value
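Note that the function above returns one signed value per equation rather than a single squared error. For reference, here's a minimal sketch of the MSE formula as written at the top of this section; `mean_squared_error` is just a name I picked for the example:

```python
import numpy as np

def mean_squared_error(predicted, expected):
    # Convert inputs so plain lists work too
    predicted = np.asarray(predicted, dtype=float)
    expected = np.asarray(expected, dtype=float)
    if predicted.shape != expected.shape:
        raise ValueError("predicted and expected must have the same shape")

    # Square each difference, then average: (1/n) * sum((y_i - y_hat_i)^2)
    return np.mean((predicted - expected) ** 2)

example_mse = mean_squared_error([1, 2, 3], [1, 2, 5])
print(example_mse)  # (0 + 0 + 4) / 3, roughly 1.333
```

This version is useful for reporting how far off a prediction is, while the signed version is what feeds the gradient.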

Gradient Descent

\theta := \theta - \alpha \left( \frac{2}{n} X^T (X\theta - y) \right)

def finding_gradient(input_matrix:np.ndarray, mse_error:np.ndarray)->np.ndarray:

    # Multiplying matrix A (transposed) by the error vector
    At_E = np.dot(input_matrix.T, mse_error)

    # Dividing it by n (the number of equations) and multiplying it by 2
    gradient = (2/len(mse_error))*At_E

    return gradient

def update_x(current_x:np.ndarray, gradient:np.ndarray, descent:float = 0.1)->list:
    """
    I'm using assert because I'd rather have an error on this section rather than outputting a string that'll be an error
    somewhere else down the line :D
    """
    assert len(current_x)==len(gradient)

    # The amount of values
    n = len(current_x)

    # Multiplying the gradient by the learning rate
    gradient = gradient*descent

    # Iterating over x and subtracting alpha*gradient from each value
    update_values = [(current_x[i]-gradient[i]) for i in range(n)]

    return update_values

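Yes, it's split into two functions. I try my best to ensure each function plays only one specific role (from what I know this is best practice in coding, the S in the SOLID principles).

So, you can see from my code and the formula that there are similarities. Here's what my code means in the mathematical notation.

\text{prediction} = X\theta \\ \text{mse value} = \frac{1}{n} (X\theta - y)

\text{At E} = \frac{1}{n} X^T (X\theta - y) \\ \text{gradient} = \frac{1}{n} \cdot \frac{2}{n} X^T (X\theta - y)

\text{current x} = \theta \\ \text{descent} = n\alpha

\text{updated values} = \theta - \alpha \left( \frac{2}{n} X^T (X\theta - y) \right)

Since MSE_function already divides by n once, the gradient in my code carries an extra factor of 1/n compared to the formula. That only rescales the learning rate, which is why descent corresponds to n times alpha.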

P.S. I can't use underscore (_) inside of katex text, so it should've been At_E, current_x and updated_values which refer to my variables and not just some random name.

P.P.S. I'll never change from snake case so katex can fight me. I'm a python developer, the snake god might hate me if I don't use snake case and if you don't know what I'm talking about... It's been a long day, I'm sorry for rambling.
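If you want some reassurance that the formula really is the gradient of the MSE, one common trick is to compare it against a finite-difference estimate. The sketch below assumes the loss is the plain MSE of X times theta against y; the function names and the test point `theta` are my own picks for illustration:

```python
import numpy as np

def mse_loss(X, theta, y):
    # Mean squared error of the prediction X @ theta against y
    residual = X @ theta - y
    return np.mean(residual ** 2)

def analytic_gradient(X, theta, y):
    # The formula from this section: (2/n) * X^T (X theta - y)
    n = len(y)
    return (2 / n) * X.T @ (X @ theta - y)

def numerical_gradient(X, theta, y, eps=1e-6):
    # Central differences: nudge one coordinate at a time
    grad = np.zeros_like(theta, dtype=float)
    for j in range(len(theta)):
        step = np.zeros_like(theta, dtype=float)
        step[j] = eps
        grad[j] = (mse_loss(X, theta + step, y) - mse_loss(X, theta - step, y)) / (2 * eps)
    return grad

X = np.array([[1, 0, 8, 0, -4], [0, 1, 2, 0, -12], [0, 0, 4, 1, 7]], dtype=float)
y = np.array([42, 8, 12], dtype=float)
theta = np.array([0.5, -1.0, 2.0, 0.0, 1.5])  # arbitrary test point

gradients_match = np.allclose(
    analytic_gradient(X, theta, y), numerical_gradient(X, theta, y), atol=1e-4
)
print(gradients_match)
```

If the two disagree, either the formula or the code translating it has a mistake, which makes this a cheap sanity check before running the full loop.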

Full code

And that's it! We can use it to solve the system of equations from the previous days.

A = np.array([[1, 0, 8, 0, -4], [0, 1, 2, 0, -12], [0, 0, 4, 1, 7]])
B = np.array([42, 8, 12])
x = np.array([0, 0, 0, 0, 0])
descent_value = 0.041

for i in range(10000):
    prediction = np.dot(A,x)

    mse_value = MSE_function(prediction, B)

    gradient = finding_gradient(A, mse_value)

    x = update_x(x, gradient, descent_value)

    # Generating final report
    if np.abs(sum(mse_value))<0.0001:
        print("Generation finish after {} iteration".format(i))
        print("A total mean square error value of {} or an average of {}\n".format(round(sum(mse_value),5),np.mean(mse_value)))
        print("With x:",x)
        print("With Ax:",np.dot(A,x))
        break
"""
Generation finish after 1077 iteration
A total mean square error value of 0.0001 or an average of 3.303279802955059e-05

With x: [4.024892733606064, -4.136195147684619, 4.626743232157506, -4.825002085330449, -0.24024375952185326]
With Ax: [41.99981363  8.00021643 12.00026453]
"""

And that's it! You've made a basic model that learns from previous data making it a machine learning model!
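For comparison, here's a more compact sketch of the same idea that follows the update formula directly with vectorized NumPy and stops on the actual mean squared error. The `alpha`, `tolerance` and `max_iterations` values are my own illustrative choices, not the ones used above, and `gradient_descent` is just a name for this example:

```python
import numpy as np

A = np.array([[1, 0, 8, 0, -4], [0, 1, 2, 0, -12], [0, 0, 4, 1, 7]], dtype=float)
B = np.array([42, 8, 12], dtype=float)

def gradient_descent(X, y, alpha=0.009, tolerance=1e-8, max_iterations=10_000):
    # alpha, tolerance and max_iterations are assumptions made for this sketch
    theta = np.zeros(X.shape[1])
    n = len(y)
    for _ in range(max_iterations):
        residual = X @ theta - y                  # signed error, X theta - y
        if np.mean(residual ** 2) < tolerance:    # stop on the mean squared error
            break
        theta = theta - alpha * (2 / n) * X.T @ residual
    return theta

theta = gradient_descent(A, B)

# Minimum-norm least-squares solution from NumPy, for comparison
reference, *_ = np.linalg.lstsq(A, B, rcond=None)

print(np.allclose(A @ theta, B, atol=1e-3))
print(np.allclose(theta, reference, atol=1e-3))
```

This system has more unknowns than equations, so it has many solutions. Starting from all zeros, gradient descent should settle on the one with the smallest norm, which is also what `np.linalg.lstsq` returns for an underdetermined system.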

Acknowledgement

I can't overstate this: I'm truly grateful for this book being open-sourced for everyone. Many people will be able to learn and understand machine learning on a fundamental level. Whether changing careers, demystifying AI, or just learning in general, this book offers immense value even for a fledgling composer such as myself. So, Marc Peter Deisenroth, A. Aldo Faisal, and Cheng Soon Ong, thank you for this book.

Source:
Deisenroth, M. P., Faisal, A. A., & Ong, C. S. (2020). Mathematics for Machine Learning. Cambridge: Cambridge University Press.
https://mml-book.com
