DEV Community


How neural network works? Let's figure it out

Petro Liashchynskyi
Software Engineer / ML Engineer / AI Researcher / Web Developer.
6 min read

Hey, what's up! In my previous article I described how to build a neural network from scratch using only JavaScript. Today, at the request of several people, I'll try to explain the mathematical principles behind neural networks. Bro, you will finally understand what's under the hood of that monster!

And first, I'm gonna tell you another secret: there's no magic, only math.

This article builds on my previous one. If you haven't read it yet, now is the time! I will use the same formulas and try to explain them. Let's go!

Preparation

I'm gonna solve XOR again. It's not a joke, bro! Many data science books start by solving it. Once more, here is the XOR truth table.

Input 1  Input 2  Output
0        0        0
0        1        1
1        0        1
1        1        0
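
In code, the same table can be kept as a small array of training examples. This is only a minimal sketch: the names `xorData`, `inputs` and `target` are picked for illustration and are not taken from the previous article.

```javascript
// XOR truth table as training data: two inputs paired with the expected output.
const xorData = [
  { inputs: [0, 0], target: 0 },
  { inputs: [0, 1], target: 1 },
  { inputs: [1, 0], target: 1 },
  { inputs: [1, 1], target: 0 },
];

// Sanity check against JavaScript's built-in bitwise XOR operator (^).
const tableMatchesOperator = xorData.every(
  ({ inputs: [a, b], target }) => (a ^ b) === target
);
console.log(tableMatchesOperator);
```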

To demonstrate it, let's use the following neural network structure.

[Figure: neural network structure]

Here we have 2 neurons in the input layer, 4 in the hidden layer and 1 in the output layer.

Weights initialization

The main goal of neural network training is to adjust the weights so that the output error is minimized. In most cases the weights are initialized randomly, and during training they are adjusted by the backpropagation algorithm.

So, let's initialize the weights randomly from the [0, 1] range.

input -> hidden 1: 0.2, 0.6
input -> hidden 2: 0.5, 0.7
input -> hidden 3: 0.4, 0.9
input -> hidden 4: 0.8, 0.3
hidden -> output:  0.6, 0.7, 0.3, 0.4

Graphically, it looks like this.

[Figure: network with the initial weights]
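
Random initialization could be sketched like this, assuming the weights live in plain nested arrays. The helper name `initWeights` and its `rng` parameter are assumptions made for this example. Note that `Math.random` returns values in [0, 1), so exactly 1 never comes up.

```javascript
// Build a rows x cols matrix of weights drawn uniformly from [0, 1).
// `rng` can be swapped for a seeded generator to make a run reproducible.
function initWeights(rows, cols, rng = Math.random) {
  return Array.from({ length: rows }, () =>
    Array.from({ length: cols }, () => rng())
  );
}

// 4 hidden neurons with 2 inputs each, and 1 output neuron with 4 inputs.
const inputToHidden = initWeights(4, 2);
const hiddenToOutput = initWeights(1, 4);
console.log(inputToHidden, hiddenToOutput);
```

The walkthrough itself keeps using the fixed example weights, so the numbers stay the same on every read.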

Forward propagation

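OK, let's compute the neuron inputs. To save time I will use only one input case: 0 and 1, so the expected output is 1.

The formula:
net_j = x_1 * w_1j + x_2 * w_2j + ... + x_n * w_nj

Here x_i is the value of input i and w_ij is the weight from input neuron i to neuron j.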

So, for the first neuron in the hidden layer:

net1_h = 0 * 0.2 + 1 * 0.6 = 0.6

/**
 * 1..n, n = 2 (2 neurons in the input layer)
 *
 * 0    value of the first input element
 * 1    value of the second input element
 *
 * 0.2  the weight from the first input neuron to the first hidden neuron
 * 0.6  the weight from the second input neuron to the first hidden neuron
 *
 * Understand, bro?
 */


For the second neuron and the others:

net2_h = 0 * 0.5 + 1 * 0.7 = 0.7
net3_h = 0 * 0.4 + 1 * 0.9 = 0.9
net4_h = 0 * 0.8 + 1 * 0.3 = 0.3

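The four sums above are the same operation repeated with different weights, so they fit in one small function. A minimal sketch, assuming the weights are stored one row per hidden neuron; `netInput` and `inputToHidden` are names chosen for this example.

```javascript
// Weights from the walkthrough: one row per hidden neuron, one column per input.
const inputToHidden = [
  [0.2, 0.6],
  [0.5, 0.7],
  [0.4, 0.9],
  [0.8, 0.3],
];

// Net input of a neuron: the sum of each input multiplied by its weight.
function netInput(inputs, weights) {
  return inputs.reduce((sum, x, i) => sum + x * weights[i], 0);
}

const inputs = [0, 1];
const hiddenNets = inputToHidden.map((row) => netInput(inputs, row));
console.log(hiddenNets); // [ 0.6, 0.7, 0.9, 0.3 ]
```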

Now we need one more thing: an activation function. I'll use the sigmoid.
[Figure: sigmoid curve]

The formula and its derivative:

f(x) = 1 / (1 + exp(-x))
deriv(x) = f(x) * (1 - f(x)) 

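In JavaScript both formulas are one-liners. A minimal sketch: the function names `sigmoid` and `sigmoidDerivative` are my own, and the argument `x` is the neuron's net input.

```javascript
// Sigmoid activation: squashes any real number into the range (0, 1).
function sigmoid(x) {
  return 1 / (1 + Math.exp(-x));
}

// Derivative of the sigmoid, evaluated at the net input x.
function sigmoidDerivative(x) {
  const fx = sigmoid(x);
  return fx * (1 - fx);
}

console.log(sigmoid(0)); // 0.5
console.log(sigmoidDerivative(0)); // 0.25
```

Because the derivative only needs f(x), it can also be computed from a neuron's stored output as output * (1 - output), without evaluating the exponential again.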

Now we apply the activation function to each computed net input:

output1_h = f(net1_h) = f(0.6) = 0.64
output2_h = f(net2_h) = f(0.7) = 0.66
output3_h = f(net3_h) = f(0.9) = 0.71
output4_h = f(net4_h) = f(0.3) = 0.57


We've got the output values for each neuron in the hidden layer. Graphically, it looks like this:

[Figure: network with the hidden layer outputs]

And now that we have the output values of the hidden layer neurons, we can calculate the output value of the output layer.

net_o = 0.64 * 0.6 + 0.66 * 0.7 + 0.71 * 0.3 + 0.57 * 0.4 = 1.28
output_o = f(net_o) = f(1.28) = 0.78

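The whole forward pass is the same two steps per layer: weighted sums, then the activation. Here is a minimal sketch with the example weights; `layerForward` is a hypothetical helper name. The script keeps full precision, so its output lands slightly above the hand-computed 0.78, where the intermediate values were cut to two decimals.

```javascript
const sigmoid = (x) => 1 / (1 + Math.exp(-x));

// Output of one layer: weighted sum per neuron, then the activation.
// weights[j][i] is the weight from input i to neuron j of the layer.
function layerForward(inputs, weights) {
  return weights.map((row) =>
    sigmoid(row.reduce((sum, w, i) => sum + w * inputs[i], 0))
  );
}

const inputToHidden = [[0.2, 0.6], [0.5, 0.7], [0.4, 0.9], [0.8, 0.3]];
const hiddenToOutput = [[0.6, 0.7, 0.3, 0.4]];

const hiddenOutputs = layerForward([0, 1], inputToHidden);
const [output] = layerForward(hiddenOutputs, hiddenToOutput);
console.log(hiddenOutputs.map((v) => v.toFixed(3)), output.toFixed(3));
```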

And here we go.

[Figure: network with the computed output]

Back propagation

Bro, look at the output value. What do you see? 0.78, right? If you remember the XOR table, you know that we should have got 1 for the inputs 0 and 1, but we got 0.78. The difference between the two is called the error. Let's calculate it.

Output error and delta

The formula:

error = target - output

target = 1
error = target - output_o = 1 - 0.78 = 0.22


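Now we need to calculate the delta of the error. In general, that's the value the weight adjustments are built from.

The formula:

delta_error = deriv(net) * error

The derivative is taken at the neuron's net input (the value before the activation), not at its output. For the sigmoid this is the same as output * (1 - output).

You can use this site for sigmoid derivative calculation.

delta_error = deriv(net_o) * error = deriv(1.28) * 0.22 = 0.17 * 0.22 ≈ 0.04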

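Here is the same step as a small function. It is a sketch under one assumption worth spelling out: it uses the standard sigmoid identity f'(net) = output * (1 - output), so the derivative is computed from the neuron's output value. The name `outputDelta` is made up for this example.

```javascript
// Error and delta of the output neuron.
// For a sigmoid, the derivative at the net input equals output * (1 - output).
function outputDelta(target, output) {
  const error = target - output;
  const delta = error * output * (1 - output);
  return { error, delta };
}

const { error, delta } = outputDelta(1, 0.78);
console.log(error.toFixed(2), delta.toFixed(4));
```

With target 1 and output 0.78 the delta comes out just under 0.04.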

Hidden error and delta

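Let's do the same for each neuron in the hidden layer. The formula is a little different.

error_h = delta_error * weight

Here weight is the weight of the connection from that hidden neuron to the output neuron.

We need to calculate the error for each neuron. Remember it, bro. Let's get started!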

error1_h = delta_error * hidden_to_output_1 = 0.04 * 0.6 = 0.024
error2_h = delta_error * hidden_to_output_2 = 0.04 * 0.7 = 0.028
error3_h = delta_error * hidden_to_output_3 = 0.04 * 0.3 = 0.012
error4_h = delta_error * hidden_to_output_4 = 0.04 * 0.4 = 0.016


And again the delta!

delta_error_h = deriv(net_h) * error_h

// the derivative is taken at each neuron's net input; results are cut to three decimals
delta_error1_h = deriv(net1_h) * error1_h = deriv(0.6) * 0.024 = 0.229 * 0.024 = 0.005
delta_error2_h = deriv(net2_h) * error2_h = deriv(0.7) * 0.028 = 0.222 * 0.028 = 0.006
delta_error3_h = deriv(net3_h) * error3_h = deriv(0.9) * 0.012 = 0.206 * 0.012 = 0.002
delta_error4_h = deriv(net4_h) * error4_h = deriv(0.3) * 0.016 = 0.244 * 0.016 = 0.003

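Both hidden-layer steps map nicely onto arrays. This is a minimal sketch that starts from the rounded hand-computed values, so its results agree with the numbers above only up to rounding in the last digit. It assumes the sigmoid identity f'(net) = output * (1 - output), and all variable names are chosen for illustration.

```javascript
const hiddenToOutput = [0.6, 0.7, 0.3, 0.4];    // weights to the output neuron
const hiddenOutputs = [0.64, 0.66, 0.71, 0.57]; // activations from the forward pass
const outputDelta = 0.04;                       // delta of the output neuron

// Each hidden neuron's error: the output delta scaled by its outgoing weight.
const hiddenErrors = hiddenToOutput.map((w) => outputDelta * w);

// Hidden delta: error * f'(net), with f'(net) = output * (1 - output).
const hiddenDeltas = hiddenErrors.map(
  (err, j) => err * hiddenOutputs[j] * (1 - hiddenOutputs[j])
);

console.log(hiddenErrors.map((v) => v.toFixed(3)));
console.log(hiddenDeltas.map((v) => v.toFixed(4)));
```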

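The time has come!

Now we have all the variables we need to update the weights. The formula looks like this.

new_weight = old_weight + input * delta_error * learning_rate

Here input is the value that flowed through the connection during the forward pass, and delta_error is the delta of the neuron the connection leads to.

Let's start with the weights from the hidden layer to the output.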

learning_rate = 0.001

hidden_to_output_1 = old_weight + output1_h * delta_error * learning_rate = 0.6 + 0.64 * 0.04 * 0.001 = 0.6000256
hidden_to_output_2 = old_weight + output2_h * delta_error * learning_rate = 0.7 + 0.66 * 0.04 * 0.001 = 0.7000264
hidden_to_output_3 = old_weight + output3_h * delta_error * learning_rate = 0.3 + 0.71 * 0.04 * 0.001 = 0.3000284
hidden_to_output_4 = old_weight + output4_h * delta_error * learning_rate = 0.4 + 0.57 * 0.04 * 0.001 = 0.4000228

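The four updates share one rule, so a single function covers them. A minimal sketch for the weights feeding one neuron; `updateWeights` is a hypothetical helper name.

```javascript
// new_weight = old_weight + input * delta * learning_rate,
// applied to every weight that feeds the same neuron.
function updateWeights(weights, inputs, delta, learningRate) {
  return weights.map((w, i) => w + inputs[i] * delta * learningRate);
}

const updated = updateWeights(
  [0.6, 0.7, 0.3, 0.4],     // old hidden-to-output weights
  [0.64, 0.66, 0.71, 0.57], // hidden layer outputs
  0.04,                     // delta of the output neuron
  0.001                     // learning rate
);
console.log(updated.map((w) => w.toFixed(7)));
```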

The new values are very close to the old weights. That's because we chose a very small learning rate. It's a very important hyperparameter. If you choose it too small, your network will train for years. If it's too large, the network takes bigger steps, but it can overshoot the minimum and the error may never settle. So you have to choose it carefully. Values between 2e-5 and 1e-3 are a common starting point, but the best one depends on the problem.

OK, let's do the same for the input-to-hidden synapses.

//for the first hidden neuron
input_to_hidden_1 = old_weight + input_0 * delta_error1_h * learning_rate = 0.2 + 0 * 0.005 * 0.001 = 0.2
input_to_hidden_2 = old_weight + input_1 * delta_error1_h * learning_rate = 0.6 + 1 * 0.005 * 0.001 = 0.600005

//for the second one
input_to_hidden_3 = old_weight + input_0 * delta_error2_h * learning_rate = 0.5 + 0 * 0.006 * 0.001 = 0.5
input_to_hidden_4 = old_weight + input_1 * delta_error2_h * learning_rate = 0.7 + 1 * 0.006 * 0.001 = 0.700006

//for the third one
input_to_hidden_5 = old_weight + input_0 * delta_error3_h * learning_rate = 0.4 + 0 * 0.002 * 0.001 = 0.4
input_to_hidden_6 = old_weight + input_1 * delta_error3_h * learning_rate = 0.9 + 1 * 0.002 * 0.001 = 0.900002

//for the fourth one
input_to_hidden_7 = old_weight + input_0 * delta_error4_h * learning_rate = 0.8 + 0 * 0.003 * 0.001 = 0.8
input_to_hidden_8 = old_weight + input_1 * delta_error4_h * learning_rate = 0.3 + 1 * 0.003 * 0.001 = 0.300003

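For a whole layer the update becomes a nested loop over neurons and their inputs. A minimal sketch, assuming the weights are stored one row per hidden neuron; `updateLayer` is a name invented for this example. It also shows why every weight attached to the input 0 stays unchanged: the product contains that zero.

```javascript
// weights[j][i] connects input i to neuron j; deltas[j] is the delta of neuron j.
function updateLayer(weights, inputs, deltas, learningRate) {
  return weights.map((row, j) =>
    row.map((w, i) => w + inputs[i] * deltas[j] * learningRate)
  );
}

const inputToHidden = [[0.2, 0.6], [0.5, 0.7], [0.4, 0.9], [0.8, 0.3]];
const hiddenDeltas = [0.005, 0.006, 0.002, 0.003];

const updated = updateLayer(inputToHidden, [0, 1], hiddenDeltas, 0.001);
console.log(updated);
```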

That's it! Finally.

Conclusions

Oh, we finally did all the math! But we only did it for one training example: 0 and 1. The XOR problem has 4 training examples (see the table above). That means you have to repeat the calculations above for each of them! Brrr, that's terrible. Too much math.

In machine learning, one forward propagation step (from the input layer to the output) followed by one backward step (from the output layer to the input) for a single training example is called an iteration. Another important term is epoch. The epoch counter goes up by one once all the training examples have passed through the network. In our case we have 4 training examples, so 4 iterations equal 1 epoch. Understand, bro? In general, more epochs mean higher accuracy on the training data and fewer epochs mean lower accuracy, although the gains level off after a while.
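
To make the two terms concrete, here is a rough sketch of a training loop that counts iterations and epochs. It is not the code from the previous article: the function `train`, the learning rate 0.1 and the 500 epochs are assumptions picked for the example, and the network has no bias terms, just like the walkthrough. It starts from the example weights and uses the sigmoid identity f'(net) = output * (1 - output).

```javascript
const sigmoid = (x) => 1 / (1 + Math.exp(-x));

const xorData = [
  { inputs: [0, 0], target: 0 },
  { inputs: [0, 1], target: 1 },
  { inputs: [1, 0], target: 1 },
  { inputs: [1, 1], target: 0 },
];

function train(epochs, learningRate) {
  // Starting weights from the walkthrough (2 inputs, 4 hidden neurons, 1 output).
  const wIH = [[0.2, 0.6], [0.5, 0.7], [0.4, 0.9], [0.8, 0.3]];
  const wHO = [0.6, 0.7, 0.3, 0.4];
  let iterations = 0;

  const forward = (inputs) => {
    const hidden = wIH.map((row) => sigmoid(row[0] * inputs[0] + row[1] * inputs[1]));
    const output = sigmoid(hidden.reduce((sum, h, j) => sum + h * wHO[j], 0));
    return { hidden, output };
  };

  // Mean squared error over all four training examples.
  const meanSquaredError = () =>
    xorData.reduce(
      (sum, { inputs, target }) => sum + (target - forward(inputs).output) ** 2,
      0
    ) / xorData.length;

  const initialError = meanSquaredError();

  for (let epoch = 0; epoch < epochs; epoch++) {
    // One epoch: every training example passes through the network once.
    for (const { inputs, target } of xorData) {
      // One iteration: forward pass plus backward pass for one example.
      const { hidden, output } = forward(inputs);
      const outputDelta = (target - output) * output * (1 - output);
      // Hidden deltas use the old hidden-to-output weights, so compute them first.
      const hiddenDeltas = hidden.map((h, j) => outputDelta * wHO[j] * h * (1 - h));

      hidden.forEach((h, j) => {
        wHO[j] += h * outputDelta * learningRate;
      });
      hiddenDeltas.forEach((d, j) => {
        wIH[j][0] += inputs[0] * d * learningRate;
        wIH[j][1] += inputs[1] * d * learningRate;
      });
      iterations++;
    }
  }

  return { epochs, iterations, initialError, finalError: meanSquaredError() };
}

const result = train(500, 0.1);
console.log(result);
```

With 4 training examples, 500 epochs add up to 2000 iterations. The sketch only reports the error before and after training, and how far the error drops depends on the learning rate and the number of epochs you pick.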

That's it. No magic, only math. I hope you've understood it, bro. See ya! Happy coding!

Discussion (2)

Gayan Sandamal

This is really helpful. I went through the other article and my head began to spin. However, I managed to finish reading it, then moved on to this article and read it while referring to some articles on the math.
This is really interesting.

