If we remove all the maths and all the dependencies like NumPy, what remains? Something surprisingly easy to understand ;)
- To the experts: I'm not one :p I just wanted to understand the basics of neural networks and share the result with you. Don't be surprised if I take some shortcuts and make some simplifications. But if you find some of them too extreme or wrong, feel free to elaborate in the comments. I'm still learning ;)
- To everyone: this is the first article I've ever written ^^'
Here is the script with the neural network components: framagit.org/Meier-Link/py-utils/neuron.py
The script contains the Neuron, Layer and Network classes, a few tests showing what happens when we give them values and make them learn how to find the right result, and an `if __name__ == '__main__'` block to run them.
You can simply run the script with `python3 neuron.py` to perform the tests, or tweak them to see how they behave.
The script also includes a lot of Python helpers that make the network and its components easier to manipulate.
So here I'll try to explain what those components are ;)
Starting from the lowest level, they are:
- the Neuron class
- the Layer class
- and finally, the Network class
Both in nature and in artificial networks, a neuron is a unit that receives a value, does some stuff with it, and returns another one.
In artificial networks, that neuron can then be updated according to the expected value.
The given value is an input either from the outside world (like heat, pressure, or the pixels of a picture showing a dog - or a cookie), or from an upstream neuron.
The stuff running inside the neuron 'is' the maths. There's a lot to say about it, but here we really need to keep things as simple as possible, so we just run an affine function of the form
y = ax + b, where:
- `x` is the input value
- `a` is the weight of the function (so it's usually written `w`)
- `b` is the bias of the function
- `y` is the output of the neuron
The weight and the bias are values that we set beforehand, and we expect them to be updated during the training of our network so that it produces the desired output (see below).
In real networks, the result is usually also passed through an activation function, but that's too much maths for us :p
The output of the neuron (its return value) is then forwarded either to another neuron, or to the outside world (to tell you whether you have a fever, or that the picture shows a cookie, not a dog).
When the neuron gives you a result, you can compare it with the expected result and calculate the difference. That's done using a loss function, but as we want to keep things simple, we just subtract our result from the expected one (Wikipedia will show you how complex loss functions can get :p).
And it's this loss that is used to update the neuron: we update the weight and the bias of the neuron to get closer to the expected result. The new values for weight and bias are calculated as follows:
new_value = old_value + (loss * learning_rate).
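To make the rule concrete, here's a minimal sketch of a single update step, written as a hypothetical `update_step` function (not part of the script) and assuming the simple "expected minus result" loss described above. As a side note, textbook gradient descent would also multiply the weight's step by the input value; this sketch keeps the article's simplified rule.

```python
def update_step(weight: float, bias: float, data: float,
                expected: float, learning_rate: float):
    """One learning step with the simplified rule (hypothetical helper)."""
    result = weight * data + bias      # y = ax + b
    loss = expected - result           # simple difference used as the "loss"
    new_weight = weight + loss * learning_rate
    new_bias = bias + loss * learning_rate
    return new_weight, new_bias, loss


# Result is 0.5 * 2.0 + 0.0 = 1.0, so the loss is 3.0 - 1.0 = 2.0
w, b, loss = update_step(weight=0.5, bias=0.0, data=2.0,
                         expected=3.0, learning_rate=0.1)
print(w, b, loss)
```

Both the weight and the bias move by the same amount (`loss * learning_rate`), in the direction that brings the result closer to the expected value.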
Okay: new value, old value, the loss, but... what's that "learning rate"? It's a value that controls how fast the neuron learns to get closer to the expected result. If the value is too high, the neuron learns faster but has a harder time being accurate (i.e. a bigger difference remains between the expected result and the actual one). If we use a lower value, the neuron learns slowly but surely.
Phew. Enough theory. Now let's see how to code it in Python. Raw. Python.
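A quick way to see the effect of the learning rate is to repeat the simplified update on a single input and watch the loss. The sketch below uses a hypothetical `train_single_neuron` helper and arbitrary starting values (weight 0.5, bias 0). With this rule, each step multiplies the loss by `1 - learning_rate * (data + 1)`, so a small rate shrinks it steadily, while a rate that's too high makes it flip sign and grow instead of settling.

```python
def train_single_neuron(data: float, expected: float, learning_rate: float,
                        steps: int, weight: float = 0.5, bias: float = 0.0):
    """Repeat the simplified update and record the loss before each step."""
    losses = []
    for _ in range(steps):
        result = weight * data + bias
        loss = expected - result
        losses.append(loss)
        # Same rule as above: weight and bias both move by loss * learning_rate
        weight += loss * learning_rate
        bias += loss * learning_rate
    return losses


slow = train_single_neuron(1.0, 2.0, learning_rate=0.1, steps=20)
fast = train_single_neuron(1.0, 2.0, learning_rate=1.2, steps=20)
print(abs(slow[0]), abs(slow[-1]))  # shrinks towards 0
print(abs(fast[0]), abs(fast[-1]))  # overshoots and grows
```

With `data = 1.0`, a rate of 0.1 multiplies the loss by 0.8 each step, while 1.2 multiplies it by -1.4.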
```python
class Neuron:
    """A simple neuron"""

    def __init__(self, weight: float, bias: float):
        """Create a neuron.

        We always initialize a neuron with a weight and bias.
        """
        self._w = weight
        self._b = bias
        self._result = None

    @property
    def result(self):
        return self._result

    def process(self, data: float):
        """Process given input."""
        self._result = self._w * data + self._b

    def update(self, expected: float, learning_rate: float):
        """The learning process.

        The expected value and learning rate are given on the update step.
        """
        loss = expected - self._result
        self._w = self._w + (loss * learning_rate)
        self._b = self._b + (loss * learning_rate)
```
The layer... is just a list of neurons. Nothing more than that.
It provides the same methods as the Neuron class, since it applies the same process to all the neurons.
However, note these subtleties:
- all the neurons of the layer start with the same weight and bias;
- all the neurons take the whole input of the layer...
- each neuron of the layer has its own expected output.
```python
from typing import Sequence


class Layer:
    """Really a simple list of neurons"""

    def __init__(self, weight: float, bias: float, size: int):
        """Create the layer"""
        self._neurons = [Neuron(weight, bias) for _ in range(size)]

    @property
    def neurons(self):
        """Make it possible to see the neurons inside that layer."""
        return self._neurons

    def process(self, data_set: Sequence[float]):
        """Give the neurons the data to process."""
        for n in self._neurons:
            n.process(data_set)

    def update(self, expected: Sequence[float], learning_rate: float):
        """Update each neuron of the layer."""
        assert len(expected) == len(self._neurons)
        # Pair each neuron with its own expected output
        for n, e in zip(self._neurons, expected):
            n.update(e, learning_rate)
```
Maybe you noticed that the layer gets a list of inputs and gives all of them to each neuron.
But currently, the neuron accepts a single input.
Here, we are working with a fully connected neural network. This means all the neurons of a given layer get the results of all the neurons of the previous layer. To do that, neural network implementations use the power of matrices.
In our case, we can do something somewhat easier to grasp: the neuron takes a list of float numbers as input. Usually, all those numbers are related (think of the pixels of an image: whatever way you choose to represent them, they're all represented the same way, for example `#ff00ff`). So we can use the sum of those values divided by the number of values, i.e. their average. But each value must be weighted before the bias is applied, so we obtain the following `process` method at the neuron level:
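For comparison, here's a small pure-Python sketch of what the matrix version computes, assuming a hypothetical `dense_layer` function: every neuron has its own weight for every input, and its output is the weighted sum of all inputs plus its bias. The rest of this article takes a simpler route, with a single weight per neuron.

```python
from typing import List, Sequence


def dense_layer(weights: List[List[float]], biases: List[float],
                inputs: Sequence[float]) -> List[float]:
    """Fully connected layer written as a matrix-vector product.

    weights[j][i] is the weight from input i to neuron j.
    """
    assert len(weights) == len(biases)
    outputs = []
    for row, bias in zip(weights, biases):
        assert len(row) == len(inputs)
        # One row of the matrix times the input vector, plus the bias
        outputs.append(sum(w * x for w, x in zip(row, inputs)) + bias)
    return outputs


print(dense_layer([[1.0, 2.0], [0.5, -1.0]], [0.0, 1.0], [3.0, 4.0]))  # [11.0, -1.5]
```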
```python
from typing import Sequence


class Neuron:
    # ...
    def process(self, data: Sequence[float]):
        # Weight every input, then average them
        activated = sum(self._w * d for d in data) / len(data)
        self._result = activated + self._b
    # ...
```
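As a sanity check, the same computation can be written as a standalone function (a hypothetical `neuron_process`, not part of the script). Since every input shares the same weight, the weighted average is just the weight times the mean of the inputs, and with a single input it falls back to the original `y = ax + b`.

```python
from typing import Sequence


def neuron_process(weight: float, bias: float, data: Sequence[float]) -> float:
    """Same computation as the multi-input Neuron.process above."""
    activated = sum(weight * d for d in data) / len(data)
    return activated + bias


data = [0.0, 0.5, 1.0]
mean = sum(data) / len(data)
print(neuron_process(2.0, 0.1, data))  # weighted average + bias
print(2.0 * mean + 0.1)                # weight * mean + bias: same value
print(neuron_process(3.0, 1.0, [2.0])) # single input: 3 * 2 + 1 = 7.0
```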
This way, each neuron can get the full data set from the previous layer :D
This is where it starts to get more complex.
To begin slowly, let's just say that the network is a list of layers, just as a layer is a list of neurons.
So, first, we write the Network constructor:
```python
from typing import Sequence


class Network:
    """A simple fully connected network."""

    def __init__(self, layers_conf: Sequence[dict], learning_rate: float):
        """Create the network."""
        self._lr = learning_rate
        self._layers = []
        for layer_conf in layers_conf:
            # Each configuration holds the Layer arguments: weight, bias, size
            self._layers.append(Layer(**layer_conf))
```
In this first version, we're missing something important: the input layer must give each item of the input data set to a single neuron (one value per neuron), instead of giving the whole set to every neuron.
So we specialize the input layer by inheriting from the Layer class:
```python
from typing import Sequence


class InputLayer(Layer):
    def process(self, data_set: Sequence[float]):
        """Each neuron of the first layer gets a single input."""
        assert len(data_set) == len(self._neurons)
        for n, d in zip(self._neurons, data_set):
            # Wrap the value in a list, since Neuron.process expects a sequence
            n.process([d])
```
And then, we update the Network constructor:
```python
from typing import Sequence


class Network:
    """A simple fully connected network."""

    def __init__(self, layers_conf: Sequence[dict], learning_rate: float):
        """Create the network."""
        assert len(layers_conf) > 0
        self._lr = learning_rate
        # Here we create our first layer, which is an input layer
        self._layers = [InputLayer(**layers_conf[0])]
        # Then we add any additional layer as a classical Layer
        for layer_conf in layers_conf[1:]:
            self._layers.append(Layer(**layer_conf))
```
Note that experts don't count the input layer. So when they say "here is a network of 3 layers", it means "a network with an input layer (but everyone knows there is an input layer, so we never mention it), two hidden layers, and the output layer".
Then, the network also has its `process` and `update` methods, as follows:
```python
from typing import Sequence


class Network:
    # ...
    def process(self, data_set: Sequence[float]):
        """Treat the given sequence of float numbers."""
        for l in self._layers:
            # Each layer treats the data, which means giving the data to its neurons
            l.process(data_set)
            # Then we collect the results to forward them to the next layer
            data_set = [n.result for n in l.neurons]

    def update(self, expected: Sequence[float]):
        """Now, we can update the network according to the expected sequence."""
        # The expected values belong to the output layer, so we walk backwards
        for l in reversed(self._layers):
            expected = l.update(expected, self._lr)  # wut?!
```
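To see the whole forward pass at once, here's a class-free sketch of the same data flow, assuming layer configurations are dicts with the same keys as the `Layer` constructor (`weight`, `bias`, `size`); the `forward` and `neuron_output` names are hypothetical. The first layer hands one value to each neuron, and every following layer gives the full previous output to each of its neurons.

```python
from typing import List, Sequence


def neuron_output(weight: float, bias: float, data: Sequence[float]) -> float:
    # Weighted average of the inputs plus the bias, as in Neuron.process
    return sum(weight * d for d in data) / len(data) + bias


def forward(layers_conf: List[dict], data_set: Sequence[float]) -> List[float]:
    """Push data through the layers, like Network.process (no training)."""
    for index, conf in enumerate(layers_conf):
        w, b, size = conf["weight"], conf["bias"], int(conf["size"])
        if index == 0:
            # Input layer: each neuron receives exactly one value
            assert len(data_set) == size
            outputs = [neuron_output(w, b, [d]) for d in data_set]
        else:
            # Other layers: every neuron receives the whole previous output
            outputs = [neuron_output(w, b, data_set) for _ in range(size)]
        data_set = outputs  # forward the results to the next layer
    return list(data_set)


layers_conf = [
    {"weight": 2.0, "bias": 0.0, "size": 3},  # input layer
    {"weight": 1.0, "bias": 0.5, "size": 2},  # output layer
]
print(forward(layers_conf, [0.1, 0.2, 0.3]))  # roughly [0.9, 0.9]
```

Because all neurons of a layer share the same weight and bias before training, they all produce the same output here; they only diverge once each one is updated with its own expected value.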
Oops! I forgot to mention something. We saw much earlier that neurons learn from their loss. For a single neuron, or a single layer of neurons, it's easy, since we have both its input and its expected output. But when you have a full network with many layers, we have to be able to... propagate the expected output of each neuron back to the neurons in the previous layer.
As you may have noticed in the `Network::update` method above, we expect each layer to return what its neurons expect: for each neuron, the input it would have needed to produce its expected output. That becomes the expected output of the previous layer.
So we have to update the `Layer::update` method as follows:
```python
from typing import Sequence


class Layer:
    # ...
    def update(self, expected: Sequence[float], learning_rate: float):
        """Update each neuron of the layer."""
        assert len(expected) == len(self._neurons)
        hoped = []
        for n, e in zip(self._neurons, expected):
            # Collect what each neuron would have needed as input
            hoped.append(n.update(e, learning_rate))  # I've a bad feeling about that...
        return hoped
    # ...
```
And, as you may guess, I have to update the `Neuron::update` method too:
```python
class Neuron:
    # ...
    def update(self, expected: float, learning_rate: float):
        """The learning process.

        The expected value and learning rate are given on the update step.
        This neuron returns the input it would have needed, to let the
        upstream neurons use it as their expected output. More on it later ;)
        """
        loss = expected - self._result
        self._w = self._w + (loss * learning_rate)
        self._b = self._b + (loss * learning_rate)
        # Now, we return the input that would have produced the expected output,
        # by solving expected = w * data + b for data
        return (expected - self._b) / self._w
    # ...
```
Okay, I admit there's a little maths hidden here. If you remember, the function used to process the data is
y = ax + b, or, with the full names:
result = (weight * data) + bias.
Now imagine we know the result, but want to find the data (that is, which data gives us that result... It's like looking for the question whose answer is "42", but don't worry, we won't make the universe explode :p).
So we have to isolate the data, which gives us
data = (result - bias) / weight.
In machine learning, sending expected values backwards through the layers like this is (very loosely) the idea behind what is called backpropagation; the real algorithm propagates the gradient of the loss rather than inverted inputs, but we keep it simple here.
And we're done with the maths! \o/
So, here is a little picture to summarize all of this:
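Here's a tiny round-trip check of that formula, using hypothetical `neuron_output` and `wanted_input` helpers for a single-input neuron. It also shows one limitation: the division fails when the weight is 0, which the `Neuron::update` method above would run into as well.

```python
def neuron_output(weight: float, bias: float, data: float) -> float:
    # result = (weight * data) + bias
    return weight * data + bias


def wanted_input(weight: float, bias: float, result: float) -> float:
    # Solve result = weight * data + bias for data
    if weight == 0:
        raise ValueError("a zero weight cannot be inverted")
    return (result - bias) / weight


data = wanted_input(weight=2.0, bias=1.0, result=7.0)
print(data)                           # 3.0
print(neuron_output(2.0, 1.0, data))  # 7.0, back to the result we asked for
```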
The script linked at the beginning of this article provides much more:
- At the top of the script, there are normalization functions (including `denormalize`), which are used in the tests (see below), because I observed that the network is more efficient with values between 0 and 1.
- I tried to make the Neuron class easier to override with real-world ones. That's why there are a bunch of `_`-prefixed methods ("protected" in the Python world).
- I implemented `__str__` methods to show "beautiful" outputs in the terminal (or at least "readable" ones x))
- There is a "big" `LayerConfiguration` class I added to make it easier to configure a network. Basically, it receives the same inputs as the Layer constructor, and overrides the `*` operator to make it possible to write `layers_conf = input_layer_conf + (hidden_layer_conf * 2) + output_layer_conf`.
- Finally, there are training functions (such as `train_network`), which respectively train a neuron, a layer, and a network: they run the `update` methods to improve the neurons' weights and biases so that they produce the expected result.
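The `LayerConfiguration` class itself lives in the script; as a rough idea of how such operator overloading can work, here's a sketch with a hypothetical `LayerConf` class. It assumes each configuration turns into a dict with the `Layer` constructor's arguments, and it defines `__mul__` plus `__add__`/`__radd__` so that the expression from the list above produces a flat list of configurations.

```python
from typing import List, Union


class LayerConf:
    """Hypothetical stand-in for the script's LayerConfiguration class."""

    def __init__(self, weight: float, bias: float, size: int):
        self.weight, self.bias, self.size = weight, bias, size

    def as_dict(self) -> dict:
        # Same keys as the Layer constructor, so Layer(**conf) would accept it
        return {"weight": self.weight, "bias": self.bias, "size": self.size}

    def __mul__(self, count: int) -> List[dict]:
        # hidden_conf * 2 -> two identical layer configurations
        return [self.as_dict() for _ in range(count)]

    __rmul__ = __mul__  # also allow 2 * hidden_conf

    def __add__(self, other: Union["LayerConf", List[dict]]) -> List[dict]:
        # conf + conf, or conf + list -> flat list of configurations
        tail = [other.as_dict()] if isinstance(other, LayerConf) else list(other)
        return [self.as_dict()] + tail

    def __radd__(self, other: List[dict]) -> List[dict]:
        # list + conf: Python falls back to this when the list is on the left
        return list(other) + [self.as_dict()]


input_conf = LayerConf(weight=0.5, bias=0.0, size=3)
hidden_conf = LayerConf(weight=0.5, bias=0.1, size=4)
output_conf = LayerConf(weight=0.5, bias=0.0, size=1)
layers_conf = input_conf + (hidden_conf * 2) + output_conf
print([conf["size"] for conf in layers_conf])  # [3, 4, 4, 1]
```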
I ran a few tests with this really basic network, and I was seriously surprised by the accuracy of the results I got.
I encourage you to try it yourself: tweak the values, modify the algorithms, etc. Practicing by yourself is really the easiest way to understand how it works ;)
I hope this article (longer than I expected) was useful to you, and I'm open to any comments :D
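Since the tests scale values into the 0 to 1 range, here's a minimal min-max scaling sketch. The `min_max_normalize` and `min_max_denormalize` names are hypothetical and may not match the script's own functions; the example maps body temperatures (an arbitrary 36 to 41 range) into [0, 1] and back.

```python
from typing import List, Sequence


def min_max_normalize(values: Sequence[float], low: float, high: float) -> List[float]:
    """Map values from [low, high] into [0, 1]."""
    if high == low:
        raise ValueError("high and low must differ")
    span = high - low
    return [(v - low) / span for v in values]


def min_max_denormalize(values: Sequence[float], low: float, high: float) -> List[float]:
    """Map values from [0, 1] back into [low, high]."""
    span = high - low
    return [v * span + low for v in values]


temps = [36.0, 38.5, 41.0]
scaled = min_max_normalize(temps, low=36.0, high=41.0)
print(scaled)                                   # [0.0, 0.5, 1.0]
print(min_max_denormalize(scaled, 36.0, 41.0))  # [36.0, 38.5, 41.0]
```

The network then only ever sees values in [0, 1], and its outputs are mapped back to the original scale with the inverse function.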
Cover image and affine function image come from Wikipedia
Other pictures are my own, made with Google Drawings