Rahul Ranjan

Artificial Neural Network Structure Interpretability Notes

Neural networks have been all the rage in machine learning these days, but there is now a growing sense that neural network outputs need to be more interpretable to humans: when the results a model gives can be interpreted, trust in it increases.

So, I am sharing the notes I made in order to understand some basic terms used in ANNs; understanding them is critical for building an intuitive grasp of any variant of a general neural network architecture.

1. What are neurons?

A neuron is the building block/node of a neural network. It is basically the transformed data after applying a filter. Each neuron has a set of data inputs, weights, and a bias value. So, if a (3, 3, 3) filter is applied on the input array, that neuron (filter) has 27 trainable weights. Each neuron (filter) also only looks over a particular region of the input, called its receptive field. A neuron can, hence, have multiple inputs.
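
As a minimal sketch (the values, shapes, and NumPy usage are illustrative additions, not part of the original notes), this is how a single (3, 3, 3) filter/neuron combines its receptive field with its 27 weights and one bias:

```python
import numpy as np

# Sketch of one neuron viewed as a conv filter: a (3, 3, 3) filter
# has 3*3*3 = 27 trainable weights plus one trainable bias.
rng = np.random.default_rng(0)

receptive_field = rng.standard_normal((3, 3, 3))  # local region of the input
weights = rng.standard_normal((3, 3, 3))          # 27 trainable weights
bias = 0.1                                        # one trainable bias

# Raw neuron output: weighted sum over the receptive field, plus the bias.
z = np.sum(weights * receptive_field) + bias
print(z)
```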


2. What are weights and biases?

Weights are the parameters of a neural network that determine the degree of importance by which the input data is transformed within the hidden layers. A bias is also a learnable parameter of the network; it essentially represents how far off the predictions are from their intended value. Each interconnection between two neurons of different layers has a weight and a bias associated with it. When model training starts, the weights and biases for the network's first hidden layer are taken at random.

Weights are then applied forward, through a feedforward pass, until the output layer is reached, and are then tweaked/improved backwards, usually through the backpropagation technique (other approaches also exist, such as swarm intelligence), back to the input layer, so as to produce the desired output.

Weights for a single neuron have the shape [width, height, input_channels].

Since weights are usually learnt during training through backpropagation (other algorithms, such as swarm intelligence, also exist), this basically means the impact of the applied filters is learnt during backpropagation.
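
To make the "tweaked/improved backwards" idea concrete, here is a hedged sketch of gradient-descent weight updates for a single linear neuron with a squared-error loss; every name and value is made up for illustration:

```python
import numpy as np

# Sketch: improving weights "backwards" with gradient descent.
# One linear neuron, squared-error loss; all values are illustrative.
rng = np.random.default_rng(1)

x = rng.standard_normal(5)    # inputs to the neuron
w = rng.standard_normal(5)    # randomly initialized weights
b = 0.0                       # randomly initialized bias
y_true = 2.0                  # desired output
lr = 0.05                     # learning rate

for step in range(100):
    y_pred = w @ x + b        # feedforward: weighted sum plus bias
    error = y_pred - y_true
    w -= lr * error * x       # gradient of 0.5*error**2 w.r.t. w
    b -= lr * error           # gradient of 0.5*error**2 w.r.t. b

print(round(float(w @ x + b), 4))   # close to y_true after training
```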


3. What is an activation function?

The activation function is the function that determines how the weighted sum of inputs will be transformed into the output.

It also explores the functional sub-space of the model's training data inputs in order to learn the weight representation. AFs define the output of a neuron and introduce non-linearities into the network. A neural network without AFs would be like a linear regression model, and so could not perform any complex modelling.

These functions output combinations of neurons, called activations, that fire up in response to a particular input.
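
As a quick illustration (a plain-NumPy sketch, not tied to any particular framework), here are three common fixed activation functions discussed in the next section:

```python
import numpy as np

# Three classic fixed activation functions.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))   # squashes to (0, 1)

def tanh(z):
    return np.tanh(z)                 # squashes to (-1, 1)

def relu(z):
    return np.maximum(0.0, z)         # zeroes out negative inputs

z = np.array([-2.0, 0.0, 2.0])        # pre-activations (weighted sums)
print(sigmoid(z), tanh(z), relu(z))
```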


4. What are parametric and non-parametric activation functions?

Parametric ML algorithms assume some fixed function shape (with parameters) on the basis of which the model is trained. Although the number of parameters in a parametric AF is fixed, those parameters can be trainable/adaptive. So all traditional AFs, such as sigmoid, tanh, and ReLU, are fixed (parametric) activation functions.
Adaptive AFs that have a fixed number of trainable parameters also count as fixed/parametric AFs, e.g. Swish, AReLU, Mish, PReLU, etc.
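
As an example of a parametric adaptive AF, here is a minimal sketch of PReLU, whose single slope parameter `alpha` is learned alongside the weights (the initial value below is illustrative):

```python
import numpy as np

# PReLU: identity for positive inputs, a trainable slope `alpha`
# for negative inputs. The number of parameters (one) is fixed,
# but its value is learned during training.
def prelu(z, alpha):
    return np.where(z > 0, z, alpha * z)

alpha = 0.25                          # trainable parameter (illustrative init)
z = np.array([-3.0, -1.0, 0.5, 2.0])
print(prelu(z, alpha))                # [-0.75 -0.25  0.5   2.  ]
```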

Non-parametric activation functions are the ones in which neither the number of parameters nor their values is fixed. That is, a non-parametric AF has to be adaptive, but every adaptive activation function need not be a non-parametric AF.

Three approaches for implementing NP-AFs are:

  • Adaptive piecewise linear method (APL)
  • Spline AFs
  • Maxout network based AFs (a minimal sketch follows this list)
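
To illustrate the maxout idea from the list above: the unit outputs the maximum over k learned linear pieces, so the shape of the activation itself is learned rather than fixed (all shapes and values here are illustrative):

```python
import numpy as np

# Sketch of a maxout unit: k linear pieces over a d-dimensional input,
# output is the max over the pieces.
rng = np.random.default_rng(2)

k, d = 3, 4                        # k linear pieces, d input dimensions
W = rng.standard_normal((k, d))    # one weight vector per piece
b = rng.standard_normal(k)         # one bias per piece
x = rng.standard_normal(d)

out = np.max(W @ x + b)            # maxout activation
print(out)
```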

5. What are optimizers?

An optimizer is a procedure that works on the loss function of the neural network model to determine how the weights will be updated. The backpropagation algorithm is usually the core technique behind most optimizers.
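
A hedged sketch of the simplest optimizer, plain SGD: given the gradients that backpropagation computes, each parameter takes a small step against its gradient (all names and values are illustrative):

```python
import numpy as np

# One SGD update step: p <- p - lr * grad for every parameter.
def sgd_step(params, grads, lr=0.01):
    return [p - lr * g for p, g in zip(params, grads)]

params = [np.array([0.5, -0.3]), np.array([0.1])]   # e.g. weights and bias
grads = [np.array([0.2, -0.1]), np.array([0.05])]   # from backpropagation
params = sgd_step(params, grads)
print(params)
```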


6. What are layers in a neural network?

Layers are the components of a neural network through which the network learns the training dataset, by progressively blending the data, transforming and wrapping it up into a form (called a representation) that is easy for computers to understand, in order to solve the task. Each representation can be thought of as a type. Technically, each layer is just a function, wrapped in an activation function, applied to that layer's inputs, which are the outputs of the previous layer. Each layer has to discover a new representation of the input. This newly learnt or outputted representation can be seen as a three-dimensional cube, where each cell in the cube is an activation (the output from the AF, or the amount a neuron fires).

So a neural network, as a whole, is just a chain of composed functions. This composed network is then optimized to perform a task.
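
A minimal sketch of this "chain of composed functions" view, with each layer as a ReLU-wrapped affine map (all shapes are illustrative):

```python
import numpy as np

# Each layer: activation(W @ x + b). The network is their composition.
rng = np.random.default_rng(3)

def layer(W, b):
    return lambda x: np.maximum(0.0, W @ x + b)   # ReLU-activated layer

f1 = layer(rng.standard_normal((8, 4)), np.zeros(8))
f2 = layer(rng.standard_normal((8, 8)), np.zeros(8))
f3 = layer(rng.standard_normal((2, 8)), np.zeros(2))

x = rng.standard_normal(4)
output = f3(f2(f1(x)))      # the whole network: f3 composed with f2 with f1
print(output.shape)         # (2,)
```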

Note: Making sense of the activations returned from a hidden layer is hard, because we usually work with them as abstract numerical vectors. However, with feature visualization techniques, we can transform this abstract vector into a more meaningful semantic dictionary.


7. What are feature maps?

A feature map is a collection of multiple filters (neurons) that look at different regions of the input image with the same weights, so as to extract the same feature (but from different regions of the input).

So, technically, a feature map is a number of the same filter applied over the input, e.g. feature map dimensions 64 × (3, 3, 3), meaning a filter of size (3, 3, 3) applied 64 times.
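
A quick worked example of the parameter count implied by those dimensions (pure arithmetic, no framework assumed):

```python
# Parameter count for 64 filters of size (3, 3, 3).
filters = 64
weights_per_filter = 3 * 3 * 3        # 27 trainable weights per filter

total_weights = filters * weights_per_filter   # 64 * 27 = 1728
total_biases = filters                         # one bias per filter
print(total_weights + total_biases)            # 1792
```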


8. What are pooling layers in a CNN, and what are their uses?

A pooling layer aggregates the features learnt after applying feature maps over the input array. It looks at larger regions of the image and captures an aggregate statistic (max, average, etc.) of each region. This helps in identifying whether the learnt features are the ones we were interested in, and also makes the network invariant to local transformations.

Most commonly used pooling layer types:

a. Max pooling
b. Average pooling

Pooling operates on each feature map independently. It reduces the size (width and height) of each feature map, but the number of feature maps remains constant.

By doing so, it makes the representation more compact by reducing the spatial size; the potential disadvantage is that it can cause information loss at the other end.
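
A minimal sketch of 2×2 max pooling on a single feature map, showing the spatial size halving (the number of feature maps, if there were several, would stay unchanged; shapes are illustrative):

```python
import numpy as np

# 2x2 max pooling: keep the maximum of each non-overlapping 2x2 region.
def max_pool_2x2(fmap):
    h, w = fmap.shape
    return fmap.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fmap = np.arange(16.0).reshape(4, 4)   # a toy 4x4 feature map
print(max_pool_2x2(fmap))              # a 2x2 map of each region's maximum
```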

Capsule networks are an example of deep CNN architectures designed to work without pooling layers.


9. Which layers contain trainable parameters in a neural network model, and which do not?

Convolution layers and fully connected layers contain trainable parameters; pooling layers do not.
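
A quick worked example of the counts (all shapes are illustrative; the convolution count matches the feature-map example above):

```python
# Illustrative trainable-parameter counts per layer type.
conv_params = 64 * (3 * 3 * 3) + 64   # 64 filters of (3,3,3) + biases = 1792
fc_params = 128 * 256 + 256           # 128 -> 256 dense layer + biases = 33024
pool_params = 0                       # pooling layers have nothing to learn
print(conv_params, fc_params, pool_params)
```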


10. What is neural network training like and what are its objectives?

Training a neural network is actually a non-convex optimization problem in which the objective is to find the optimum weight parameters (those which minimize the total loss or error); these are searched for and found through the backpropagation algorithm.

► Overall steps, in a broader sense

a. First, the data inputs are transformed, through activation functions, into representations that a computer can understand; these activations hold the initialized weights (via the feedforward pass).

b. Next, a loss function is defined over the linear weighted combinations of those activations (the outputs of the AFs wrapped in a hidden layer) applied to an input, in order to find the optimum weights needed to minimize the loss in solving the problem.

c. Lastly, the optimum weights are searched for iteratively during the training process, by adopting suitable optimization strategies (usually based on backpropagation) that minimize the loss function over the weight parameters in the functional sub-space.
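
Putting steps a–c together, here is a hedged end-to-end sketch of training a tiny one-hidden-layer network in plain NumPy; the data, shapes, learning rate, and loss are all illustrative choices, not the only ones possible:

```python
import numpy as np

# End-to-end sketch: feedforward, loss, backpropagation, weight update.
rng = np.random.default_rng(4)

X = rng.standard_normal((32, 3))              # toy inputs
y = (X.sum(axis=1, keepdims=True) > 0) * 1.0  # toy binary targets

W1, b1 = rng.standard_normal((3, 8)) * 0.1, np.zeros(8)
W2, b2 = rng.standard_normal((8, 1)) * 0.1, np.zeros(1)
lr = 0.5

for epoch in range(200):
    # (a) feedforward: inputs -> hidden activations -> prediction
    h = np.maximum(0.0, X @ W1 + b1)          # hidden ReLU activations
    p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))  # sigmoid output

    # (b) loss: mean squared error between prediction and target
    loss = np.mean((p - y) ** 2)

    # (c) backpropagation: gradients of the loss w.r.t. each parameter
    dp = 2 * (p - y) / len(X) * p * (1 - p)   # through MSE and sigmoid
    dW2, db2 = h.T @ dp, dp.sum(axis=0)
    dh = dp @ W2.T * (h > 0)                  # through the ReLU
    dW1, db1 = X.T @ dh, dh.sum(axis=0)

    # weight update (plain gradient descent)
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(round(float(loss), 4))                  # loss decreases over training
```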


References

https://deepai.org/definitions

http://colah.github.io

https://ieeexplore.ieee.org/document/9675813

https://distill.pub/2018/building-blocks/
