Akshay Ballal

# Deep Neural Network from Scratch in Rust 🦀 - Part 3 - Forward Propagation

In the previous part of our blog series, we discussed how to initialize a neural network (NN) model with specified layers and hidden units. Now, in this part, we will explore the forward propagation algorithm, a fundamental step in the NN's prediction process.

Before we delve into the coding aspect, let's understand the mathematical
concepts underlying forward propagation. We will use the following notations:

• Z[l]: Logit Matrix for layer `l`. It represents the
linear transformation of the inputs for a particular layer.

• A[l]: Activation matrix for layer `l`. It represents
the output or activation values of the neurons in a specific layer.

• W[l]: Weights matrix for layer `l`. It contains the
weights connecting the neurons of layer `l-1` to the neurons of
layer `l`.

• b[l]: Bias matrix for layer `l`. It contains the bias
values added to the linear transformation of the inputs for layer
`l`.

Additionally, we have the input matrix denoted as `X`, which is equal to the activation matrix A[0] of the input layer.

To perform forward propagation, we need to follow these two steps for each layer:

1. Calculate the logit matrix for each layer using the following expression:

Z[l] = W[l]A[l-1] + b[l]

In simpler terms, the logit matrix for layer `l` is obtained by taking the matrix product of the weight matrix W[l] and the activation matrix A[l-1] from the previous layer, and then adding the bias matrix b[l], which is broadcast across all `m` columns. This step represents the linear transformation of the inputs for the current layer.

2. Calculate the activation matrix from the logit matrix using an activation function:

A[l] = ActivationFunction(Z[l])

Here, the activation function can be any non-linear function applied element-wise to the elements of the logit matrix. Popular activation functions include sigmoid, tanh, and relu. In our model, we will use the relu activation function for all intermediate layers and sigmoid for the last layer (classifier layer). This step introduces non-linearity into the network, allowing it to learn and model complex relationships in the data.
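The two steps above can be sketched without any crate, just to make the arithmetic visible. This is a minimal illustration and not the implementation used later in this post: the `Matrix` alias and the `linear_step` and `relu_step` names are assumptions made for this example, and matrices are plain nested `Vec`s instead of `ndarray` arrays.

```rust
// Dependency-free sketch: a matrix is a Vec of rows.
// `Matrix`, `linear_step`, and `relu_step` are hypothetical names for this example.
type Matrix = Vec<Vec<f32>>;

/// Step 1: Z = W * A_prev + b, where b has shape [n x 1]
/// and is added to every column of the product.
fn linear_step(w: &Matrix, a_prev: &Matrix, b: &Matrix) -> Matrix {
    let n = w.len(); // units in the current layer
    let m = a_prev[0].len(); // number of examples
    let mut z = vec![vec![0.0; m]; n];
    for i in 0..n {
        for j in 0..m {
            let mut sum = 0.0;
            for k in 0..a_prev.len() {
                sum += w[i][k] * a_prev[k][j];
            }
            z[i][j] = sum + b[i][0];
        }
    }
    z
}

/// Step 2: A = relu(Z), applied element-wise.
fn relu_step(z: &Matrix) -> Matrix {
    z.iter()
        .map(|row| row.iter().map(|&v| if v > 0.0 { v } else { 0.0 }).collect())
        .collect()
}
```

For `W = [[1, -1], [0.5, 0.5]]`, `A_prev = [[2, 0], [1, 3]]` and `b = [[0], [1]]`, `linear_step` gives `Z = [[1, -3], [2.5, 2.5]]`, and `relu_step` then zeroes out the single negative entry.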

With n[l] hidden units in layer `l` and `m` examples, the matrices have these shapes:

```

Z[l] ⇾ [n[l] x m]

W[l] ⇾ [n[l] x n[l-1]]

b[l] ⇾ [n[l] x 1]

A[l] ⇾ [n[l] x m]

```
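To see how these shapes follow from the layer sizes, here is a small sketch that lists them for every layer. The `LayerShapes` struct and the `layer_shapes` function are hypothetical helpers for this illustration only. It assumes `layers[0]` is the input size and `layers[l]` is n[l].

```rust
/// Shapes of W, b, and Z for one layer, each as (rows, cols).
/// Hypothetical helper type for this example.
#[derive(Debug, PartialEq)]
struct LayerShapes {
    w: (usize, usize),
    b: (usize, usize),
    z: (usize, usize), // A[l] has the same shape as Z[l]
}

/// `layers[0]` is the input size, `layers[l]` is n[l],
/// and `m` is the number of examples.
fn layer_shapes(layers: &[usize], m: usize) -> Vec<LayerShapes> {
    (1..layers.len())
        .map(|l| LayerShapes {
            w: (layers[l], layers[l - 1]),
            b: (layers[l], 1),
            z: (layers[l], m),
        })
        .collect()
}
```

For `layers = [3, 4, 1]` and `m = 5`, the first layer gets a `W` of shape (4, 3), a `b` of shape (4, 1) and a `Z` of shape (4, 5).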

During the forward propagation process, we will store each layer's input activation matrix, weight matrix, bias matrix, and logit matrix as cache. This stored information will prove useful in the subsequent step of backward propagation, where we update the model's parameters based on the computed gradients.

By performing forward propagation, our neural network takes the input data through all the layers, applying linear transformations and activation functions, and eventually produces a prediction or output at the final layer.

### Dependencies

Add this line under the `[dependencies]` section of the `Cargo.toml` file.

``````
num-integer = "0.1.45"
``````

### Cache Structs

First, in the `lib.rs` file, we will define two structs: `LinearCache` and `ActivationCache`.

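``````
//lib.rs

// Assumes `Array2` is already in scope from the `ndarray` import of the previous part.
use num_integer::Roots;

/// Inputs to the linear step of one layer, kept for backward propagation.
#[derive(Clone, Debug)]
pub struct LinearCache {
    pub a: Array2<f32>, // activations coming from the previous layer
    pub w: Array2<f32>, // weights of this layer
    pub b: Array2<f32>, // biases of this layer
}

/// Logit matrix of one layer, kept for backward propagation.
#[derive(Clone, Debug)]
pub struct ActivationCache {
    pub z: Array2<f32>,
}
``````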

The `LinearCache` struct stores the intermediate values needed for each layer. It includes the activation matrix `a`, weight matrix `w`, and bias matrix `b`. These matrices are used to calculate the logit matrix `z` in the forward propagation process.

The `ActivationCache` struct stores the logit matrix `z` for each layer. This cache is essential for later stages, such as backpropagation, where the stored values are required.

### Define Activation Functions

Next, let us define the non-linear activation functions that we will be using: `relu` and `sigmoid`.

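``````
//lib.rs

/// Sigmoid of a single value: 1 / (1 + e^(-z)).
pub fn sigmoid(z: &f32) -> f32 {
    // `exp` computes e^x, so no separate constant for `e` is needed.
    1.0 / (1.0 + (-z).exp())
}

/// ReLU of a single value: z if z > 0, otherwise 0.
pub fn relu(z: &f32) -> f32 {
    if *z > 0.0 {
        *z
    } else {
        0.0
    }
}

/// Applies sigmoid element-wise and keeps the logits for backward propagation.
pub fn sigmoid_activation(z: Array2<f32>) -> (Array2<f32>, ActivationCache) {
    (z.mapv(|x| sigmoid(&x)), ActivationCache { z })
}

/// Applies ReLU element-wise and keeps the logits for backward propagation.
pub fn relu_activation(z: Array2<f32>) -> (Array2<f32>, ActivationCache) {
    (z.mapv(|x| relu(&x)), ActivationCache { z })
}
``````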

Activation functions introduce non-linearity to neural networks and play a crucial role in the forward propagation process. The code provides implementations for two commonly used activation functions: `sigmoid` and `relu`.

The `sigmoid` function takes a single value `z` as input and returns the sigmoid activation, which is calculated using the sigmoid formula: `1 / (1 + e^-z)`. The sigmoid function maps the input value to a range between 0 and 1, enabling the network to model non-linear relationships.

The `relu` function takes a single value `z` as input and applies the Rectified Linear Unit (ReLU) activation. If `z` is greater than zero, the function returns `z`; otherwise, it returns zero. ReLU is a popular activation function that introduces non-linearity and helps the network learn complex patterns.

Both `sigmoid` and `relu` operate on a single value and serve as building blocks for the matrix-based activation functions.

The code also provides two matrix-based activation functions: `sigmoid_activation` and `relu_activation`. These functions take a 2D matrix `z` as input and apply the respective activation function element-wise using the `mapv` function. The resulting activation matrix is returned along with an `ActivationCache` struct that stores the corresponding logit matrix.
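As a quick sanity check of the behaviour described above, here is a dependency-free sketch. The scalar functions mirror the ones in this section, while `map_elementwise` is a hypothetical stand-in for `mapv` that works on nested `Vec`s instead of `ndarray` arrays.

```rust
fn sigmoid(z: &f32) -> f32 {
    1.0 / (1.0 + (-z).exp())
}

fn relu(z: &f32) -> f32 {
    if *z > 0.0 {
        *z
    } else {
        0.0
    }
}

/// Applies `f` to every element of a matrix stored as rows.
/// Hypothetical helper, similar in spirit to `mapv`.
fn map_elementwise(z: &[Vec<f32>], f: fn(&f32) -> f32) -> Vec<Vec<f32>> {
    z.iter()
        .map(|row| row.iter().map(|v| f(v)).collect())
        .collect()
}
```

Here `sigmoid(&0.0)` is exactly `0.5`, large positive inputs approach 1, large negative inputs approach 0, and `relu` leaves positive values untouched while mapping everything else to zero.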

### Linear Forward

``````
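//lib.rs

pub fn linear_forward(
    a: &Array2<f32>,
    w: &Array2<f32>,
    b: &Array2<f32>,
) -> (Array2<f32>, LinearCache) {
    // Z = W * A + b; `b` is broadcast across the columns of the product.
    let z = w.dot(a) + b;

    // Keep copies of the inputs for backward propagation.
    let cache = LinearCache {
        a: a.clone(),
        w: w.clone(),
        b: b.clone(),
    };
    (z, cache)
}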
``````

The `linear_forward` function takes the activation matrix `a`, weight matrix `w`, and bias matrix `b` as inputs. It performs the linear transformation by calculating the matrix product of `w` and `a` (via `dot`), and then adding `b` to the result. The resulting matrix `z` represents the logits of the layer. The function returns `z` along with a `LinearCache` struct that stores the input matrices for later use in backward propagation.

### Linear Forward Activation

``````
//lib.rs

pub fn linear_forward_activation(
    a: &Array2<f32>,
    w: &Array2<f32>,
    b: &Array2<f32>,
    activation: &str,
) -> Result<(Array2<f32>, (LinearCache, ActivationCache)), String> {
    match activation {
        "sigmoid" => {
            let (z, linear_cache) = linear_forward(a, w, b);
            let (a_next, activation_cache) = sigmoid_activation(z);
            Ok((a_next, (linear_cache, activation_cache)))
        }
        "relu" => {
            let (z, linear_cache) = linear_forward(a, w, b);
            let (a_next, activation_cache) = relu_activation(z);
            Ok((a_next, (linear_cache, activation_cache)))
        }
        _ => Err("wrong activation string".to_string()),
    }
}
``````

The `linear_forward_activation` function builds upon the `linear_forward` function. It takes the same input matrices as `linear_forward`, along with an additional `activation` parameter indicating the activation function to be applied. The function first calls `linear_forward` to obtain the logits `z` and the linear cache. Then, depending on the specified activation function, it calls either `sigmoid_activation` or `relu_activation` to compute the activation matrix `a_next` and the activation cache. The function returns `a_next` along with a tuple of the linear cache and activation cache, wrapped in a `Result` enum. If the specified activation function is not supported, an error message is returned.
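The string-based dispatch and its error path can be tried in isolation with a small dependency-free sketch. `apply_activation` is a hypothetical function written for this illustration: it works on a flat slice of logits rather than an `Array2`, and its error message is made up for the example.

```rust
/// Applies the named activation element-wise.
/// Unknown names yield an `Err` instead of panicking.
/// Hypothetical helper for this example, not part of the post's `lib.rs`.
fn apply_activation(z: &[f32], activation: &str) -> Result<Vec<f32>, String> {
    match activation {
        "sigmoid" => Ok(z.iter().map(|&v| 1.0 / (1.0 + (-v).exp())).collect()),
        "relu" => Ok(z.iter().map(|&v| if v > 0.0 { v } else { 0.0 }).collect()),
        other => Err(format!("unsupported activation: {}", other)),
    }
}
```

A caller can `match` on the result, or call `unwrap()` when the activation name is hard-coded and known to be valid. An unsupported name such as `"tanh"` comes back as an `Err` here.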

### Forward Propagation

``````
impl DeepNeuralNetwork {
    /// Initializes the parameters of the neural network.
    ///
    /// ### Returns
    /// a Hashmap dictionary of randomly initialized weights and biases.
    pub fn initialize_parameters(&self) -> HashMap<String, Array2<f32>> {
        // same as last part
    }

    pub fn forward(
        &self,
        x: &Array2<f32>,
        parameters: &HashMap<String, Array2<f32>>,
    ) -> (Array2<f32>, HashMap<String, (LinearCache, ActivationCache)>) {
        let number_of_layers = self.layers.len() - 1;

        let mut a = x.clone();
        let mut caches = HashMap::new();

        // Compute activations of the intermediate layers with relu
        for l in 1..number_of_layers {
            let w_string = format!("W{}", l);
            let b_string = format!("b{}", l);

            let w = &parameters[&w_string];
            let b = &parameters[&b_string];

            let (a_temp, cache_temp) = linear_forward_activation(&a, w, b, "relu").unwrap();

            a = a_temp;

            caches.insert(l.to_string(), cache_temp);
        }

        // Compute activation of last layer with sigmoid
        let weight_string = format!("W{}", number_of_layers);
        let bias_string = format!("b{}", number_of_layers);

        let w = &parameters[&weight_string];
        let b = &parameters[&bias_string];

        let (al, cache) = linear_forward_activation(&a, w, b, "sigmoid").unwrap();
        caches.insert(number_of_layers.to_string(), cache);

        (al, caches)
    }
}
``````

The `forward` method in the `DeepNeuralNetwork` implementation performs the forward propagation process for the entire neural network. It takes the input matrix `x` and the parameters (weights and biases) as inputs. The method initializes the `a` matrix as a copy of `x` and creates an empty hashmap `caches` to store the caches for each layer.

Next, it iterates over every layer except the last one in a for loop. For each layer, it retrieves the corresponding weights `w` and biases `b` from the parameters, building keys such as `W1` and `b1` from the layer index. It then calls `linear_forward_activation` with `a`, `w`, `b`, and the activation function set to "relu". The returned cache `cache_temp` is stored in the `caches` hashmap using the layer index as the key, and `a` is updated to the resulting activation matrix `a_temp` for the next iteration.

After processing all intermediate layers, the activation of the last layer is computed using the sigmoid activation function. The method retrieves the weights `w` and biases `b` for the last layer from the parameters and calls `linear_forward_activation` with `a`, `w`, `b`, and the activation function set to "sigmoid". The returned cache `cache` is stored in the `caches` hashmap using the last layer index as the key, and the resulting activation matrix is bound to `al`.

Finally, the method returns the final activation matrix `al` and the `caches` hashmap containing all the caches for each layer. Here `al` is the activation of the final layer and will be used to make the predictions during the inference part of our process.
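The overall flow of `forward` can also be traced with a dependency-free sketch. Everything here is an assumption made for illustration: `Matrix`, `layer_forward`, and `forward_sketch` are hypothetical names, matrices are nested `Vec`s rather than `ndarray` arrays, and the caches are left out to keep the example short.

```rust
use std::collections::HashMap;

// Hypothetical, dependency-free stand-ins for this example only.
type Matrix = Vec<Vec<f32>>;

fn relu(v: f32) -> f32 {
    if v > 0.0 {
        v
    } else {
        0.0
    }
}

fn sigmoid(v: f32) -> f32 {
    1.0 / (1.0 + (-v).exp())
}

/// Computes f(W * A + b) for one layer, with `f` applied element-wise.
fn layer_forward(w: &Matrix, a: &Matrix, b: &Matrix, f: fn(f32) -> f32) -> Matrix {
    let m = a[0].len(); // number of examples (columns)
    w.iter()
        .zip(b.iter())
        .map(|(w_row, b_row)| {
            (0..m)
                .map(|j| {
                    let z: f32 = w_row
                        .iter()
                        .zip(a.iter())
                        .map(|(wk, a_row)| wk * a_row[j])
                        .sum();
                    f(z + b_row[0])
                })
                .collect()
        })
        .collect()
}

/// Runs layers 1..=num_layers: relu for hidden layers, sigmoid for the last one.
/// Parameters are looked up under keys such as "W1" and "b1".
fn forward_sketch(x: &Matrix, parameters: &HashMap<String, Matrix>, num_layers: usize) -> Matrix {
    let mut a = x.clone();
    for l in 1..=num_layers {
        let w = &parameters[&format!("W{}", l)];
        let b = &parameters[&format!("b{}", l)];
        let f: fn(f32) -> f32 = if l < num_layers { relu } else { sigmoid };
        a = layer_forward(w, &a, b, f);
    }
    a
}
```

With a two-layer network where `W1 = [[1, 0], [0, -1]]`, `W2 = [[1, 1]]`, `b1` is zero and `b2 = [[-1]]`, the input `[[1], [2]]` passes through relu as `[[1], [0]]`, the final logit is `0`, and the sigmoid output is `0.5`.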

### That is all for Forward Propagation

In conclusion, we've covered an important aspect of building a deep neural network in this blog post: forward propagation. We learned how the input data moves through the layers, undergoes linear transformations, and is activated using different functions.

But our journey doesn't end here! In the next blog post, we'll dive into exciting topics like loss function and backward propagation. We'll explore how to measure the error between predictions and actual outputs, and how to use that error to update our model. These steps are crucial for training the neural network and improving its performance.

So, stay tuned for the next blog post, where we'll understand and implement a binary cross-entropy loss function and perform backpropagation.

Akshay Ballal

I hope you guys are able to get some good information from this series. Just 2 more parts left, and after that you will be able to feed in images of cats and non-cats and your NN model will classify them. Backward Propagation and Loss Function are releasing tomorrow, and the last part will be all about inference. Excited to show you guys how neural networks are built from scratch. Hang in there.

Also let me know if you want to get more content on other internal mechanisms of neural networks like different optimizers, loss functions, dropout, and mini-batching, and build them from scratch. Like this comment to show your interest.

Akshay Ballal

Hey Arthur,

Thanks for the encouraging words.

And yes, I want to write an article dedicated to activation functions and their effect on the overall model. I am working on it. Should be out soon. But in a nutshell, these days the softmax function is the usual choice for the final activation layer of multiclass classification problems, and ReLU is predominantly used in the intermediate layers to mitigate the vanishing gradient issue. Will dig deeper into this in the upcoming articles.

Hope you will find some useful information in the future content.