Victor Isaac Oshimua


Understanding Deep Learning By Building Custom Neural Network With PyTorch

To understand how deep learning models work, it helps to build a basic neural network yourself. It doesn't have to be complicated: even a small network shows how the pieces fit together. In this article, you'll learn how deep learning models function by creating your very first neural network.


To follow this article's tutorial, you'll need to:

  • Have a grasp of Python programming and understand how to work with Python classes.
  • Understand how to work with tensors, as they are the fundamental building blocks of deep learning models.

Deep learning primer

Deep learning and neural networks are often confused with one another. The two are closely related, but they are not synonymous.

A neural network is a computational model inspired by the structure and functioning of the human brain's neural networks. It consists of interconnected nodes (neurons) arranged in layers, where each neuron receives input, processes it, and passes the output to the next layer. Neural networks can be used for various tasks, such as classification, regression, and pattern recognition.
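As a rough illustration of what a single neuron does, the sketch below computes a weighted sum of its inputs, adds a bias, and passes the result through an activation function. The sigmoid activation, the function name neuron, and all numeric values are assumptions chosen for illustration; they do not come from the tutorial's model.

```python
import math

def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through a sigmoid."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical values chosen only for illustration
output = neuron(inputs=[1.0, 2.0], weights=[0.5, -0.25], bias=0.0)
print(output)  # 0.5, because the weighted sum is exactly 0
```

A layer is simply many such neurons reading the same inputs, each with its own weights and bias.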

Deep learning is a subfield of machine learning that trains neural networks, typically on large amounts of data, to learn hierarchical representations of the input. Deep learning models stack multiple layers (hence the term "deep") so that features are extracted from raw data automatically.

Deep learning models are composed of layers of interconnected neurons, with each layer serving a specific function:

  • The Input Layer: This layer accepts the feature inputs.
  • Hidden Layers: These layers sit between the input and the output and transform the inputs through successive groups of neurons.
  • The Output Layer: This layer generates the final prediction or classification.
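The three kinds of layer listed above can be sketched in a few lines of PyTorch. This is a minimal illustration, assuming hypothetical sizes (3 input features, 5 hidden neurons, 2 outputs) and a ReLU activation; it uses nn.Sequential for brevity rather than the custom class built later in the article.

```python
import torch
import torch.nn as nn

# Hypothetical sizes: 3 input features, 5 hidden neurons, 2 outputs
layers = nn.Sequential(
    nn.Linear(3, 5),  # input features -> hidden layer
    nn.ReLU(),        # non-linearity applied to the hidden layer
    nn.Linear(5, 2),  # hidden layer -> output layer
)

batch = torch.randn(4, 3)  # 4 samples with 3 features each
out = layers(batch)
print(out.shape)  # torch.Size([4, 2])
```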


The way these layers are structured defines the type of neural network, such as a feedforward neural network, a convolutional neural network (CNN), or a recurrent neural network (RNN).

Writing your first custom neural network

Now that you have an understanding of what deep learning and neural networks really are, let's dive into creating a custom neural network with PyTorch.

Creating a custom neural network means building a new module on top of PyTorch's torch.nn module. You create a new class that inherits from torch.nn.Module and define an __init__ method to initialise the module's parameters, as well as a forward method that performs the network's computation. Your object-oriented programming (OOP) knowledge in Python is important here.

Before proceeding to write your custom model, it is important to become familiar with the essential PyTorch modules. These modules can be utilised to create any custom neural network you envision.

PyTorch Module | What does it do?
torch.nn | Contains the fundamental components for constructing computational graphs, which represent sequences of computations executed in a specific order.
torch.nn.Parameter | Holds a tensor that is registered as a learnable parameter when assigned to an nn.Module. With requires_grad=True (the default), PyTorch computes its gradients automatically during the backward pass, a mechanism commonly known as "autograd". These gradients are what gradient descent uses to update the model parameters.
torch.nn.Module | The base class for all neural network modules. It provides the basic elements required for neural networks, and its subclasses are the individual building blocks. To build a neural network in PyTorch, your model should inherit from nn.Module and implement a forward() method.
torch.optim | Contains optimisation algorithms that use the computed gradients to update the model parameters (stored as nn.Parameter) in order to minimise the loss.
def forward() | Every subclass of nn.Module needs a forward() method, which specifies the computation performed on the data passed to that nn.Module.
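To see how these pieces interact, here is a minimal sketch that uses all of them in one place. The Scale class is hypothetical and exists only for illustration: it holds a single nn.Parameter, defines forward(), and is updated once by an optimiser from torch.optim (SGD is chosen here because its update is easy to check by hand).

```python
import torch
import torch.nn as nn
import torch.optim as optim

class Scale(nn.Module):
    """Hypothetical module that multiplies its input by one learnable number."""

    def __init__(self):
        super().__init__()
        # nn.Parameter registers the tensor with the module (requires_grad=True by default)
        self.weight = nn.Parameter(torch.tensor(2.0))

    def forward(self, x):
        return self.weight * x

scale = Scale()
optimizer = optim.SGD(scale.parameters(), lr=0.1)

loss = scale(torch.tensor(3.0))  # loss = weight * 3, so d(loss)/d(weight) = 3
loss.backward()                  # autograd fills in scale.weight.grad
optimizer.step()                 # weight <- 2.0 - 0.1 * 3.0

print(scale.weight.grad)    # tensor(3.)
print(scale.weight.item())  # approximately 1.7
```

Calling scale(...) runs forward() for you; you rarely call forward() directly.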

Let's get started

If you'll be working from your local computer, run the following command to install PyTorch.

pip install torch


Alternatively, if you're using Google Colab, PyTorch is pre-installed, so you can start working without any need for installation.

Here's a step-by-step guide demonstrating the process of creating a custom module in PyTorch and training it on a dataset we've generated.

Step 1: Import libraries

# Import libraries
import torch  # Import the PyTorch library
import torch.nn as nn  # Import the neural network module from PyTorch
import torch.optim as optim  # Import the optimization module from PyTorch


Step 2: Prepare dataset

Let's create a dataset with tensors:

# Prepare custom dataset

# Define tensors A and B with specified values and data type
A = torch.tensor([[10,20],[30,40]], dtype=torch.float32)
B = torch.tensor([[50,60],[70,80]], dtype=torch.float32)

# Concatenate tensors A and B along dimension 0 to create input features
X = torch.cat((A, B), 0)  # features

# Create tensor Y representing the labels
Y = torch.tensor([0,0,1,1])  # label

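Before moving on, it can help to check the shapes and data types of the dataset. The sketch below rebuilds the same tensors and prints what the model and loss function will receive: four samples with two features each, and one integer class label per sample. The shape checks are an addition for illustration, not part of the original walkthrough.

```python
import torch

A = torch.tensor([[10, 20], [30, 40]], dtype=torch.float32)
B = torch.tensor([[50, 60], [70, 80]], dtype=torch.float32)

X = torch.cat((A, B), 0)        # stack the rows of B below the rows of A
Y = torch.tensor([0, 0, 1, 1])  # one integer class label per row of X

print(X.shape)           # torch.Size([4, 2])
print(Y.shape, Y.dtype)  # torch.Size([4]) torch.int64
```

The labels are 64-bit integers because nn.CrossEntropyLoss expects class indices of that type.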

Step 3: Create the module

Here we will create a new class that inherits from the nn.Module class. The neural network architecture is defined in the __init__ method.

# Define neural network architecture

class custom_neural_network(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        """
        Initialize the custom neural network.

        Args:
            input_size (int): Size of the input features.
            hidden_size (int): Size of the hidden layer.
            output_size (int): Size of the output.
        """
        super(custom_neural_network, self).__init__()
        # Define layers
        self.linear1 = nn.Linear(input_size, hidden_size)  # Input layer to hidden layer
        self.linear2 = nn.Linear(hidden_size, output_size)  # Hidden layer to output layer

    def forward(self, input):
        """
        Forward pass method for the neural network.

        Args:
            input (torch.Tensor): Input tensor to the neural network.

        Returns:
            torch.Tensor: Output tensor from the neural network.
        """
        lin = self.linear1(input)  # Linear transformation from input to hidden layer
        output = nn.functional.relu(lin)  # Applying ReLU activation function for introducing non-linearity
        pred = self.linear2(output)  # Linear transformation from hidden to output layer
        return pred


The network architecture comprises two linear layers (linear1 and linear2), which serve as the input-to-hidden and hidden-to-output layers, respectively.

The forward method is responsible for executing the forward pass of the neural network. It orchestrates the flow of data through the layers, starting with linear transformations.

Crucially, between these linear operations, the ReLU activation function is applied. This introduces non-linearity to the network, enabling it to learn complex relationships within the data.
It's important to note that without an activation function like ReLU, the two linear layers would collapse into a single linear transformation. The network would then behave like a linear model, severely limiting its ability to capture and represent intricate patterns in the data.
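The point about non-linearity can be checked numerically. The sketch below, which uses freshly created layers with the same sizes as the tutorial's model, shows that two linear layers applied back to back with no activation in between give the same result as one linear layer with combined weights. The seed and the random input are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed so the run is repeatable

linear1 = nn.Linear(2, 4)
linear2 = nn.Linear(4, 2)
x = torch.randn(5, 2)

# Two linear layers applied back to back, with no activation in between
stacked = linear2(linear1(x))

# The same mapping written as a single linear layer
W = linear2.weight @ linear1.weight               # shape (2, 2)
b = linear2.weight @ linear1.bias + linear2.bias  # shape (2,)
single = x @ W.T + b

print(torch.allclose(stacked, single, atol=1e-5))  # True
```

Placing ReLU between the two layers breaks this equivalence, which is what lets the network represent non-linear relationships.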

Step 4: Choosing Loss Function and Optimizer

The loss function assesses how accurately the model's predictions align with the actual target values. It's a mathematical function that takes into account both the model's predictions and the true labels, yielding a single scalar value representing the degree of error. The primary aim is to determine the model parameters that minimise this error. Common loss functions in PyTorch include:

  • Mean Squared Error (MSE): Typically used in regression tasks.
  • Cross-Entropy Loss: Commonly employed in classification tasks.
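To make these two loss functions concrete, here is a small sketch that computes cross-entropy step by step and compares it with nn.CrossEntropyLoss, followed by a one-line mean squared error example. The logits, targets, and regression values are hypothetical numbers chosen for illustration.

```python
import torch
import torch.nn as nn

# Hypothetical raw scores (logits) for 2 samples and 2 classes
logits = torch.tensor([[2.0, 0.5], [0.1, 1.2]])
targets = torch.tensor([0, 1])  # true class index for each sample

# Built-in loss: applies log-softmax internally, then averages over the batch
builtin = nn.CrossEntropyLoss()(logits, targets)

# The same quantity computed step by step
log_probs = torch.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(2), targets].mean()

print(torch.allclose(builtin, manual))  # True

# Mean squared error, for comparison, averages the squared differences
mse = nn.MSELoss()(torch.tensor([1.0, 2.0]), torch.tensor([1.0, 4.0]))
print(mse.item())  # 2.0
```

Note that nn.CrossEntropyLoss takes raw logits, so the model's output layer does not need its own softmax.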

To minimise the loss, various optimisation techniques are employed to adjust the model's weights. These techniques include:

  • Gradient Descent: Updates the weights in the direction opposite to the gradient of the loss, which is the direction that decreases the loss.
  • Stochastic Gradient Descent (SGD): A variant that calculates the gradient using either a single example or a small batch of examples.
  • Adam: A popular optimisation method that adapts the step size for each parameter using running averages of the gradient and of its square.

# Define the loss criterion using Cross Entropy Loss
criterion = nn.CrossEntropyLoss()

# Define the input, hidden, and output sizes for the neural network
input_size = 2
hidden_size = 4
output_size = 2

# Instantiate a model with the specified input, hidden, and output sizes
model = custom_neural_network(input_size, hidden_size, output_size)

# Define the optimizer using the Adam optimizer
# It updates the parameters of the model using the gradients computed during backpropagation
# lr=0.001 sets the learning rate to 0.001
optimizer = optim.Adam(model.parameters(), lr=0.001)

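The basic update rule behind these optimisers can be sketched without PyTorch. The example below runs plain gradient descent on a one-parameter function; the function f(w) = (w - 3)^2, the starting value, and the learning rate are assumptions chosen for illustration. Adam applies a more elaborate update than this, but the idea of stepping against the gradient is the same.

```python
# Minimise f(w) = (w - 3)^2 with plain gradient descent.
# The function, starting point, and learning rate are illustrative assumptions.
w = 0.0
lr = 0.1

for step in range(100):
    grad = 2 * (w - 3)  # derivative of (w - 3)^2 with respect to w
    w = w - lr * grad   # move against the gradient

print(round(w, 4))  # 3.0
```

In PyTorch, the gradient comes from loss.backward() and the update is performed by optimizer.step().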

Step 5: Training the model

Training the neural network means repeating the same steps over many epochs (full passes over the data): a forward pass, a loss computation, a backward pass, and a weight update. Here's how to train the model:

epochs = 500

# Loop through each epoch for training
for epoch in range(epochs):
    # Zero the gradients to prevent accumulation from previous iterations
    optimizer.zero_grad()

    # Forward pass: compute predicted outputs by passing inputs to the model
    outputs = model(X)

    # Compute the loss using the specified criterion
    loss = criterion(outputs, Y)

    # Backward pass: compute gradient of the loss with respect to model parameters
    loss.backward()

    # Update the weights of the model
    optimizer.step()

    # Print loss every 50 epochs
    if (epoch + 1) % 50 == 0:
        print(f'Epoch {epoch+1}/{epochs}, Loss: {loss.item()}')

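The reason for zeroing the gradients at the start of every iteration is that PyTorch adds each newly computed gradient to whatever is already stored. The sketch below shows this on a single tensor; the tensor w and the expression 3 * w are hypothetical stand-ins for a model parameter and a loss.

```python
import torch

w = torch.tensor(1.0, requires_grad=True)

# First backward pass: d(3w)/dw = 3
(3 * w).backward()
print(w.grad)  # tensor(3.)

# Second backward pass without zeroing: the new gradient is added to the old one
(3 * w).backward()
print(w.grad)  # tensor(6.)

# Clearing the gradient first (the job optimizer.zero_grad() does for model parameters)
w.grad.zero_()
(3 * w).backward()
print(w.grad)  # tensor(3.)
```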

Step 6: Evaluating the model

Once training is complete, you can assess the model's performance on fresh data or examine its outputs for specific inputs. Here, for simplicity, we inspect its predictions on the training inputs X.

# Use torch.no_grad() to disable gradient calculation during inference
with torch.no_grad():
    # Pass input data X through the model to obtain predictions
    outputs = model(X)
    # Extract the predicted class indices by taking the maximum value along the second dimension
    _, predicted = torch.max(outputs, 1)
    # Print the predicted class indices
    print('Predicted:', predicted)


torch.no_grad() is used to disable gradient calculation since we are performing inference, not training. The input data X is passed through the model to obtain predictions stored in outputs. The predicted class indices are extracted by taking the maximum value along the second dimension of the outputs tensor. Finally, the predicted class indices are printed.
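To see what torch.max returns, here is a small standalone sketch. The logits and the true labels are hypothetical values chosen for illustration, not outputs of the trained model. It also shows that torch.argmax gives the same indices, that applying softmax does not change the predicted class, and how an accuracy figure could be computed.

```python
import torch

# Hypothetical logits for 3 samples and 2 classes
outputs = torch.tensor([[1.5, -0.3], [0.2, 0.9], [-1.0, 2.0]])

values, predicted = torch.max(outputs, 1)  # max value and its index per row
print(predicted)  # tensor([0, 1, 1])

# torch.argmax returns only the indices
print(torch.equal(predicted, torch.argmax(outputs, dim=1)))  # True

# Softmax turns logits into probabilities without changing which class is largest
probs = torch.softmax(outputs, dim=1)
print(torch.equal(predicted, torch.argmax(probs, dim=1)))  # True

# Accuracy against hypothetical true labels
labels = torch.tensor([0, 1, 0])
accuracy = (predicted == labels).float().mean().item()
print(accuracy)  # approximately 0.667
```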

Putting it all together

Building a custom neural network with PyTorch is a powerful and flexible process that allows you to tailor your model to the specific needs of your task. In addition to defining the model architecture, activation functions, and implementing the forward pass, you have the freedom to customise various aspects of the network, such as regularisation techniques, initialization methods, and advanced layer configurations.
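As one possible illustration of such customisation, the sketch below adds dropout (a regularisation technique) and an explicit weight initialisation to a network of the same shape as the tutorial's. The class name RegularisedNet, the dropout probability, and the choice of Xavier initialisation are all assumptions made for this example, not recommendations from the tutorial.

```python
import torch
import torch.nn as nn

class RegularisedNet(nn.Module):
    """Hypothetical variant of the tutorial's network with dropout and custom init."""

    def __init__(self, input_size, hidden_size, output_size, p_drop=0.5):
        super().__init__()
        self.linear1 = nn.Linear(input_size, hidden_size)
        self.dropout = nn.Dropout(p_drop)  # randomly zeroes activations during training
        self.linear2 = nn.Linear(hidden_size, output_size)
        # Custom initialisation: Xavier-uniform weights, zero biases
        for layer in (self.linear1, self.linear2):
            nn.init.xavier_uniform_(layer.weight)
            nn.init.zeros_(layer.bias)

    def forward(self, x):
        hidden = torch.relu(self.linear1(x))
        return self.linear2(self.dropout(hidden))

net = RegularisedNet(2, 4, 2)
net.eval()  # dropout is switched off in evaluation mode
print(net(torch.randn(3, 2)).shape)  # torch.Size([3, 2])
```

Call net.train() before training so that dropout is active, and net.eval() before inference.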

After instantiating the model, selecting an optimizer, and specifying a loss function, you embark on the exciting journey of training the model on your dataset. Throughout this process, you have the opportunity to monitor the training progress, evaluate the model's performance, and fine-tune hyperparameters to achieve the best results.

Thank you for your dedication in following the tutorial to its completion. Your curiosity and commitment are key ingredients in mastering the art of deep learning. Should you encounter any challenges or have ideas for improvement, please don't hesitate to reach out. Your feedback is invaluable in our collective pursuit of knowledge and innovation.

Full code
