Paul Chibueze

Building and Training a Neural Network with PyTorch: A Step-by-Step Guide

Imagine a world where machines can not only see but also understand and classify images as effortlessly as humans. This capability has been at the heart of many breakthroughs in artificial intelligence, revolutionizing fields from healthcare to retail.

In recent years, advancements in deep learning have enabled computers to recognize objects, identify faces, and even understand emotions depicted in images. One of the pivotal tasks in this domain is image classification — teaching computers to categorize images into predefined classes based on their visual features.

In this guide, we’ll embark on a journey to build and train a neural network using PyTorch. We’ll start by preparing our data — transforming raw images into a format suitable for training our model. Then, we’ll delve into defining our neural network architecture, which will learn to recognize various clothing items based on their pixel patterns. For this project, we will use the FashionMNIST dataset.

FashionMNIST is a dataset of grayscale images of clothing items and serves as an excellent playground for learning and mastering image classification techniques. Similar to its predecessor, MNIST (which consists of handwritten digits), FashionMNIST challenges us to distinguish between different types of apparel with the aid of deep learning models. PyTorch provides tools to download and load the dataset conveniently.

As we progress, we’ll explore how to train our model using backpropagation and gradient descent, evaluate its performance on unseen data, and ensure it generalizes well to new examples.

Finally, we’ll learn how to save our trained model’s parameters, enabling us to deploy it in real-world applications or continue refining its capabilities.

I guess you are already excited; I am too.

What is a Neural Network?

A neural network is a series of interconnected nodes, inspired by the structure of the human brain. It learns by processing data and adjusting its internal connections based on the results. In this case, the neural network will learn to recognize patterns in images of clothing and predict the corresponding category (t-shirt, dress, etc.).

Throughout this tutorial, we will cover the essential steps in deep learning, especially for building classification neural network models. The steps we will follow include:

  • Data Preparation: We will download and prepare our dataset, transforming it into a format suitable for training with PyTorch.

  • Model Definition: We will also define a neural network architecture using PyTorch’s nn.Module that will learn to classify images into different clothing categories.

  • Training and Evaluation: We will then implement the training loop to optimize our model’s parameters using gradient descent, evaluate its performance on test data, and monitor its progress.

  • Model Persistence: You will also see how to save and load trained models, allowing you to reuse them for predictions or further training.

By the end of this journey, you will not only have a grasp of the fundamental concepts of deep learning with PyTorch but also a practical understanding of how to apply them to real-world datasets.

Let’s embark on this learning adventure together!

Dataset Preparation

The first step is to prepare our dataset. As mentioned earlier, we will use the FashionMNIST dataset, which is readily available in PyTorch’s torchvision library. This dataset contains 70,000 grayscale images spanning 10 different classes of clothing items.

We start by importing the necessary libraries:

import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.transforms import ToTensor
  • torch: The core PyTorch library for building and training neural networks.

  • nn: A submodule of torch containing building blocks for neural networks like layers and activation functions.

  • DataLoader: A class from torch.utils.data that helps us load and iterate over datasets in batches.

  • datasets: A submodule of torchvision providing access to popular datasets like FashionMNIST, downloading them on demand.

  • ToTensor: A data transform that converts images to PyTorch tensors.

With the libraries imported, it’s time to download the training and test datasets from FashionMNIST and load them into our environment.

# download training data from the FashionMNIST dataset.
training_data = datasets.FashionMNIST(
    train=True,
    transform=ToTensor(),
    download=True,
    root="data"
)

# download test data from the FashionMNIST dataset.
test_data = datasets.FashionMNIST(
    train=False,
    transform=ToTensor(),
    download=True,
    root="data"
)

The above code downloads the FashionMNIST dataset, specifying the training split with train=True and the test split with train=False. We also apply the ToTensor transform, which converts the raw image data (pixel intensities between 0 and 255) into PyTorch tensors with values scaled into the range [0, 1].
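
To see exactly what ToTensor gives us, we can inspect a single sample. This is a quick, optional check:

# optional: inspect one sample to see what ToTensor produces
img, label = training_data[0]
print(img.shape)                            # torch.Size([1, 28, 28]) - 1 channel, 28x28 pixels
print(img.dtype)                            # torch.float32
print(img.min().item(), img.max().item())  # values scaled into the range [0, 1]
print(label)                                # an integer class index between 0 and 9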

Data Loaders

The next step is to define our data loaders. Data loaders load the dataset in batches, making it easier to manage memory and speed up training of our model. To define the data loaders, we first declare the batch size.

batch_size = 64

# create data loaders
training_loader = DataLoader(training_data, batch_size=batch_size)
test_loader = DataLoader(test_data, batch_size=batch_size)

for X, y in test_loader:
  print(f"Shape of X [N C H W]: {X.shape}")
  print(f"Shape of y: {y.shape} {y.dtype}")
  break

We first define the batch size, which controls how many images are processed at once during training. We then create data loaders for both the training and test data; these will feed batches into the neural network during training and evaluation.

We also use a for loop to iterate through the batches and print the shapes of the input images (X) and their corresponding labels (y). X has a shape of [batch_size, channel, height, width], where batch_size is 64, channel is 1 (grayscale images), and height and width are both 28 (the images are 28x28 pixels). The labels y are a one-dimensional tensor of integers representing the clothing categories.
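
As a quick sanity check, note that len() on a DataLoader returns the number of batches it will produce, not the number of samples:

import math

# number of batches = ceil(dataset size / batch size)
print(len(training_loader))                        # 938
print(math.ceil(len(training_data) / batch_size))  # 938  (60000 / 64, rounded up)
print(len(test_loader))                            # 157  (10000 / 64, rounded up)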

Now that we have defined and configured data loaders for both the training and test datasets, let’s decide which device our model will run on; in our case, that will be the CPU.

# get cpu, gpu or mps device for training.
device = (
    "cuda"
    if torch.cuda.is_available()
    else "mps"
    if torch.backends.mps.is_available()
    else "cpu"
)
print(f"Using {device} device")
# OUTPUT

Using cpu device

Our code checks whether a CUDA (GPU) or MPS device is available and uses it for training if possible; otherwise it defaults to the CPU. Using a GPU or MPS can significantly speed up the training process, since training neural networks is computationally intensive.
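
One thing to keep in mind: the model and the data it processes must live on the same device, which is why our training and test functions below call .to(device) on every batch. A minimal illustration:

# tensors are created on the CPU by default and must be moved explicitly
t = torch.ones(2, 3)   # lives on the CPU
t = t.to(device)       # moved to whichever device we selected above
print(t.device)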

With that settled, let’s continue with the next step: defining our network.

Defining the Neural Network Model

We define a simple fully connected neural network. Our model will have three layers with ReLU activations in between.

To define a neural network in PyTorch, we create a class that inherits from nn.Module. We define the layers of the network in the __init__ method and specify how data will pass through the network in the forward method. To accelerate operations in the neural network, we move it to the GPU or MPS if available.

class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.Flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
        )

    def forward(self, x):
        x = self.Flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

model = NeuralNetwork().to(device)
print(model)
# OUTPUT

NeuralNetwork(
  (Flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
  )
)

A few things you should know about our neural network:

  • nn.Module: Base class for all neural network modules in PyTorch.

  • Flatten: Flattens each 28x28 input image into a 784-element vector.

  • nn.Sequential: A sequential container to define the layers of the model.

  • nn.Linear: Fully connected layer.

  • nn.ReLU: ReLU activation function.
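
To get a feel for the size of this network, we can count its trainable parameters. This is a quick, optional check; the total follows directly from the layer sizes above:

# (784*512 + 512) + (512*512 + 512) + (512*10 + 10) = 669,706
total_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Trainable parameters: {total_params:,}")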

Now that we are all set, let’s move on to defining our loss function and optimizer.

Defining the Loss Function and Optimizer

The loss function measures how well the model’s predictions match the actual labels, while the optimizer updates the model parameters to minimize the loss.

To handle this, we define the following:

loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

Let’s explain each of these components:

nn.CrossEntropyLoss: a loss function used primarily for classification tasks where the model outputs a score for each class. It combines nn.LogSoftmax() and nn.NLLLoss() in one single class. CrossEntropyLoss expects raw logits (the output of the model before applying softmax) as input. It computes the softmax internally to normalize the logits and then computes the negative log likelihood loss between the predicted class probabilities and the actual target labels.
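
We can verify that relationship with a couple of made-up logits. The sketch below shows nn.CrossEntropyLoss matching a manual log-softmax followed by the negative log likelihood loss:

import torch.nn.functional as F

# two fake samples with 3 classes each (made-up values, for illustration only)
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 1.5, 0.3]])
targets = torch.tensor([0, 1])

loss_a = nn.CrossEntropyLoss()(logits, targets)

# the same computation done manually: log-softmax, then negative log likelihood
loss_b = F.nll_loss(F.log_softmax(logits, dim=1), targets)

print(loss_a.item(), loss_b.item())  # identical values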

torch.optim.SGD: the optimizer, which implements Stochastic Gradient Descent (SGD), a fundamental optimization algorithm for training neural networks. SGD updates the model parameters in the direction of the negative gradient of the loss function with respect to those parameters. The model.parameters() argument specifies which parameters of the model should be optimized.

lr (learning rate): a scalar factor that controls the step size taken during optimization. It determines how much to change the model parameters with respect to the gradient of the loss function. A higher learning rate can speed up convergence, but if it’s too high, it may cause the model to overshoot optimal values. Conversely, a lower learning rate can improve stability and precision but may require more iterations to converge.

momentum: a parameter that accelerates SGD in the relevant direction and dampens oscillations. It improves the convergence rate and helps SGD escape shallow local minima more effectively. A common value for momentum is 0.9, but it can be tuned depending on the specific problem and dataset characteristics.

In summary, these components together form the backbone of the optimization process during training. nn.CrossEntropyLoss computes the loss based on model predictions and target labels, torch.optim.SGD updates the model parameters based on the computed gradients, and lr and momentum are crucial hyperparameters that affect how quickly and effectively the model learns from the data. Adjusting these parameters can significantly impact the training process and model performance.
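
To make the update rule concrete, here is a tiny hand-rolled sketch of one SGD-with-momentum step on a single made-up parameter; optimizer.step() performs essentially this update for every parameter in the model:

# one SGD-with-momentum step written out by hand (illustrative only)
lr, momentum = 1e-3, 0.9
w = torch.tensor(1.0, requires_grad=True)  # a made-up parameter

loss = (2.0 * w) ** 2  # a toy loss
loss.backward()        # w.grad now holds d(loss)/dw = 8.0

velocity = torch.zeros_like(w)
with torch.no_grad():
    velocity = momentum * velocity + w.grad  # accumulate velocity
    w -= lr * velocity                       # update the parameter
print(w)  # tensor(0.9920, requires_grad=True)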

Defining our Training Function

The training function iterates over the data loader, computes predictions, calculates the loss, and updates the model parameters.

def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    print(f"size: {size}")
    model.train()  # ensure training mode (test() below switches the model to eval mode)
    for batch, (X, y) in enumerate(dataloader):
        X = X.to(device)  # move input data to the device (GPU or CPU)
        y = y.to(device)  # move target labels to the device (GPU or CPU)

        # compute predicted y by passing X to the model
        prediction = model(X)

        # compute the loss
        loss = loss_fn(prediction, y)

        # zero the gradients, perform a backward pass, and update the weights
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # print training progress every 100 batches
        if batch % 100 == 0:
            loss_value = loss.item()
            current = batch * len(X)
            print(f"loss: {loss_value:>7f}  [{current:>5d}/{size:>5d}]")

Now, to check the model’s performance against the test dataset and make sure it is actually learning, let’s define a test function:

def test(dataloader, model, loss_fn):
    size = len(dataloader.dataset)
    num_batches = len(dataloader)
    model.eval()
    test_loss, correct = 0, 0
    with torch.no_grad():
        for X, y in dataloader:
            X = X.to(device)
            y = y.to(device)
            prediction = model(X)
            test_loss += loss_fn(prediction, y).item()
            correct += (prediction.argmax(1) == y).type(torch.float).sum().item()
        test_loss /= num_batches
        correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")

It’s time to train our model; let’s do that in the next step.

Defining the Training Loop

The training process is conducted over several iterations (epochs). During each epoch, the model learns parameters to make better predictions. We print the model’s accuracy and loss at each epoch; we’d like to see the accuracy increase and the loss decrease with every epoch.

epochs = 5
for t in range(epochs):
  print(f"Epoch {t+1}\n-------------------------------")
  train(training_loader, model, loss_fn, optimizer)
  test(test_loader, model, loss_fn)
print("Done!")
# OUTPUT 

Epoch 1
-------------------------------
size: 60000
loss: 2.301722 [    0/60000]
loss: 2.196219 [ 6400/60000]
loss: 1.919408 [12800/60000]
loss: 1.602865 [19200/60000]
loss: 1.206242 [25600/60000]
loss: 1.089895 [32000/60000]
loss: 1.010409 [38400/60000]
loss: 0.888665 [44800/60000]
loss: 0.871484 [51200/60000]
loss: 0.801176 [57600/60000]
Test Error: 
 Accuracy: 70.4%, Avg loss: 0.797208 

Epoch 2
-------------------------------
size: 60000
loss: 0.793278 [    0/60000]
loss: 0.839569 [ 6400/60000]
loss: 0.590993 [12800/60000]
loss: 0.796638 [19200/60000]
loss: 0.679180 [25600/60000]
loss: 0.645485 [32000/60000]
loss: 0.705061 [38400/60000]
loss: 0.694501 [44800/60000]
loss: 0.680406 [51200/60000]
loss: 0.634787 [57600/60000]
Test Error: 
 Accuracy: 78.1%, Avg loss: 0.632338 

Epoch 3
-------------------------------
size: 60000
loss: 0.558544 [    0/60000]
loss: 0.660779 [ 6400/60000]
loss: 0.436486 [12800/60000]
loss: 0.679563 [19200/60000]
loss: 0.600478 [25600/60000]
loss: 0.567539 [32000/60000]
loss: 0.587003 [38400/60000]
loss: 0.657008 [44800/60000]
loss: 0.643853 [51200/60000]
loss: 0.547364 [57600/60000]
Test Error: 
 Accuracy: 80.3%, Avg loss: 0.560929 

Epoch 4
-------------------------------
size: 60000
loss: 0.462072 [    0/60000]
loss: 0.580780 [ 6400/60000]
loss: 0.374757 [12800/60000]
loss: 0.618166 [19200/60000]
loss: 0.552829 [25600/60000]
loss: 0.526478 [32000/60000]
loss: 0.529090 [38400/60000]
loss: 0.666382 [44800/60000]
loss: 0.634566 [51200/60000]
loss: 0.482042 [57600/60000]
Test Error: 
 Accuracy: 81.2%, Avg loss: 0.523512 

Epoch 5
-------------------------------
size: 60000
loss: 0.403316 [    0/60000]
loss: 0.539046 [ 6400/60000]
loss: 0.340361 [12800/60000]
loss: 0.577453 [19200/60000]
loss: 0.509404 [25600/60000]
loss: 0.496750 [32000/60000]
loss: 0.495348 [38400/60000]
loss: 0.670772 [44800/60000]
loss: 0.620382 [51200/60000]
loss: 0.439184 [57600/60000]
Test Error: 
 Accuracy: 82.2%, Avg loss: 0.500474 

Done!

  • epochs: The number of times to iterate over the entire training dataset (5 in our case).

  • train(): Calls the training function.

  • test(): Calls the evaluation (test) function.

At this point, we have a trained model that can classify clothing images, reaching roughly 82% accuracy on the test set.

Moving forward, the next thing to consider is how to save our trained model, so that when we want to use or deploy it in an application, we can easily load it back and make predictions.

To save our model, we do the following:

torch.save(model.state_dict(), "model.pth")
print("Saved PyTorch Model State to model.pth")
# OUTPUT

Saved PyTorch Model State to model.pth

This approach saves the model by serializing its internal state dictionary (containing the model parameters) to disk.
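
If you plan to resume training later rather than just run inference, one common pattern is to save the optimizer state alongside the model. A sketch (the checkpoint.pth filename and dictionary keys here are our own choice):

# save model and optimizer state together so training can be resumed
checkpoint = {
    "model_state": model.state_dict(),
    "optimizer_state": optimizer.state_dict(),
    "epochs_completed": 5,
}
torch.save(checkpoint, "checkpoint.pth")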

After saving the model, the next time we want to use it for predictions, we first load it back into memory. To do that:

model = NeuralNetwork().to(device)
model.load_state_dict(torch.load("model.pth"))
# OUTPUT

<All keys matched successfully>

Loading the model involves re-creating the model structure and loading the saved state dictionary into it.
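
One caveat worth knowing: if the model was saved on one device (say, a GPU) and loaded on another (say, a CPU), you can pass map_location to torch.load so the saved tensors are remapped onto the current device:

# remap the saved tensors onto whatever device we are using now
model.load_state_dict(torch.load("model.pth", map_location=device))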

Finally, let’s make use of our loaded model for prediction.

Model Usage for Prediction

classes = [
    "T-shirt/top",
    "Trouser",
    "Pullover",
    "Dress",
    "Coat",
    "Sandal",
    "Shirt",
    "Sneaker",
    "Bag",
    "Ankle boot",
]

# set model to evaluation mode
model.eval()

sample_index = 1  # sample Index (Change this index to select a different sample)
x, y = test_data[sample_index][0], test_data[sample_index][1]

# make prediction without gradient calculation
with torch.no_grad():
    x = x.to(device)
    prediction = model(x.unsqueeze(0))

    # get predicted and actual classes
    predicted, actual = classes[prediction.argmax(dim=1).item()], classes[y]

    print(f'Predicted: "{predicted}", Actual: "{actual}"')
# OUTPUT

Predicted: "Pullover", Actual: "Pullover"

Let’s now break down each part of the prediction code.

Class Labels

classes = [
    "T-shirt/top",
    "Trouser",
    "Pullover",
    "Dress",
    "Coat",
    "Sandal",
    "Shirt",
    "Sneaker",
    "Bag",
    "Ankle boot",
]

classes: This is a list of class labels that correspond to the categories the model is trained to recognize. Each index in this list represents a specific class.

Set Model to Evaluation Mode

model.eval()

model.eval(): Sets the model to evaluation mode. This is important because some layers (e.g., dropout, batch normalization) behave differently during training and evaluation. In evaluation mode, these layers operate in inference mode, ensuring consistent results during testing.
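
Our particular network contains no dropout or batch normalization layers, so eval mode does not change its output here, but the small sketch below (with a made-up dropout module) shows why the call matters in general:

# made-up example: dropout behaves differently in train vs eval mode
drop = nn.Dropout(p=0.5)
t = torch.ones(1, 4)

drop.train()
print(drop(t))  # roughly half the values zeroed, the rest scaled by 2

drop.eval()
print(drop(t))  # identity: tensor([[1., 1., 1., 1.]])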

Select a Single Test Sample

x, y = test_data[sample_index][0], test_data[sample_index][1]

x, y = test_data[sample_index][0], test_data[sample_index][1]: Selects one sample from the test_data dataset. x is the input data (the image tensor), and y is the corresponding label (the class index).

Make Prediction Without Gradient Calculation

with torch.no_grad():
    x = x.to(device)
    prediction = model(x.unsqueeze(0))

with torch.no_grad():: Disables gradient calculation, which is not needed for evaluation and reduces memory usage and computation time.

x = x.to(device): Moves the input data to the specified device (CPU or GPU) where the model is located.

prediction = model(x.unsqueeze(0)): Adds a batch dimension to the single image and passes it through the model. prediction is a tensor containing the output logits for each class.

To Determine Predicted and Actual Class Labels

predicted, actual = classes[prediction.argmax(dim=1).item()], classes[y]

prediction.argmax(dim=1): Finds the index of the class with the highest score in the model's output for the single sample in the batch. This index corresponds to the predicted class.

classes[prediction.argmax(dim=1).item()]: Uses that index to look up the predicted class label from the classes list.

classes[y]: Uses the true label index y to look up the actual class label from the classes list.
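
If you want class probabilities rather than just the winning label, you can apply softmax to the logits. This optional addition continues from the prediction variables above:

# optional: convert the raw logits into class probabilities
probs = torch.softmax(prediction, dim=1)
print(f'Confidence in "{predicted}": {probs.max().item():.1%}')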

Print the Predicted and Actual Class Labels

print(f'Predicted: "{predicted}", Actual: "{actual}"')

Prints the predicted and actual class labels in a formatted string.

Conclusion

In this guide, we walked through the entire process of building, training, and evaluating a neural network using PyTorch with the FashionMNIST dataset. We covered essential concepts such as dataset preparation, defining a neural network model, setting up training and evaluation loops, saving and loading models, and making predictions.

Lastly, constant practice leads to mastery, so experiment with different models, hyperparameters, and datasets to deepen your understanding and improve your skills in deep learning and image classification.

Till next time, but for now all I can say is, Happy coding! 🚀

Reference

  • What is a Neural Network? | IBM (www.ibm.com)

  • Fashion MNIST: an MNIST-like dataset of 70,000 28x28 labeled fashion images (www.kaggle.com)

  • Quickstart, PyTorch Tutorials 2.3.0+cu121 documentation (pytorch.org)

  • Building an Image Classification model with PyTorch from scratch (medium.com)

Code

  • Machine-learning/cloth-classification-using-pytorch.ipynb (github.com/chibuezedev/Machine-learning)
