Austin

Posted on Oct 25, 2023

How to train an Iris dataset classifier with Tinygrad

#ai #tinygrad #machinelearning #tutorial

Building an Iris Classification Model with TinyGrad

In this tutorial, we'll walk through building a simple Iris classification model using TinyGrad, a lightweight deep learning framework. We'll cover the dataset, environment setup, the model definition, data preprocessing, and the training loop.

This tutorial assumes that you have at least a basic understanding of neural networks and linear functions.

Let's dive into building our Iris classification model step by step.

Introduction to Iris Dataset

The Iris dataset is a classic dataset in machine learning. It contains samples of three species of iris flowers, each with four features: sepal length, sepal width, petal length, and petal width. The goal is to classify the species based on these features.

Setting up the Environment

Before we begin, make sure you have TinyGrad and the required dependencies installed. You can find the installation instructions here.

The model

Let's start by building the model in model.py:

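# model.py
from tinygrad.tensor import Tensor
from tinygrad.nn import Linear

class IrisModel:
    def __init__(self):
        # 4 input features -> 16 hidden units -> 3 output classes
        self.l1 = Linear(4, 16)
        self.l2 = Linear(16, 3)

    def forward(self, x: Tensor):
        x = self.l1(x).relu()
        x = self.l2(x)
        # Log-probabilities over the three classes
        return x.log_softmax()

    @property
    def params(self):
        # Tensors the optimizer will update. Only the weights are listed,
        # so the biases keep their initial values.
        return [
            self.l1.weight, self.l2.weight
        ]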

We start by importing the required classes from tinygrad.

Much like in PyTorch, we can create a class for our model. Since each sample in the Iris dataset has four fields [float, float, float, float], we can build the model from simple linear layers.

The forward function takes an input tensor, x, and passes it through the layers l1 and l2. At the end, we use the log_softmax() function to get a tensor of log-probabilities, one per class.
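To make the output of log_softmax() more concrete, here is a minimal sketch in plain Python (not tinygrad's implementation). The logits values are made up for illustration. It shows that every log-probability is at most zero and that exponentiating them gives probabilities that sum to 1.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax for one row of raw scores."""
    m = max(logits)  # subtract the max before exponentiating to avoid overflow
    log_sum = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_sum for v in logits]

# Hypothetical raw scores for one flower, one score per class
logits = [2.0, 0.5, -1.0]
log_probs = log_softmax(logits)
probs = [math.exp(v) for v in log_probs]

print(log_probs)   # every entry is <= 0
print(sum(probs))  # the exponentiated values sum to 1 (up to rounding)
```

The largest raw score still has the largest log-probability, so taking the argmax of either gives the same predicted class.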

Data preprocessing and training

We will create a new file called training.py, where we load our data from sklearn.datasets and use it to train our model.

First, let's import what we need:

# training.py
from tinygrad.nn.optim import SGD
from tinygrad.tensor import Tensor
from tinygrad.nn import Linear

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from model import IrisModel
import numpy as np

Using load_iris, we load our data and then split it into training and testing subsets:

# training.py
iris = load_iris()
X = iris.data
y = iris.target

X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=0.2, random_state=42)
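As a rough picture of what the split does, here is a sketch in plain Python. It is not scikit-learn's implementation: the helper split_indices is hypothetical, and it will not reproduce the exact rows that random_state=42 selects. It only shows the idea of shuffling the row indices with a fixed seed and holding out 20% of them.

```python
import random

def split_indices(n_samples, test_size, seed):
    """Shuffle row indices and cut them into train and test parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed makes the split repeatable
    n_test = round(n_samples * test_size)
    return indices[n_test:], indices[:n_test]

# The Iris dataset has 150 samples; hold out 20% for testing
train_idx, test_idx = split_indices(150, test_size=0.2, seed=42)
print(len(train_idx), len(test_idx))  # 120 30
```

With 150 samples, that leaves 120 rows for training and 30 for testing.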

Next, we need to instantiate the model, create an optimizer, and set our batch size and number of epochs:

# training.py
model = IrisModel()
opt = SGD(model.params, lr=1e-3)
epochs = 10000
batch_size = 32

Our model is a plain Python class, so unlike a PyTorch nn.Module it has no built-in parameters() method, and we have to tell the optimizer which tensors to update. For convenience, I created a property called params that returns a list of the weights of each linear layer. Otherwise you would write something like:

opt = SGD([model.l1.weight, model.l2.weight], lr=1e-3)

Onto the actual training

Using a PyTorch-like for loop, we take a random sample from our training data, convert it to a tensor, and pass that input tensor to our model.

Usually, loss functions come with their own class, but in tinygrad they are available as methods on Tensor.

In this case, we are using Tensor.sparse_categorical_crossentropy, since we are dealing with multi-class classification with integer labels.

This loss function takes two inputs: our output tensor (out) and our labels tensor (target).
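To show what this kind of loss measures, here is a minimal sketch in plain Python. It is not tinygrad's code, and the batch values are made up. For each sample it picks the log-probability the model assigned to the correct class, then returns the negative mean.

```python
import math

def sparse_categorical_crossentropy(log_probs, targets):
    """Mean negative log-probability assigned to the correct class."""
    picked = [row[t] for row, t in zip(log_probs, targets)]
    return -sum(picked) / len(picked)

# Hypothetical batch of two samples, three classes each
log_probs = [
    [math.log(0.7), math.log(0.2), math.log(0.1)],
    [math.log(0.1), math.log(0.8), math.log(0.1)],
]
targets = [0, 1]  # integer class labels, not one-hot vectors

loss = sparse_categorical_crossentropy(log_probs, targets)
print(loss)  # about 0.29
```

The more probability the model puts on the correct class, the closer the loss gets to zero.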

# training.py

Tensor.training = True
for i in range(epochs):
    # Draw a random batch of row indices from the training set
    sample = np.random.randint(0, X_train.shape[0], size=(batch_size))
    batch = Tensor(X_train[sample], requires_grad=False)
    target = Tensor(y_train[sample])

    out = model.forward(batch)
    loss = Tensor.sparse_categorical_crossentropy(out, target)

    opt.zero_grad()
    loss.backward()
    opt.step()

    # Batch accuracy: fraction of predictions matching the labels
    preds = out.numpy().argmax(axis=-1)
    acc = (preds == target.numpy()).mean()

    # Log every 100th iteration (i is the loop counter)
    if i % 100 == 0:
        print(f"Epoch {i}, loss: {loss.numpy()}, acc: {acc}")

Tensor.training = False

When running the code, you may see lines like this (the sample below was captured with a line printed on every iteration):

Epoch 9990, loss: 0.20084689557552338, acc: 0.96875
Epoch 9991, loss: 0.1804082840681076, acc: 1.0
Epoch 9992, loss: 0.28833892941474915, acc: 0.90625
Epoch 9993, loss: 0.15084490180015564, acc: 0.96875
Epoch 9994, loss: 0.1843332201242447, acc: 1.0
Epoch 9995, loss: 0.21117405593395233, acc: 0.96875
Epoch 9996, loss: 0.2075180560350418, acc: 0.96875
Epoch 9997, loss: 0.13138934969902039, acc: 1.0

A decreasing loss indicates that the model is learning. It is also useful to calculate the accuracy on the held-out test set (X_test, y_test) to check whether the model makes correct predictions on data it has not seen.
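Accuracy is just the fraction of predictions that match the true labels. Here is a small sketch in plain Python. The accuracy helper and the two lists are hypothetical. In this tutorial the predictions would come from running the model on X_test and taking the argmax, the same way preds is computed in the training loop.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that equal the true labels."""
    correct = sum(p == t for p, t in zip(predictions, labels))
    return correct / len(labels)

# Hypothetical predicted and true class indices for five test flowers
preds = [0, 2, 1, 1, 0]
y_true = [0, 2, 1, 2, 0]

print(accuracy(preds, y_true))  # 0.8
```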

Let's explain what opt.zero_grad(), loss.backward(), and opt.step() do.

opt.zero_grad()

This line is responsible for clearing or zeroing out the gradients of the model's parameters. In the context of deep learning, during the backpropagation process, gradients of the loss with respect to the model's parameters are calculated. These gradients are used to update the model's parameters to minimize the loss. Before computing new gradients for the current batch of data, it's essential to clear the gradients from the previous batch. This line ensures that the gradients are initialized to zero.

loss.backward()

After clearing the gradients, this line computes the gradients of the loss with respect to the model's parameters. It performs backpropagation through the computational graph of the model. In other words, it calculates how much each parameter should be adjusted to reduce the loss. The gradients are stored in the model's parameters and will be used in the next step for updating the model.

opt.step()

Finally, this line updates the model's parameters using the computed gradients. It's the step where the model learns and adapts to the data. The optimizer (opt in this case) is responsible for adjusting the parameters in a way that reduces the loss. Common optimizers like stochastic gradient descent (SGD) or variants like Adam or RMSprop use the gradients to determine the direction and magnitude of parameter updates. The learning rate, which is often a hyperparameter, controls the step size of the updates.
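The three calls can be imitated on a toy problem in plain Python. This is only a sketch of the idea, not how tinygrad implements it: the Param class and the three helper functions are hypothetical, and the loss is a simple (w - 3)^2 with a hand-written derivative instead of real backpropagation.

```python
class Param:
    """A single scalar parameter with a slot for its gradient."""
    def __init__(self, value):
        self.value = value
        self.grad = 0.0

def zero_grad(params):
    # Clear gradients left over from the previous iteration
    for p in params:
        p.grad = 0.0

def backward(w):
    # loss = (w - 3)^2, so d(loss)/dw = 2 * (w - 3); gradients accumulate
    w.grad += 2.0 * (w.value - 3.0)

def step(params, lr):
    # Plain SGD update: move against the gradient, scaled by the learning rate
    for p in params:
        p.value -= lr * p.grad

w = Param(0.0)
for _ in range(100):
    zero_grad([w])
    backward(w)
    step([w], lr=0.1)

print(w.value)  # close to 3.0, the minimum of the loss
```

Because backward adds to the stored gradient, skipping zero_grad would mix gradients from earlier iterations into each update.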

Our output is a tensor of log-probabilities, one per class. We use argmax to get the index of the class with the highest value. This index corresponds to one of the 3 labels in the Iris dataset:

classes = {
    0: "iris-setosa",
    1: "iris-versicolor",
    2: "iris-virginica"
}
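Putting the two together, a prediction can be turned into a readable label like this. The predict_label helper and the example scores are hypothetical. In the tutorial the scores would be one row of the model's output.

```python
classes = {
    0: "iris-setosa",
    1: "iris-versicolor",
    2: "iris-virginica",
}

def predict_label(log_probs):
    """Return the class name with the highest score."""
    best = max(range(len(log_probs)), key=lambda i: log_probs[i])
    return classes[best]

# Hypothetical model output for one flower
print(predict_label([-2.3, -0.2, -1.9]))  # iris-versicolor
```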

DEV Community
