How to train an Iris dataset classifier with Tinygrad

Building an Iris Classification Model with TinyGrad

In this tutorial, we'll walk through the process of building a simple Iris classification model using TinyGrad, a lightweight deep learning framework. We'll cover the dataset, the environment setup, the model, data preprocessing and training, and evaluation on a test set.

This tutorial assumes that you have at least a basic understanding of neural networks and linear functions.

Let's dive into building our Iris classification model step by step.

Introduction to Iris Dataset

The Iris dataset is a classic dataset in machine learning. It contains samples of three species of iris flowers, each with four features: sepal length, sepal width, petal length, and petal width. The goal is to classify the species based on these features.

Setting up the Environment

Before we begin, make sure you have TinyGrad and the required dependencies installed. You can find the installation instructions here.

The model

Let's start by building the model in model.py:

from tinygrad.tensor import Tensor
from tinygrad.nn import Linear

class IrisModel:
    def __init__(self):
        self.l1 = Linear(4, 16)
        self.l2 = Linear(16, 3)

    def forward(self,x: Tensor):
        x = self.l1(x).relu()
        x = self.l2(x)
        return x.log_softmax()

    def params(self):
        # Tensors handed to the optimizer (weights only)
        return [
            self.l1.weight, self.l2.weight
        ]


We start out by importing the required classes from tinygrad.

Much like in PyTorch, we can create a class for our model. Since each sample in the Iris dataset has four float features, the first linear layer takes 4 inputs, and since there are three species, the last layer produces 3 outputs.

The forward function takes an input tensor, x, and passes it through our layers, l1 and l2, with a ReLU activation in between. In the end, we use the log_softmax() function to get a tensor of log-probabilities, one for each class.
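To make the forward pass concrete, here is a minimal NumPy sketch of the same computation (linear layer, ReLU, linear layer, log-softmax). The weights are random stand-ins and the helper names are illustrative, not tinygrad's API; the point is only the shapes and the fact that exponentiating each output row gives probabilities that sum to 1.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random stand-in weights sized for a 4 -> 16 -> 3 network
w1, b1 = rng.normal(size=(4, 16)), np.zeros(16)
w2, b2 = rng.normal(size=(16, 3)), np.zeros(3)

def log_softmax(z):
    # Subtract the row maximum first for numerical stability
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def forward(x):
    h = np.maximum(x @ w1 + b1, 0.0)   # linear layer followed by ReLU
    return log_softmax(h @ w2 + b2)    # one log-probability per class

x = rng.normal(size=(5, 4))            # a batch of 5 samples, 4 features each
out = forward(x)
print(out.shape)                       # (5, 3)
print(np.exp(out).sum(axis=-1))        # each row sums to 1 (up to rounding)
```

Every entry of the output is zero or negative, because it is the logarithm of a probability.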

Data preprocessing and training

We will create a new file where we will process our data from sklearn.datasets and use it to train our model.

First, let's import the required modules:

from tinygrad.nn.optim import SGD
from tinygrad.tensor import Tensor
from tinygrad.nn import Linear

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from model import IrisModel
import numpy as np

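Using load_iris, we will load our data, then split it into training and testing subsets:

iris = load_iris()
X = iris.data.astype(np.float32)  # features, shape (150, 4), cast to float32
y = iris.target                   # integer class labels, shape (150,)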

X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=0.2, random_state=42)
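For intuition, the split amounts to shuffling the row indices and holding out 20% of them. A rough NumPy equivalent is sketched below on placeholder arrays shaped like Iris (150 rows, 4 features). `simple_split` is a hypothetical helper written for this example, and it will not reproduce the exact rows that train_test_split selects.

```python
import numpy as np

def simple_split(X, y, test_size=0.2, seed=42):
    # Shuffle the row indices, then hold out the first `test_size` fraction
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(round(len(X) * test_size))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Placeholder data with the same shape as the Iris dataset
X = np.zeros((150, 4), dtype=np.float32)
y = np.repeat(np.arange(3), 50)

X_train, X_test, y_train, y_test = simple_split(X, y)
print(X_train.shape, X_test.shape)  # (120, 4) (30, 4)
```

With only 30 rows held out, the test accuracy moves in steps of about 3%, so small differences between runs are expected.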

Next, we need to instantiate our model, create an optimizer, and set our batch size and number of epochs:

model = IrisModel()
opt = SGD(model.params(), lr=1e-3)  # params is a method, so it must be called
epochs = 10000
batch_size = 32

Unlike PyTorch's nn.Module, our tinygrad model is a plain Python class with no built-in parameters() method, so we need to tell the optimizer ourselves which tensors to update. For convenience, I created a method called params that returns the weights of each linear layer (the biases are left out of this list, so they are not trained). Without it, you would write something like:

opt = SGD([model.l1.weight, model.l2.weight], lr=1e-3)

Onto the actual training

Using a PyTorch-like for loop approach, we will take a random sample from our training data, convert it to a tensor, then pass that input tensor to our model for training.
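The sampling step can be tried on its own. The sketch below uses placeholder arrays in place of the real training split (120 rows is what an 80/20 split of 150 samples gives); note that indices are drawn with replacement, so a row can appear more than once in a batch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder arrays standing in for X_train and y_train
X_train = np.arange(120 * 4, dtype=np.float32).reshape(120, 4)
y_train = np.arange(120) % 3

batch_size = 32
sample = rng.integers(0, X_train.shape[0], size=batch_size)  # random row indices
batch, target = X_train[sample], y_train[sample]             # same rows for both

print(batch.shape, target.shape)  # (32, 4) (32,)
```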

Usually, loss functions would come with their own class, but in tinygrad, they are available as methods on Tensor.

In this case, we are using Tensor.sparse_categorical_crossentropy since we are dealing with multi-class classification with integer labels.

This loss function takes two inputs: our output tensor (out) and our labels tensor (target).
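As a rough illustration of what this loss measures, the NumPy sketch below picks the log-probability the model assigned to the correct class of each sample and averages the negatives. `sparse_cross_entropy` is a hypothetical helper written for this example, not tinygrad's implementation.

```python
import numpy as np

def sparse_cross_entropy(log_probs, target):
    # Pick the log-probability of the true class for every row, then average
    rows = np.arange(len(target))
    return -log_probs[rows, target].mean()

# Two samples, three classes; probabilities written out by hand
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.1, 0.8]])
target = np.array([0, 2])          # integer class labels, not one-hot vectors

loss = sparse_cross_entropy(np.log(probs), target)
print(loss)                        # -(ln 0.7 + ln 0.8) / 2, roughly 0.29
```

The loss approaches 0 as the probability of the correct class approaches 1, and grows without bound as it approaches 0.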

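Tensor.training = True  # switch on training mode before optimizing
for i in range(epochs):
    # Draw a random mini-batch (with replacement) from the training set
    sample = np.random.randint(0, X_train.shape[0], size=batch_size)
    batch = Tensor(X_train[sample], requires_grad=False)
    target = Tensor(y_train[sample])

    out = model.forward(batch)
    loss = Tensor.sparse_categorical_crossentropy(out, target)

    opt.zero_grad()
    loss.backward()
    opt.step()

    preds = out.numpy().argmax(axis=-1)
    acc = (preds == target.numpy()).mean()

    # Log every 100 iterations (i is the loop counter, not the total)
    if i % 100 == 0:
        print(f"Epoch {i}, loss: {loss.numpy()}, acc: {acc}")
Tensor.training = False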

When running the code, you may see lines like this (this sample was logged on every iteration):

Epoch 9990, loss: 0.20084689557552338, acc: 0.96875
Epoch 9991, loss: 0.1804082840681076, acc: 1.0
Epoch 9992, loss: 0.28833892941474915, acc: 0.90625
Epoch 9993, loss: 0.15084490180015564, acc: 0.96875
Epoch 9994, loss: 0.1843332201242447, acc: 1.0
Epoch 9995, loss: 0.21117405593395233, acc: 0.96875
Epoch 9996, loss: 0.2075180560350418, acc: 0.96875
Epoch 9997, loss: 0.13138934969902039, acc: 1.0

A decreasing loss means our model is learning. The accuracy printed here is measured on training batches, so it is also useful to calculate the test accuracy to check that the model makes correct predictions on data it has not seen.

Let's explain what opt.zero_grad(), loss.backward(), and opt.step() do.

opt.zero_grad()

This line clears the gradients of the model's parameters. During backpropagation, gradients of the loss with respect to the model's parameters are calculated, and these gradients are used to update the parameters to minimize the loss. Before computing new gradients for the current batch of data, it's essential to clear the gradients from the previous batch, so that the two do not mix.

loss.backward()

After clearing the gradients, this line computes the gradients of the loss with respect to the model's parameters by performing backpropagation through the computational graph of the model. In other words, it calculates how much each parameter should be adjusted to reduce the loss. The gradients are stored on the model's parameters and will be used in the next step for updating the model.

opt.step()

Finally, this line updates the model's parameters using the computed gradients. It's the step where the model learns and adapts to the data. The optimizer (opt in this case) is responsible for adjusting the parameters in a way that reduces the loss. Common optimizers like stochastic gradient descent (SGD) or variants like Adam or RMSprop use the gradients to determine the direction and magnitude of parameter updates. The learning rate, which is a hyperparameter, controls the step size of the updates.
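The three calls can be mimicked by hand on a tiny linear regression problem. The NumPy sketch below is only an analogy with made-up data: the gradient is derived manually instead of by automatic differentiation, and the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 4))
true_w = np.array([1.0, -2.0, 0.5, 3.0])
y = x @ true_w                       # targets from a known linear rule

w = np.zeros(4)                      # the parameter we want to learn
lr = 0.1                             # learning rate (step size)
losses = []

for step in range(100):
    grad = np.zeros_like(w)          # like opt.zero_grad(): start from zero
    err = x @ w - y
    losses.append((err ** 2).mean()) # mean squared error for this step
    grad += 2 * x.T @ err / len(x)   # like loss.backward(): d(loss)/dw
    w -= lr * grad                   # like opt.step(): move against the gradient

print(losses[0] > losses[-1])        # True: the loss went down
```

If the gradient were not reset at the top of each iteration, every update would also include the gradients of all earlier steps.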

Our output is a tensor of log-probabilities, one per class, all of them zero or negative. We use np.argmax to get the index of the highest value, which is also the most probable class. This index corresponds to one of the three labels that the Iris dataset defines.

classes = {
    0: "iris-setosa",
    1: "iris-versicolor",
    2: "iris-virginica"
}

More on this later
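As a small illustration of how the mapping can be used, the sketch below turns a batch of made-up log-probabilities into species names. The numbers are invented for the example; in the tutorial they would come from out.numpy().

```python
import numpy as np

classes = {
    0: "iris-setosa",
    1: "iris-versicolor",
    2: "iris-virginica",
}

# Hand-written log-probabilities for two samples (rows) and three classes
log_probs = np.log(np.array([[0.90, 0.07, 0.03],
                             [0.05, 0.15, 0.80]]))

preds = log_probs.argmax(axis=-1)           # index of the most likely class per row
names = [classes[int(i)] for i in preds]
print(names)                                # ['iris-setosa', 'iris-virginica']
```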

The last step is to evaluate our model on the testing subset to check its accuracy on data it has not been trained on:

avg_acc = 0
for i in range(100):
    samp = np.random.randint(0, X_test.shape[0], size=(32))
    # Keep the tensors on the default device, as in the training loop
    batch = Tensor(X_test[samp], requires_grad=False)
    target = Tensor(y_test[samp])

    out = model.forward(batch)
    preds = out.argmax(axis=-1).numpy()
    avg_acc += (preds == target.numpy()).mean()

print(f"Test accuracy: {avg_acc/100}")

I won't explain this step too much because it's similar to the training step, except that we skip the loss calculation and backpropagation, so the weights of the model are not modified.

All we want to do is check whether our model's output matches the real label for each sample of test data.
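Since the test split is small, an alternative to averaging over random batches is to score every test row exactly once. A minimal sketch, with stand-in arrays in place of the model output and the labels (`accuracy` is a hypothetical helper written for this example):

```python
import numpy as np

def accuracy(log_probs, labels):
    # Fraction of rows whose highest-scoring class equals the true label
    return float((log_probs.argmax(axis=-1) == labels).mean())

# Stand-ins for model.forward(Tensor(X_test)).numpy() and y_test
log_probs = np.log(np.array([[0.8, 0.1, 0.1],
                             [0.2, 0.7, 0.1],
                             [0.3, 0.4, 0.3],
                             [0.1, 0.2, 0.7]]))
labels = np.array([0, 1, 2, 2])

print(accuracy(log_probs, labels))   # 0.75
```

Scoring each row once gives the same number on every run, whereas the random-batch average changes slightly with the samples drawn.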

If you get an accuracy above 0.9, congratulations! You just built your first model using tinygrad and used it to train an iris flower classifier.

PyTorch and TensorFlow may be good, but Tinygrad takes a new perspective on machine learning by letting us see the operations that happen on a lower level.

Using the environment variable "DEBUG=4" lets us see the different mlops (Machine Learning Operations) occurring.
