At the heart of machine learning is the model. A model is a computer program that is trained on a dataset to recognize patterns and perform a given task. The emphasis is on training: unlike regular programs, whose behavior is manually programmed by humans, a machine learning model learns from data using an algorithm. In this article, we will build a machine learning model with TensorFlow.js.
The Task
Before we begin building our model, we need to define the task at hand.
The image above displays a graph that plots a series of dots. Our goal is to find the straight line that best fits these dots. This problem is known as linear regression. The line we are searching for can be considered our model. So, how do we find this line?
We humans can easily draw a line that fits the data. So the goal is to train our computer program to do the same.
We will explore two approaches: a statistical method that uses mathematical equations to find the best-fit line without any machine learning, and a machine learning approach that finds the best-fit line for us.
The dataset
First, we need to create a dataset. We are going to use TensorFlow.js operations to create an artificial dataset. We will use an equation to generate the data. Here's the equation:

y = mx + c

This equation represents a straight line, where m is the slope and c is the intercept; x is the input and y is the output. Let's randomly select values for m and c. For m, let's choose 2, and for c, let's choose 1. This gives us the equation:

y = 2x + 1

However, this equation will only give us a perfectly straight line. To make the data entries in our dataset unique, we need to add noise. Let's modify the equation:

y = 2x + 1 + noise
For our x values, we are going to generate one hundred values between 0 and 1. The value of y depends on x. We can use the equation to create new values of y. The equation says that 2 times x plus 1 plus random noise will give us y.
Now let's generate x values in code:
const x_data = tf.linspace(0, 1, 100);
The code above uses the linspace function to generate 100 values between 0 and 1. Let's convert the equation into code using TensorFlow.js:
const y_data = tf.add(tf.mul(2, x_data), 1).add(tf.randomNormal([100], 0, 0.1));
These operations multiply x by 2, add 1, and then add random noise to generate the corresponding y values for the dataset.
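The same computation can also be written by chaining tensor methods, which reads a little closer to the original equation (an equivalent formulation, shown here only for comparison; y_alt is just an illustrative name):
const y_alt = x_data.mul(2).add(1).add(tf.randomNormal([100], 0, 0.1));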
Ordinary Least Squares
The mathematical technique we are going to use is called Ordinary Least Squares (OLS). This technique finds the line that best fits the dataset by minimizing the squared differences between the predicted values and the actual values. Here is the equation of the line:

y = mx + c

y represents our dependent variable, or the y-coordinate in our graph. x represents our independent variable, or the x-coordinate in the graph. c represents the intercept, and m represents the slope. From our dataset, we obtain values for x and y. To calculate the values of m and c, we use the following equations:

m = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)²
c = ȳ − m · x̄

Let's convert the equations above into code. x̄ and ȳ are the mean values of x and y. In TensorFlow.js, we can represent them as:
const xMean = tf.mean(x);
const yMean = tf.mean(y);
The numerator of the first equation can be represented as:
const numerator = tf.sum(tf.mul(tf.sub(x, xMean), tf.sub(y, yMean)));
While the denominator can be represented as:
const denominator = tf.sum(tf.square(tf.sub(x, xMean)));
Now we can obtain our slope (m) and intercept (c):
const slope = numerator.div(denominator);
const intercept = yMean.sub(slope.mul(xMean));
We can combine everything into a single function:
function calculateInterceptAndSlope(x, y) {
const xMean = tf.mean(x);
const yMean = tf.mean(y);
const numerator = tf.sum(tf.mul(tf.sub(x, xMean), tf.sub(y, yMean)));
const denominator = tf.sum(tf.square(tf.sub(x, xMean)));
const slope = numerator.div(denominator);
const intercept = yMean.sub(slope.mul(xMean));
return { intercept: intercept.dataSync()[0], slope: slope.dataSync()[0] };
}
Let's provide the function with the data we defined earlier:
const { intercept, slope } = calculateInterceptAndSlope(x_data, y_data);
console.log('Intercept:', intercept); // Intercept: 0.980187177658081
console.log('Slope:', slope); // Slope: 2.0238144397735596
Our calculateInterceptAndSlope function calculates the intercept as 0.980187177658081 and the slope as 2.0238144397735596, which are close to the values we defined. This approach of using mathematical equations is effective, but it requires us to specify the relationship between the data variables in the form of an equation.
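Using the fitted line to make a prediction is then just a matter of plugging a new x into the equation. Here's a small usage sketch (the predictOLS helper is hypothetical, built from the slope and intercept we just computed):
function predictOLS(x) {
  return slope * x + intercept; // y = mx + c with the fitted parameters
}
console.log(predictOLS(0.5)); // ≈ 1.99, close to the true value 2 * 0.5 + 1 = 2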
However, what we truly desire is a method that can automatically discern the relationship within the data without relying on a predefined equation.
Machine Learning Model
A machine learning model is a program that has the ability to identify and discover patterns within data without being manually programmed. Instead, it undergoes a training process where it learns from the data. Let's explore the training process of a machine learning model in more detail.
Training process of a model
The training process of a model involves three essential components:
- The model
- Loss function
- Optimizer
Here's the process: The model takes an input value x and produces a prediction. This prediction is then compared to the actual value using the loss function. The loss function measures the performance of the model during training and returns a loss value. A high loss value indicates poor performance, while a low loss value indicates good performance. The optimizer is used to adjust the model's parameters in order to minimize the loss value.
This process is repeated multiple times, with each iteration referred to as an epoch. The number of epochs is determined by the creator of the model. For example, if the model is trained for 50 epochs, it means the training data is passed through the model, loss function, and optimizer 50 times.
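To make these three components concrete, here is a minimal hand-written sketch of that loop using low-level TensorFlow.js operations and tf.train.sgd, an optimizer we will cover shortly (an illustration only; in the rest of the article we let the Layers API handle this for us):
const m = tf.variable(tf.scalar(Math.random())); // the model's weight (slope)
const c = tf.variable(tf.scalar(Math.random())); // the model's bias (intercept)
const optimizer = tf.train.sgd(0.1);
for (let epoch = 0; epoch < 50; epoch++) {
  // minimize() runs the model, computes the loss, and adjusts m and c
  optimizer.minimize(() => {
    const predicted = x_data.mul(m).add(c); // the model: y = mx + c
    return predicted.sub(y_data).square().mean(); // the loss (mean squared error)
  });
}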
Building a model
To create a model in TensorFlow.js, we can utilize the tf.sequential function:
const model = tf.sequential();
A model consists of layers, and these layers have sets of values called weights attached to them. The weights are what are adjusted during the training process. Upon creation of the model, the weights are randomly initialized. Data flows from the input through each corresponding layer before reaching the output.
Let's create a layer for our model:
tf.layers.dense({ units: 1, inputShape: [1] })
Here, we have defined what is known as a dense layer. There are different types of layers, but for now, we will focus on dense layers. The dense layer takes in a JavaScript object, { units: 1, inputShape: [1] }, which contains two keys: units and inputShape. The units parameter defines the number of weights our layer will have; in this case, it is set to one.
The weights in a dense layer are analogous to the slope in the ordinary least squares method, and the number of units can be any number of our choosing; it doesn't have to be limited to one. Additionally, each unit has an associated bias value, which is automatically incorporated by TensorFlow.js. The inputShape key specifies the shape of each input sample; it is [1] here because our model takes in a single scalar value. We can add the layer to our model using the add method provided by the model:
model.add(tf.layers.dense({ units: 1, inputShape: [1] }));
With this, we have defined our model. Here is a diagram illustrating our model:
Let's experiment with different model configurations:
const model = tf.sequential();
model.add(tf.layers.dense({ units: 3, inputShape: [1] }));
The code above creates a model with one layer that takes one value as input. This layer has 3 units, and each unit gets its own weight and bias term, added for us automatically.
Let's go deeper and create a model with two layers.
const model = tf.sequential();
model.add(tf.layers.dense({ units: 3, inputShape: [1] }));
model.add(tf.layers.dense({ units: 2 }));
The second layer has two units, each with its own bias. Here's the corresponding diagram:
Every unit in Layer 1 has a connection to every unit of the next layer, which is why it is called a dense layer.
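We can verify this structure in code. Calling model.summary() prints each layer's output shape and parameter count (a quick check; the expected counts are worked out in the comments):
model.summary();
// First layer: 6 parameters (1 input × 3 units = 3 weights, plus 3 biases)
// Second layer: 8 parameters (3 units × 2 units = 6 weights, plus 2 biases)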
Let's get back to our task and place our original model configuration into a function for simplicity.
function createModel() {
const model = tf.sequential();
model.add(tf.layers.dense({ units: 1, inputShape: [1] }));
return model;
}
This function returns a model. Currently, the model has randomly initialized weights and is not yet trained. The goal is to train the model to fit our data, and that's where the training process comes in. To train our model, we need two things: a loss function and an optimizer.
Loss function
The loss function, also known as the cost function, is a way for the model to measure its performance. It quantifies the difference between the predicted values and the actual values. TensorFlow.js provides several pre-defined loss functions, but for a better understanding, let's build one from scratch. We will use the mean squared error (MSE) as our loss function.
MSE measures the difference between the predicted values and the actual values by squaring each difference (so that negative and positive errors don't cancel out) and then taking the mean of the squared differences. Here's the TensorFlow.js code equivalent for the MSE loss function:
function loss(predicted, actual) {
return predicted.sub(actual).square().mean();
}
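As a quick sanity check, we can call our loss function on two small tensors (a usage example; the expected value is worked out in the comment):
const predicted = tf.tensor([2, 3]);
const actual = tf.tensor([1, 5]);
loss(predicted, actual).print(); // ((2 - 1)² + (3 - 5)²) / 2 = (1 + 4) / 2 = 2.5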
The optimizer
While the loss function is used to evaluate the model's performance, the optimizer is responsible for updating the weights of the model based on the information provided by the loss function. There are several algorithms that can be used to optimize the weights, and in this case, we will use an algorithm called Stochastic Gradient Descent (SGD). TensorFlow.js provides an implementation of SGD that we can use.
The tf.train.sgd function implements SGD and takes one argument: the learning rate. The process of optimizing the weights of a model is a search problem, and the learning rate determines the size of the steps taken during that search. A higher learning rate may lead to faster convergence toward the optimal solution, but it carries a risk of overshooting it. A lower learning rate takes longer but provides a more thorough search for the optimal solution.
tf.train.sgd(0.1) // SGD with a learning rate of 0.1
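For intuition, here are two other settings we might try (the values are illustrative; the right learning rate depends on the problem):
const fast = tf.train.sgd(0.5); // larger steps: converges faster, but may overshoot
const slow = tf.train.sgd(0.01); // smaller steps: slower, but a steadier search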
Training the model
Now let's put everything together and train our model. First, we need to create our model using the createModel function we defined earlier. Then, we attach the loss function we created and the optimizer to our model using the compile method.
const model = createModel();
model.compile({ optimizer: tf.train.sgd(0.1), loss });
To train our model, we use the fit method. This method takes in our inputs and their corresponding outputs, which in our case are the x_data and y_data variables. Additionally, we pass a third argument: an object containing extra training options.
model.fit(x_data, y_data, {
epochs: 50,
callbacks: {
onEpochEnd: (epoch, logs) => {
console.log(`Epoch ${epoch + 1}: Loss = ${logs.loss}`);
},
},
});
The third argument is an object that contains the epochs and callbacks keys. The epochs value determines how many times our dataset will go through the entire training process. The callbacks value is an object that defines functions to be executed at specific points during training. In our case, we use the onEpochEnd callback, which runs at the end of every epoch.
The code above won't work yet: because model.fit is asynchronous, we have to await it and place it inside an async function.
async function trainModel() {
const history = await model.fit(x_data, y_data, {
epochs: 50,
callbacks: {
onEpochEnd: (epoch, logs) => {
console.log(`Epoch ${epoch + 1}: Loss = ${logs.loss}`);
},
},
});
}
The process of using a model after training is called inference. Let's create a function for performing inference with our trained model.
async function inference(){
// start the training process
await trainModel();
// Get the weights of the model
const [ unit, bias ] = model.getWeights();
console.log('Unit:', unit.dataSync()[0]);
console.log('Bias:', bias.dataSync()[0]);
// predict a single value
const y = model.predict(tf.tensor([2]));
console.log(y.dataSync()[0]);
}
This function calls the trainModel function to initiate the training process. After training, we obtain the trained weights from the model's getWeights method. Then, we use the model's predict method to make actual predictions on our data values.
The dataSync() method allows us to get a tensor's data values in a synchronous manner. This is a blocking operation, but there's an asynchronous equivalent called data().
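For example, inside an async function we could read a prediction without blocking the thread (a small sketch using the asynchronous data() method):
const prediction = model.predict(tf.tensor([2]));
const values = await prediction.data(); // resolves to a TypedArray
console.log(values[0]);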
Let's summarize everything we have done so far:
const tf = require('@tensorflow/tfjs');
// Define a linear regression model
function createModel() {
const model = tf.sequential();
model.add(tf.layers.dense({ units: 1, inputShape: [1] }));
return model;
}
// Prepare the data
const x_data = tf.linspace(0, 1, 100);
const y_data = tf.add(tf.mul(2, x_data), 1).add(tf.randomNormal([100], 0, 0.1));
// Define the loss function
function loss(predicted, actual) {
return predicted.sub(actual).square().mean();
}
// Compile the model
const model = createModel();
model.compile({ optimizer: tf.train.sgd(0.1), loss });
// Define the function to train the model
async function trainModel() {
const history = await model.fit(x_data, y_data, {
epochs: 50,
callbacks: {
onEpochEnd: (epoch, logs) => {
console.log(`Epoch ${epoch + 1}: Loss = ${logs.loss}`);
},
},
});
}
// Define the inference function
async function inference(){
// start the training process
await trainModel();
// Get the weights of the model
const [ unit, bias ] = model.getWeights();
console.log('Unit:', unit.dataSync()[0]);
console.log('Bias:', bias.dataSync()[0]);
// predict a single value
const y = model.predict(tf.tensor([2]));
console.log(y.dataSync()[0]);
}
// Call the inference function
inference();
The code above should work as is, and here's the log output after the function has been called.
Epoch 1: Loss = 3.0514166355133057
Epoch 2: Loss = 0.3914150893688202
Epoch 3: Loss = 0.16473960876464844
Epoch 4: Loss = 0.12166999280452728
Epoch 5: Loss = 0.11165788769721985
Epoch 6: Loss = 0.1014869213104248
Epoch 7: Loss = 0.09044851362705231
Epoch 8: Loss = 0.0814569965004921
Epoch 9: Loss = 0.07538492977619171
Epoch 10: Loss = 0.0688215121626854
Epoch 11: Loss = 0.06399987637996674
Epoch 12: Loss = 0.05861040949821472
Epoch 13: Loss = 0.054225534200668335
Epoch 14: Loss = 0.04796575382351875
...
Epoch 40: Loss = 0.012772820889949799
Epoch 41: Loss = 0.01268770918250084
Epoch 42: Loss = 0.012648873031139374
Epoch 43: Loss = 0.012167338281869888
Epoch 44: Loss = 0.012036673724651337
Epoch 45: Loss = 0.011740590445697308
Epoch 46: Loss = 0.011509907431900501
Epoch 47: Loss = 0.01139025203883648
Epoch 48: Loss = 0.011547082103788853
Epoch 49: Loss = 0.011259474791586399
Epoch 50: Loss = 0.011122853495180607
Unit: 1.9403504133224487
Bias: 1.0415139198303223
4.922214508056641
From the log output, we can observe that the loss value is high during the first epoch but decreases steadily in subsequent epochs. This indicates that our model is improving with each epoch.
By using the model's getWeights method, we obtained the values of the unit and bias as 1.9403504133224487 and 1.0415139198303223 respectively. These values are approximately the same as the ones we obtained using the OLS method. When we provided the value of 2 to our model, it predicted 4.922214508056641.
Comparing both methods
The goal of linear regression is to find a straight line that best represents our data. Our original equation was y = 2x + 1 (plus noise). The ML model came up with an equation of:
const slope = 1.9403504133224487; // our unit's value from the model's weights
const intercept = 1.0415139198303223; // our bias's value from the model's weights
const x = 2; // input value
const y = x * slope + intercept;
console.log(y); // 4.92221474647522
The solution using the Ordinary Least Squares (OLS) method can be expressed as:
const slope = 2.0238144397735596;
const intercept = 0.980187177658081;
const x = 2; // input value
const y = x * slope + intercept;
console.log(y); // 5.0278160572052
Both methods converge towards a slope of 2 and an intercept of 1, the values we used to generate the data.
When comparing the Ordinary Least Squares (OLS) method to the ML model, it is important to consider their generality. The OLS method provides a specific solution for linear regression and assumes a linear relationship between the variables. If the data doesn't follow a linear pattern, the OLS method may not yield accurate results, and finding a different equation or using alternative regression techniques becomes necessary.
On the other hand, the ML model offers more flexibility and generalizability. By adjusting the model's configuration, such as adding more layers, we can potentially capture more complex patterns in the data beyond simple linear relationships. This adaptability makes the ML model well-suited for handling a wider range of data patterns and potentially achieving better performance in different scenarios.
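For instance, a deeper configuration with a non-linear activation could fit curves as well as straight lines. Here's a hedged sketch (the layer sizes and activation are illustrative choices, not values taken from this article):
const nonLinearModel = tf.sequential();
nonLinearModel.add(tf.layers.dense({ units: 8, inputShape: [1], activation: 'relu' }));
nonLinearModel.add(tf.layers.dense({ units: 1 }));
nonLinearModel.compile({ optimizer: tf.train.sgd(0.1), loss });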
In this article, we were introduced to the concept of a machine learning model, specifically the linear model. However, it's important to note that there are other types of models available as well. One such example is the polynomial model, which is used when the data does not follow a straight line trend but exhibits a more complex pattern. Additionally, there are models specifically designed for natural language processing tasks, such as language models, which aim to understand and generate human language by capturing its underlying patterns and relationships. These different types of models allow us to tackle a wide range of problems and data patterns effectively.
All models, regardless of their specific architecture or purpose, share a common structure consisting of layers and weights. This allows them to process and learn from data in a structured manner. For instance, a language model like GPT-3 (Generative Pre-trained Transformer 3) is referred to as "large" due to its extensive scale. GPT-3 is composed of 96 layers and a staggering 175 billion weights, which enables it to capture complex patterns and relationships in language data. The size and complexity of models like GPT-3 contribute to their impressive performance on various natural language processing tasks.
Useful resources
- Tutorial by TensorFlow that walks you through building a model using TensorFlow.js
- A guide that introduces models and layers in TensorFlow.js.
- Video introducing the concept of regression by the YouTube channel Crash Course (Statistics).
- Video explaining the Ordinary Least Squares (OLS) method by the YouTuber Organic Chemistry Tutor.