Chris McKenzie

Posted on Sep 21, 2023

Run Models in the Browser With Transformers.js

#javascript #tutorial #ai

For this tutorial we’ll be using transformers.js from Hugging Face. If you’re not familiar with Hugging Face, I highly recommend you check them out. They’re doing some really cool stuff in the AI space. The TLDR is that they’re the GitHub of ML.

Transformers.js

Based on the Python library — transformers — transformers.js is a JavaScript library that allows you to use pretrained models from Hugging Face in a JavaScript environment — such as the browser. While it’s not as robust as the Python library, it’s still pretty powerful.

The library is versatile, supporting tasks such as Natural Language Processing (NLP), Computer Vision, Audio, and Multimodal processing.

Under the hood, Transformers.js uses the ONNX Runtime to run the models in the browser.

ONNX Runtime

ONNX Runtime is a cross-platform machine-learning model accelerator, with a flexible interface to integrate hardware-specific libraries. ONNX Runtime can be used with models from PyTorch, TensorFlow/Keras, TFLite, scikit-learn, and other frameworks.

ONNX, or Open Neural Network Exchange, was initially started as a collaboration between Microsoft and Facebook. The project is open-source, and has grown to offer a universal format for machine learning models, ensuring seamless sharing and deployment across platforms.

ONNX is what makes it possible to directly run models in a browser. This is pretty insane!

Build a Demo

For this tutorial let’s build a simple app that summarizes text. Let’s start by setting up our environment. For this demo we’ll keep it as simple as possible, but you can use any framework you’d like.

The complete project can be found on GitHub.

Setup project foundation

mkdir transformersjs-demo  
cd transformersjs-demo  
touch index.html styles.css main.js

Now let’s create a simple HTML page.

<!DOCTYPE html>  
<html lang="en">  
<head>  
    <meta charset="UTF-8">  
    <meta name="viewport" content="width=device-width, initial-scale=1.0">  
    <title>Summary Generator</title>  
    <link rel="stylesheet" href="styles.css">  
    <script src="main.js" type="module" defer></script>  
</head>  
<body>  
    <div class="container">  
        <textarea id="long-text-input" placeholder="Enter your copy here..."></textarea>  
        <button id="generate-button" disabled>  
          <span id="spinner">🔄</span> Generate Summary  
        </button>  
        <div id="output-div"></div>  
    </div>  
</body>  
</html>

Add a little CSS to make it look nice.

* {  
  box-sizing: border-box;  
}  

body {  
  font-family: Arial, sans-serif;  
  background-color: #f4f4f4;  
  display: flex;  
  justify-content: center;  
  align-items: center;  
  height: 100vh;  
  margin: 0;  
  padding: 20px;  
}  

.container {  
  background-color: #ffffff;  
  border-radius: 10px;  
  padding: 30px;  
  box-shadow: 0px 0px 15px rgba(0, 0, 0, 0.1);  
  width: 80%;  
  max-width: 600px;  
}  

textarea {  
  width: 100%;  
  height: 200px;  
  border-radius: 5px;  
  padding: 15px;  
  font-size: 16px;  
  border: 1px solid #ddd;  
  resize: none;  
}  

button {  
  display: block;  
  width: 100%;  
  margin: 20px 0;  
  padding: 10px;  
  background-color: #3498db;  
  color: #ffffff;  
  border: none;  
  border-radius: 5px;  
  cursor: pointer;  
  font-size: 18px;  
}  

button:hover {  
  background-color: #2980b9;  
}  

button:disabled {  
  background-color: #b3c2c8;  
  cursor: not-allowed;  
}  

@keyframes spin {  
  0% { transform: rotate(0deg); }  
  100% { transform: rotate(360deg); }  
}  

#spinner {  
  display: none;  
  margin-right: 10px;  
  animation: spin 1s linear infinite;  
}  

#spinner.show {  
  display: inline-block;  
}  

#output-div {  
  display: none;  
  background-color: #f9f9f9;  
  border: 1px solid #ddd;  
  padding: 15px;  
  border-radius: 5px;  
  font-size: 16px;  
}

Selecting a Model

Take some time to understand the different models, what they’re used for, and their tradeoffs. When picking a model, consider not just the task but also size, speed, and accuracy. The model hub on Hugging Face is a good place to start. You can also convert your own models to ONNX so they can run with ONNX Runtime. That’s outside the scope of this tutorial, but if you’re curious, check out Convert your models to ONNX.

For this demo we’ll be using the Xenova/t5-small model, which is a small model that can be used for summarization. This model is based on [t5-small](https://huggingface.co/t5-small) and is built with ONNX weights to be compatible with transformers.js.

Adding JavaScript

Now let’s add some JavaScript to make it work. The first thing we want to do is import transformers.js.

import { pipeline } from 'https://cdn.jsdelivr.net/npm/@xenova/transformers@2.3.0';

Because we’re using modern JavaScript (ES modules), we can import it directly from a CDN in the browser, but you can also install it with npm.

Next we’ll use the pipeline function to create a summarization pipeline:

const summarization = await pipeline(  
    'summarization', // task  
    'Xenova/t5-small' // model  
);

The pipeline function takes two arguments, the task and the model. We’re using the summarization task and the Xenova/t5-small model, but there are many other combinations you can use. For example, you could use the question-answering task and the deepset/roberta-base-squad2 model to create a question answering pipeline, as long as ONNX weights are available for the model. Check out the Tasks section for all the different tasks you can use.
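Creating a pipeline downloads and initializes the model, so it can help to make sure that only happens once, even if several parts of a page ask for it. Here is a minimal sketch, assuming a hypothetical helper named `createLazyPipeline` (it is not part of transformers.js) that receives the real `pipeline` function as an argument:

```javascript
// Hypothetical helper (not part of transformers.js): wraps a pipeline factory
// so the model is only loaded once, no matter how many times it is requested.
function createLazyPipeline(pipelineFactory, task, model) {
  let pending = null; // cached promise for the loaded pipeline

  return function getPipeline() {
    if (pending === null) {
      pending = Promise.resolve(pipelineFactory(task, model)).catch((error) => {
        pending = null; // allow a retry after a failed load
        throw error;
      });
    }
    return pending;
  };
}

// Possible usage in main.js, assuming `pipeline` was imported from transformers.js:
// const getSummarizer = createLazyPipeline(pipeline, 'summarization', 'Xenova/t5-small');
// const summarization = await getSummarizer();
```

Passing the factory in as a parameter keeps the helper independent of how the library is loaded (CDN import or npm).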

So far we have a nice UI and we’ve created a summarization pipeline. Now let’s add some code to make it work. Below the import but above the pipeline function, add the following code:

const longTextInput = document.getElementById('long-text-input');  
const generateButton = document.getElementById('generate-button');  
const output = document.getElementById('output-div');  
const spinner = document.getElementById('spinner');

Next we’ll want to add some code to kick off the summarization task when the user clicks the “Generate Summary” button. Below the pipeline function, add the following code:

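generateButton.addEventListener('click', async () => {
    const input = longTextInput.value;
    // The pipeline resolves to an array with one result object per input.
    const result = await summarization(input);
    // textContent avoids interpreting the generated text as HTML.
    output.textContent = result[0].summary_text;
});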
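The handler reads `result[0].summary_text` because the summarization pipeline resolves to an array of result objects. A small defensive sketch, using a hypothetical helper named `extractSummary`, falls back to an empty string if the result isn’t shaped as expected:

```javascript
// Hypothetical helper: pulls the summary text out of a pipeline result.
// Assumes the shape used in this tutorial: [{ summary_text: '...' }].
function extractSummary(result) {
  const first = Array.isArray(result) ? result[0] : result;
  const text = first && typeof first.summary_text === 'string' ? first.summary_text : '';
  return text.trim();
}

// Possible usage inside the click handler:
// const summary = extractSummary(await summarization(input));
```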

Ideally, we don’t want our “Generate Summary” button to be clickable until the model is loaded. Depending on the model you pick, and the internet speed, this could take anywhere from a few seconds to over a minute. So let’s add a little code to disable the button until the model is loaded.

Make sure the button HTML includes the disabled attribute (the markup above already does):

<button id="generate-button" disabled>

Next we’ll want to enable the button once the model is loaded. Below the pipeline function, add the following code:

generateButton.removeAttribute('disabled');
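Since loading can take a while, it may also be worth showing download progress. transformers.js accepts a `progress_callback` option in the third argument to `pipeline`. The sketch below assumes the callback receives objects with `status`, `file`, and `progress` (a percentage) fields, so check that against the version you’re using. `formatProgress` is a hypothetical helper:

```javascript
// Hypothetical helper: turns a progress event into a short status label.
// Assumes events look like { status: 'progress', file: 'model.onnx', progress: 42.5 }.
function formatProgress(event) {
  if (!event || event.status !== 'progress') {
    return 'Loading model...';
  }
  const percent = Math.round(Number(event.progress) || 0);
  const clamped = Math.min(100, Math.max(0, percent));
  return `Downloading ${event.file || 'model files'}: ${clamped}%`;
}

// Possible usage when creating the pipeline:
// const summarization = await pipeline('summarization', 'Xenova/t5-small', {
//     progress_callback: (event) => { generateButton.title = formatProgress(event); },
// });
```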

Next we’ll add a simple loading spinner:

generateButton.addEventListener('click', async () => {
    // Show the spinner and block repeat clicks while the model runs.
    spinner.classList.add('show');
    generateButton.setAttribute('disabled', true);

    try {
        const input = longTextInput.value;
        const result = await summarization(input);

        output.textContent = result[0].summary_text;
        output.style.display = 'block';
    } finally {
        // Restore the button even if summarization throws.
        spinner.classList.remove('show');
        generateButton.removeAttribute('disabled');
    }
});

Testing It Out

For this demo, I’m going to summarize the article ‘Dr. Google’ meets its match in Dr. ChatGPT.

As you can see, we get a summary of the article, but it’s too short. Let’s update the code to get a longer summary. Inside the click handler, pass an options object as the second argument when calling summarization:

const result = await summarization(input, {  
    min_length: 50, max_length: 250,  
});
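Both `min_length` and `max_length` are measured in tokens rather than characters, and the minimum shouldn’t exceed the maximum. A minimal sketch, using a hypothetical helper named `summaryOptions`, keeps the two bounds consistent before they’re passed along:

```javascript
// Hypothetical helper: builds the options object used above and makes sure
// the bounds are whole numbers with min_length <= max_length.
function summaryOptions(minLength = 50, maxLength = 250) {
  const min = Math.max(1, Math.floor(minLength));
  const max = Math.max(min, Math.floor(maxLength));
  return { min_length: min, max_length: max };
}

// Possible usage inside the click handler:
// const result = await summarization(input, summaryOptions(50, 250));
```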

Running it again, we get a longer summary, but it’s still not perfect: the accuracy leaves something to be desired. Let’s try a different model, t5-base. Replace the pipeline with:

const summarization = await pipeline('summarization', 'Xenova/t5-base');

This is a little better, but ultimately you’ll need to decide which model is best for your use case.

Personally, I found Xenova/bart-large-cnn and Xenova/distilbart-cnn-6-6 produced the best results, but were the slowest and required downloading over a GB of data. This is something to keep in mind when selecting a model. You’ll want to balance accuracy with speed and size.

const summarization = await pipeline('summarization', 'Xenova/distilbart-cnn-6-6');  

// or  

const summarization = await pipeline('summarization', 'Xenova/bart-large-cnn');
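One way to make the speed side of that tradeoff concrete is to time each model on the same text. Here is a rough sketch, assuming a hypothetical helper named `timeSummarizer` that accepts any function with the same call shape as the summarization pipeline:

```javascript
// Hypothetical helper: runs a summarizer once and reports how long it took.
// `summarize` is expected to resolve to [{ summary_text: '...' }].
async function timeSummarizer(summarize, text, options = {}) {
  const start = Date.now();
  const result = await summarize(text, options);
  const elapsedMs = Date.now() - start;
  return { elapsedMs, summary: result[0].summary_text };
}

// Possible usage, assuming `summarization` is a loaded pipeline:
// const { elapsedMs, summary } = await timeSummarizer(summarization, longTextInput.value);
// console.log(`Took ${elapsedMs} ms:`, summary);
```

This only measures a single run and excludes the model download, so treat the numbers as a rough comparison rather than a benchmark.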

Final Thoughts

There’s a lot of exciting stuff happening right now in the ML space. And the fact that we can run these models in the browser is pretty insane. It’s not perfect, and there are some considerations you need to take into account:

It’s likely going to be slower than a server-side solution
Because the size of the model matters, small models are generally less accurate
Balancing the tradeoffs between speed, size and accuracy means testing different models to find the right one for your use case
You’ll need to consider the security implications of running these models in the browser

Overall this is a pretty exciting development. And it’s not just the browser. You can run these models in Node.js, Deno, React Native, and even in a serverless environment like Cloudflare Workers.

I’m excited to see what people build with this technology.