Natural Language Processing (NLP) is the process by which a machine tries to understand, interpret, and generate meaningful human language. Since machines cannot understand human language directly, text-generating models such as ChatGPT rely on NLP techniques to process it. NLP is likely to keep growing and to produce even better responses in the future.
Several techniques underpin NLP, such as tokenization, parsing, and semantic role labeling. One of these is word embedding, a technique that represents words as vectors in a continuous, multidimensional vector space. Embeddings are very useful for improving the performance of NLP models.
So, today we are going to learn more about embeddings. The topics we are going to discuss are:
- What is Embedding?
- Use Cases of Embeddings
- Generating Embeddings in a Supabase Edge Function
- Storing Embeddings in Supabase with pgvector
I hope this gets you excited to learn more about word embeddings.
Word Embedding
As discussed in the introduction, word embedding is a technique that represents words as multidimensional vectors: each word is mapped to a vector of numbers. These vectors capture relationships between words based on the contexts in which they appear in a large corpus of text. Embeddings have become a fundamental part of NLP models and help improve performance on tasks such as search queries.
Let’s take an example of embedding to understand it better. Here are three sentences:
- "The cat chased the mouse."
- "The dog barked at the cat."
- "The mouse ran away."
An illustrative two-dimensional embedding for the words in these sentences might look like this (the values are made up for the example):
- "cat" -> [0.9, 0.2]
- "dog" -> [0.8, 0.3]
- "mouse" -> [0.1, 0.9]
- "chased" -> [0.7, 0.4]
- "barked" -> [0.6, 0.5]
- "ran" -> [0.2, 0.8]
- "the" -> [0.0, 0.0] (in this toy example, a common word like "the" gets a zero vector because it carries little meaning on its own)
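To make "close in embedding space" concrete, the sketch below compares the toy vectors above using cosine similarity, one common way to measure how close two embeddings are. It is a minimal TypeScript sketch; the name cosineSimilarity and the choice to return 0 when either vector is all zeros are our own assumptions, not part of any library.

```typescript
// Toy 2-D vectors copied from the example above (illustrative values only).
const toyEmbeddings: Record<string, number[]> = {
  cat: [0.9, 0.2],
  dog: [0.8, 0.3],
  mouse: [0.1, 0.9],
  chased: [0.7, 0.4],
  barked: [0.6, 0.5],
  ran: [0.2, 0.8],
  the: [0.0, 0.0],
};

// Cosine similarity: dot(a, b) / (|a| * |b|), a value in the range [-1, 1].
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same dimension");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector (like "the" here) has no direction, so report 0 instead of NaN.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

const catDog = cosineSimilarity(toyEmbeddings.cat, toyEmbeddings.dog);
const catMouse = cosineSimilarity(toyEmbeddings.cat, toyEmbeddings.mouse);
console.log(catDog.toFixed(2), catMouse.toFixed(2)); // 0.99 0.32
```

With these made-up values, "cat" scores about 0.99 against "dog" but only about 0.32 against "mouse", because the cat and dog vectors point in nearly the same direction.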
Use Cases of Embeddings
Embeddings have many use cases; a few of them are:
- Semantic Understanding: Embeddings encode relationships between words, so similar words have vectors that are close together in the embedding space. This is useful for tasks such as synonym identification and word analogies.
- Sentiment Analysis: Embeddings can help capture the sentiment, or tone, in which text is written, which is useful for identifying the emotional value of a sentence. They can also help adjust the tone of a sentence to fit a requirement, as tools like Grammarly do.
- Language Translation: Embeddings can place words with similar meanings in different languages close to each other in vector space, which helps when translating words from one language to another.
- Information Retrieval: Word embeddings improve search engines by capturing the semantic similarity between user queries and documents in a corpus. This helps retrieve relevant data from a large dataset for further processing.
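Information retrieval with embeddings usually comes down to a nearest-neighbor search: embed the query, score every stored document against it, and keep the best matches. The sketch below assumes unit-length vectors (so the dot product equals cosine similarity) and uses hypothetical document ids and 3-dimensional vectors; a real system would use model-generated embeddings and typically an index rather than a linear scan.

```typescript
// A stored document with a (hypothetical) precomputed, unit-length embedding.
interface StoredDoc {
  id: string;
  embedding: number[];
}

// For unit-length vectors, the dot product equals cosine similarity.
function dot(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same dimension");
  }
  return a.reduce((sum, value, i) => sum + value * b[i], 0);
}

// Return the ids of the k documents most similar to the query, best match first.
function topK(query: number[], docs: StoredDoc[], k: number): string[] {
  return docs
    .map((doc) => ({ id: doc.id, score: dot(query, doc.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((result) => result.id);
}

// Hypothetical 3-D unit vectors standing in for real document embeddings.
const docs: StoredDoc[] = [
  { id: "pets-article", embedding: [1, 0, 0] },
  { id: "cat-care-guide", embedding: [0.6, 0.8, 0] },
  { id: "tax-form", embedding: [0, 0, 1] },
];
const queryEmbedding = [0.8, 0.6, 0];
console.log(topK(queryEmbedding, docs, 2).join(", ")); // cat-care-guide, pets-article
```

Here the query scores 0.96 against "cat-care-guide", 0.8 against "pets-article", and 0 against "tax-form", so the two related documents are returned.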
Word embeddings can be applied in many other fields to process text documents, and fine-tuning the model can give better results. Various pre-trained models and libraries provide embeddings, such as OpenAI's embedding models, Transformers.js, and FastText.
Generating Embeddings using Supabase
Supabase is a powerful backend-as-a-service platform that allows developers to easily build scalable web applications with serverless functions and a PostgreSQL database. Supabase recently introduced support for Transformers.js, a library designed to be functionally equivalent to Hugging Face's transformers Python library, for Node.js and Deno.
Transformers.js support is available in Supabase Edge Functions. Edge Functions follow the Function-as-a-Service (FaaS) model, which lets us run tasks without managing a server and helps us achieve a serverless architecture. These functions run in a Deno environment, and we can use them for a variety of tasks. You can learn more about Edge Functions in the Supabase documentation.
We are going to build a React application that sends requests to an edge function to convert the given text into an embedding. The embedding then gets stored in the Supabase database. So, let's get started.
Building the React Application
We are going to use Create React App (CRA) to set up React. You can also use other React frameworks; there won't be much difference.
Create a new React app with the following command:
npx create-react-app embeddings
Note: To use the above command and the commands that follow, you need to have Node.js installed.
Clean up the unnecessary boilerplate code. Now, it's time to install the necessary library:
- @supabase/supabase-js: JavaScript library for handling requests to Supabase.
npm i @supabase/supabase-js
Adding a project to Supabase
First, let's set up our project on the Supabase dashboard. Visit supabase.com, then click Start your project on the right of the navigation bar. This will take you to the sign-in page. Sign in, or sign up if you don't have an account yet. After logging in, you will be redirected to the projects page. From here, click New Project to create a project. On the Create New Project page, you will be asked to enter the details of your project.
Enter a project name of your choice. For the database password, you can use the Generate a password button. Select the region nearest to your project's users. On the free tier, you can create two projects. After filling in the details, click Create new project. It will take a few minutes to set up the project.
App.js
Here is the code for App.js. Comments explain each step.
import "./App.css";
import { useState } from "react";
import { createClient } from "@supabase/supabase-js";
// Replace these placeholders with your project's values (Project Settings → API)
const SUPABASE_URL = "https://your-project-ref.supabase.co";
const SUPABASE_ANON_KEY = "your-anon-key";
// Create a single Supabase client for interacting with your database
const supabase = createClient(SUPABASE_URL, SUPABASE_ANON_KEY);
function App() {
  // Store the user's input and the status message in state
  const [inputData, setInputData] = useState("");
  const [embedData, setEmbedData] = useState(null);
  // Call the edge function, which creates and stores the embedding
  const handleEmbed = async () => {
    const { error } = await supabase.functions.invoke("create-embeddings", {
      body: { input: inputData },
    });
    if (error) {
      console.log(error);
      setEmbedData(`Error: ${error.message}`);
    } else {
      setEmbedData("Embed successfully stored");
    }
  };
  return (
    <div className="App">
      <h1>Generate Embeddings</h1>
      <input type="text" onChange={(e) => setInputData(e.target.value)} />
      <br />
      <button onClick={handleEmbed}>Run Embed</button>
      {embedData && <p>{embedData}</p>}
    </div>
  );
}
export default App;
SUPABASE_URL and SUPABASE_ANON_KEY are your project's URL and anon (public) key. You can find them in the Supabase dashboard under Project Settings → API.
Edge Function
In App.js, we make a call to the edge function, so now it's time to create that function. Before writing its code, we need to install the Supabase CLI, which we use to create, run, and deploy edge functions. Installing the CLI is easy; follow the steps below.
- Run the below command to install:
npm i supabase --save-dev
Note: Node and NPM should be pre-installed to run the command.
- Now, we need to log in to Supabase from the CLI. To run any Supabase CLI command, prefix it with npx supabase. So, to log in:
npx supabase login
This will ask for an access token, which you can generate from the Access Tokens page of your Supabase account. Enter that token when prompted.
- Now, go to the project directory where you want to write the code for your function. In the root directory, run the following command to initialize Supabase:
npx supabase init
- Finally, link the CLI to your Supabase project by providing its project reference:
npx supabase link --project-ref your-project-ref
Replace your-project-ref with your project reference. You can find it in the Supabase dashboard under Project Settings; it is also the subdomain of the project URL shown under Project Settings → API.
Writing the Edge Function for Creating Embeddings
First, we need to create a file for the function. You can create it with the Supabase CLI:
npx supabase functions new create-embeddings
This creates supabase/functions/create-embeddings/index.ts, where you write your function. Below is the code for create-embeddings:
import { serve } from "https://deno.land/std@0.168.0/http/server.ts";
import {
  env,
  pipeline,
} from "https://cdn.jsdelivr.net/npm/@xenova/transformers@2.5.0";
import { supabaseClient } from "../_shared/apiClient.ts";
import { corsHeaders } from "../_shared/cors.ts";
// Configuration for Deno runtime
env.useBrowserCache = false;
env.allowLocalModels = false;
// Load the gte-small feature-extraction model once, when the function starts
const pipe = await pipeline("feature-extraction", "Supabase/gte-small");
serve(async (req) => {
  // This is needed if you're planning to invoke your function from a browser.
  if (req.method === "OPTIONS") {
    return new Response("ok", { headers: corsHeaders });
  }
  try {
    // Extract input string from JSON body
    const { input } = await req.json();
    if (typeof input !== "string" || input.trim() === "") {
      throw new Error("Request body must contain a non-empty 'input' string");
    }
    // Generate the embedding from the user input
    const output = await pipe(input, {
      pooling: "mean",
      normalize: true,
    });
    // Extract the embedding output (384 numbers for gte-small)
    const embedding = Array.from(output.data);
    // Store the original text and its embedding in the documents table
    const { error } = await supabaseClient
      .from("documents")
      .insert({ text: input, embedding: embedding });
    if (error) {
      throw new Error(error.message);
    }
    return new Response(JSON.stringify("Vector stored Successfully!"), {
      headers: { ...corsHeaders, "Content-Type": "application/json" },
      status: 200,
    });
  } catch (error) {
    // In strict TypeScript the caught value is `unknown`, so narrow it first
    const message = error instanceof Error ? error.message : String(error);
    return new Response(JSON.stringify({ error: message }), {
      headers: { ...corsHeaders, "Content-Type": "application/json" },
      status: 400,
    });
  }
});
The function uses Transformers.js to create the embeddings. This part of the code generates the embedding and extracts the output:
// Generate the embedding from the user input
const output = await pipe(input, {
  pooling: "mean",
  normalize: true,
});
// Extract the embedding output
const embedding = Array.from(output.data);
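Conceptually, pooling: "mean" averages the per-token vectors the model produces into a single sentence vector, and normalize: true scales that vector to length 1. The sketch below illustrates those two steps on hypothetical 2-dimensional token vectors. It is a simplified illustration that assumes a single unpadded input; the name meanPoolAndNormalize is our own, and Transformers.js's own implementation may differ in details such as how it handles padding.

```typescript
// Mean-pool per-token vectors into one sentence vector, then L2-normalize it.
// Simplified sketch: assumes one unpadded input, so no attention mask is applied.
function meanPoolAndNormalize(tokenVectors: number[][]): number[] {
  if (tokenVectors.length === 0) {
    throw new Error("Need at least one token vector");
  }
  const dim = tokenVectors[0].length;
  // pooling: "mean" -> average each dimension across all tokens
  const pooled = new Array<number>(dim).fill(0);
  for (const vector of tokenVectors) {
    for (let i = 0; i < dim; i++) {
      pooled[i] += vector[i] / tokenVectors.length;
    }
  }
  // normalize: true -> divide by the Euclidean (L2) norm so the length becomes 1
  const norm = Math.sqrt(pooled.reduce((sum, v) => sum + v * v, 0));
  return norm === 0 ? pooled : pooled.map((v) => v / norm);
}

// Two hypothetical 2-D token vectors: their mean is [2, 3], then scaled to length 1.
const sentenceVector = meanPoolAndNormalize([
  [1, 2],
  [3, 4],
]);
console.log(sentenceVector.map((v) => v.toFixed(3)).join(", ")); // 0.555, 0.832
```

Normalizing to unit length is convenient because the dot product of two normalized embeddings equals their cosine similarity.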
After this, we store the original text and its embedding in the Supabase database, in a table named documents. Instead of storing the embedding as plain text, we use the special vector data type provided by the pgvector extension, which takes the dimension of the vector as a parameter. The gte-small model produces 384-dimensional embeddings, so the column is declared as vector(384). Run the following SQL in the SQL Editor of the Supabase project dashboard to create the table:
-- Enable the pgvector extension to work with embedding vectors
create extension if not exists vector;
-- Create a table to store your documents (columns match the insert in the edge function)
create table documents (
  id bigserial primary key,
  text text not null,
  embedding vector(384)
);
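Because the column is declared as vector(384), every embedding you store must contain exactly 384 numbers. The hypothetical helper below (toPgvectorLiteral is our own name, not a Supabase or pgvector API) checks the dimension and formats the array in pgvector's bracketed text format, which is handy if you want to paste a vector into an SQL statement. The edge function itself can keep sending the plain array, as it already does.

```typescript
// Dimension of the vector(384) column; gte-small produces 384-dimensional embeddings.
const EMBEDDING_DIM = 384;

// Hypothetical helper: validate an embedding before it goes into a vector(n) column.
function toPgvectorLiteral(embedding: number[], dim: number = EMBEDDING_DIM): string {
  // The vector length must match the column's declared dimension.
  if (embedding.length !== dim) {
    throw new Error(`Expected ${dim} dimensions, got ${embedding.length}`);
  }
  // NaN and Infinity are not valid vector elements.
  if (!embedding.every(Number.isFinite)) {
    throw new Error("Embedding contains NaN or Infinity");
  }
  // pgvector's text format is a bracketed, comma-separated list, e.g. '[1,2,3]'
  return `[${embedding.join(",")}]`;
}

console.log(toPgvectorLiteral([0.1, 0.2, 0.3], 3)); // [0.1,0.2,0.3]
```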
The code imports two helpers: corsHeaders and supabaseClient. corsHeaders allows the function to be invoked from a browser, and supabaseClient lets the edge function use Supabase features such as the database. Both are stored in the _shared directory inside supabase/functions. Here is the code for both:
cors.ts
export const corsHeaders = {
  "Access-Control-Allow-Origin": "*",
  "Access-Control-Allow-Headers":
    "authorization, x-client-info, apikey, content-type",
};
apiClient.ts
import { createClient } from "https://esm.sh/@supabase/supabase-js@2";
// createClient is synchronous, so no await is needed
export const supabaseClient = createClient(
  Deno.env.get("SUPABASE_URL") ?? "",
  Deno.env.get("SUPABASE_ANON_KEY") ?? ""
);
Edge Functions have default secrets such as SUPABASE_URL and SUPABASE_ANON_KEY available in their Deno environment, so you can read them directly with Deno.env.get.
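The apiClient.ts code falls back to an empty string when a variable is missing, which can surface later as a confusing error. As an optional alternative, the sketch below fails fast with a clear message. requireEnv and EnvGetter are hypothetical names; passing the getter as a parameter is an assumption that lets the same code take Deno.env.get inside an Edge Function or a stand-in Map in a test.

```typescript
// Hypothetical helper: read a required setting and fail fast if it is missing,
// instead of silently falling back to an empty string.
type EnvGetter = (name: string) => string | undefined;

function requireEnv(get: EnvGetter, name: string): string {
  const value = get(name);
  if (value === undefined || value === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// In an Edge Function you would pass Deno's getter, for example:
//   const url = requireEnv((n) => Deno.env.get(n), "SUPABASE_URL");
// Here a Map stands in for the environment so the sketch runs anywhere.
const fakeEnv = new Map<string, string>([
  ["SUPABASE_URL", "https://example.supabase.co"],
]);
console.log(requireEnv((n) => fakeEnv.get(n), "SUPABASE_URL")); // https://example.supabase.co
```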
After writing the edge function, make sure to deploy it to Supabase with the following command:
npx supabase functions deploy create-embeddings
Testing the Application
If you run the React application (npm start) and there are no errors, you will see the following output screen:
Now, if you enter some text in the text field and click the Run Embed button, the app invokes the edge function. Once the embedding is stored successfully, the app displays the message Embed successfully stored, as you can see in the output below.
You can check the stored embedding from the Supabase dashboard: navigate to Table Editor → documents.
Things to add to the project
I would love to create more content on using embeddings for different tasks. Meanwhile, you can extend the project with these features:
- Running similarity searches over the stored embeddings to retrieve useful information
- Making a GPT call on the retrieved data
I will try to cover these topics in upcoming articles, so make sure to follow the blog for further content.
Conclusion
Word embeddings are a powerful tool in natural language processing (NLP) that help machines understand and work with human language more effectively. They encode semantic relationships between words, enabling NLP tasks such as sentiment analysis, language translation, and information retrieval. Supabase Edge Functions, used together with the Transformers.js library, let you generate and store embeddings, which can be a valuable addition to any NLP project.
With Supabase, you can build a serverless application with ease, since it provides the tools you need: authentication, a database, storage, and Edge Functions for a FaaS architecture.
I hope this article has helped you learn about embeddings and one way to generate them. Thanks for reading the article.