Raja Osama

Blazing Fast Search with text-embedding-ada-002 in Node.js 🚀

Do you want to create amazing applications that can search, cluster, recommend, and classify text and code? Do you want to use a powerful and cost-effective embedding model that can handle long documents and diverse tasks? Do you want to learn how to use text-embedding-ada-002 in Node.js with ease and simplicity? If you answered yes to any of these questions, then this article is for you! 😍

In this article, we will show you how to use text-embedding-ada-002, a new and improved embedding model from OpenAI, in Node.js. We will explain what embeddings are, why they are useful, and how text-embedding-ada-002 outperforms previous models. We will also show you how to install the OpenAI library and make requests to the embeddings endpoint through its createEmbedding method. Finally, we will walk through one example use case: search.

The complete article can be found here: https://rajaosama.me/blogs/blazing-fast-search-with-embedding-model-ada

What are embeddings? 🤔

Embeddings are numerical representations of concepts: each piece of text is converted into a sequence of numbers (a vector), which makes it easy for computers to understand the relationships between those concepts. For example, the word "cat" can be represented by a vector of numbers, such as [0.2, -0.5, 0.7 ...]. The distance between two vectors measures their relatedness. Small distances suggest high relatedness and large distances suggest low relatedness.
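
To make the idea of distance concrete, here is a minimal sketch of cosine similarity, one common way to compare two embedding vectors. The three-dimensional vectors are invented for illustration (real text-embedding-ada-002 vectors are much longer), and the variable names are hypothetical.

```javascript
// Cosine similarity: dot(a, b) / (|a| * |b|). The result ranges from -1 to 1,
// and higher values mean the two vectors point in more similar directions.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy vectors, invented for illustration only
const cat = [0.2, -0.5, 0.7];
const kitten = [0.25, -0.45, 0.65];
const car = [-0.6, 0.4, 0.1];

console.log(cosineSimilarity(cat, kitten)); // close to 1: highly related
console.log(cosineSimilarity(cat, car)); // negative: unrelated
```

Cosine distance is simply 1 minus this value, so "small distance" and "high similarity" describe the same thing.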

Embeddings are commonly used for:

  • Search (where results are ranked by relevance to a query string)
  • Clustering (where text strings are grouped by similarity)
  • Recommendations (where items with related text strings are recommended)
  • Anomaly detection (where outliers with little relatedness are identified)
  • Diversity measurement (where similarity distributions are analyzed)
  • Classification (where text strings are classified by their most similar label)
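
As one illustration of the list above, classification can be sketched as picking the label whose embedding is most similar to the text's embedding. This is only a sketch under assumptions: the label names and the toy three-dimensional vectors are invented, and in practice both the labels and the text would be embedded with the model first.

```javascript
// Toy vectors, invented for illustration; real ones would come from the model
const labelEmbeddings = {
  positive: [0.9, 0.1, 0.0],
  negative: [-0.8, 0.2, 0.1],
};

function dot(a, b) {
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

function cosineSimilarity(a, b) {
  return dot(a, b) / Math.sqrt(dot(a, a) * dot(b, b));
}

// Return the label whose vector is most similar to the text vector
function classify(textEmbedding, labels) {
  let best = null;
  for (const [label, vector] of Object.entries(labels)) {
    const score = cosineSimilarity(textEmbedding, vector);
    if (best === null || score > best.score) {
      best = { label, score };
    }
  }
  return best;
}

const review = [0.7, 0.3, 0.1]; // hypothetical embedding of a product review
console.log(classify(review, labelEmbeddings).label); // "positive"
```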

Why text-embedding-ada-002? 🙌

text-embedding-ada-002 is a new embedding model from OpenAI that replaces five separate models for text search, text similarity, and code search. It outperforms the previous most capable model, Davinci, at most tasks, while being priced 99.8% lower¹. Here are some of the advantages of text-embedding-ada-002:

  • Stronger performance. text-embedding-ada-002 outperforms all the old embedding models on text search, code search, and sentence similarity tasks and gets comparable performance on text classification¹.
  • Unification of capabilities. text-embedding-ada-002 simplifies the interface of the embeddings endpoint by merging the five separate models into a single new model. This single representation performs better than the previous embedding models across a diverse set of text search, sentence similarity, and code search benchmarks¹.
  • Longer context. The context length of text-embedding-ada-002 is increased by a factor of four, from 2048 to 8192 tokens¹, making it more convenient to work with long documents.
  • Smaller embedding size. The new embeddings have only 1536 dimensions¹, one-eighth the size of davinci embeddings², making them more cost-effective to store and query in vector databases.
  • Reduced price. The price of text-embedding-ada-002 is reduced by 90% compared to old models of the same size². The new model achieves performance better than or similar to Davinci at a 99.8% lower price¹.
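
To see what the smaller embedding size means in practice, here is a rough back-of-the-envelope calculation. It assumes each dimension is stored as a 4-byte float and ignores any index overhead a vector database adds, so treat the numbers as a lower bound. The 12288 figure follows from the one-eighth ratio stated above, and the document count is hypothetical.

```javascript
const BYTES_PER_FLOAT32 = 4; // assumption: vectors are stored as 32-bit floats

// Raw storage needed for a given number of vectors, in bytes
function vectorStorageBytes(dimensions, vectorCount) {
  return dimensions * BYTES_PER_FLOAT32 * vectorCount;
}

const documentCount = 1000000; // hypothetical collection size
const adaBytes = vectorStorageBytes(1536, documentCount);
const davinciBytes = vectorStorageBytes(12288, documentCount); // 8 x 1536

console.log((adaBytes / 1e9).toFixed(3) + " GB"); // 6.144 GB
console.log((davinciBytes / 1e9).toFixed(3) + " GB"); // 49.152 GB
```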

Overall, text-embedding-ada-002 is a much more powerful tool for natural language processing and code tasks. It can help you create even more capable applications in your respective fields.

How to set up Node.js? 💻

Node.js is an open-source and cross-platform JavaScript runtime environment⁵ that allows you to run JavaScript code outside the browser. To use text-embedding-ada-002 in Node.js, you need to set up your Node.js environment first.

To install Node.js on your machine, follow these steps:

  1. Go to the official Node.js website⁶ and download the installer for your operating system.
  2. Run the installer and follow the instructions on the screen.
  3. To verify that Node.js is installed correctly, open a terminal or command prompt window and type node -v. You should see something like v18.16.0 printed on the screen.
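
If you would rather check the version from inside a script, a small sketch like the following could work. The minimum of 18 is an assumption taken only from the example version above, not a documented requirement of any library.

```javascript
// Assumption: 18 is used as the minimum only because v18.16.0 is the example above
const MIN_MAJOR = 18;

// Extract the major version number from a string such as "v18.16.0"
function parseMajor(version) {
  return Number(version.replace(/^v/, "").split(".")[0]);
}

const major = parseMajor(process.version);
if (major < MIN_MAJOR) {
  console.warn(`Node.js ${process.version} detected; this article was written against v${MIN_MAJOR}.`);
} else {
  console.log(`Node.js ${process.version} detected.`);
}
```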

How to install the OpenAI API library? 📚

The OpenAI library is a wrapper around the OpenAI API that makes it easy to use in different programming languages. To use text-embedding-ada-002 in Node.js, you need to install the OpenAI API library for Node.js.

To install the OpenAI API library for Node.js using npm (Node Package Manager), follow these steps:

  1. Open a terminal or command prompt window and navigate to your project directory.
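  2. Type npm install openai@3 and press Enter. The examples in this article use the Configuration and OpenAIApi classes from version 3 of the library; version 4 and later replaced them with a different interface.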

How to make requests to the embeddings endpoint? 📨

To get an embedding from text-embedding-ada-002 in Node.js using the OpenAI API library, follow these steps:

  1. Import the classes you need from the OpenAI API library by typing const { Configuration, OpenAIApi } = require('openai'); at the top of your JavaScript file.
  2. Create a client by typing const configuration = new Configuration({ apiKey: process.env.OPENAI_API_KEY }); const openai = new OpenAIApi(configuration);. This reads your API key, which you can get from your account page⁷, from the OPENAI_API_KEY environment variable so that it never has to be hard-coded in your source.
  3. Create an input object that contains your input string and the model ID by typing const input = {input: 'Your input string goes here', model: 'text-embedding-ada-002'};.
  4. Call the createEmbedding method of the client by typing openai.createEmbedding(input).then(response => { // do something with response });. The embedding array is at response.data.data[0].embedding, and you can extract, save, and use it.
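
Put together, the steps above might look like the following sketch. It assumes version 3 of the openai package (the one that exports Configuration and OpenAIApi) and an OPENAI_API_KEY environment variable. The getEmbedding helper is a hypothetical name introduced here, and it takes the client as a parameter so it can be exercised with a stub.

```javascript
// Request an embedding for one string and return the raw vector.
// `openai` is expected to be an OpenAIApi instance from openai@3 (assumption).
async function getEmbedding(openai, text) {
  const response = await openai.createEmbedding({
    model: "text-embedding-ada-002",
    input: text,
  });
  return response.data.data[0].embedding; // array of numbers
}

// Only call the real API when a key is configured
if (process.env.OPENAI_API_KEY) {
  const { Configuration, OpenAIApi } = require("openai");
  const openai = new OpenAIApi(
    new Configuration({ apiKey: process.env.OPENAI_API_KEY })
  );
  getEmbedding(openai, "Your input string goes here")
    .then((embedding) => console.log("Dimensions:", embedding.length))
    .catch((err) => console.error("Embedding request failed:", err.message));
}
```

Passing the client in as an argument is a design choice rather than a requirement: it keeps the helper free of global state and easy to test.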

How to use text-embedding-ada-002 for different use cases? 🚀

Now that you know how to get an embedding from text-embedding-ada-002 in Node.js, let's see how to use it for a common use case: search.

Search 🔎

One common use case for embeddings is search, where you want to rank results by relevance to a query string. For example, suppose you have a collection of documents about animals, and you want to find the most relevant ones for a given query.

To do this, you can follow these steps:

  1. Get embeddings for all your documents using text-embedding-ada-002 and store them in a vector database, such as Elasticsearch⁸.
  2. Get an embedding for your query string using text-embedding-ada-002 and send it to your vector database.
  3. Retrieve the documents with the smallest distances (or highest similarities) to your query embedding and display them as your search results.
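
Before bringing in a database, the ranking idea behind these steps can be sketched in memory. The vectors below are toy values invented for illustration and normalized to unit length so that a plain dot product equals cosine similarity. The search function and the index array are hypothetical names, and a vector database would do this ranking for you at scale.

```javascript
function dot(a, b) {
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

// Scale a vector to unit length so dot products equal cosine similarity
function normalize(v) {
  const length = Math.sqrt(dot(v, v));
  return v.map((x) => x / length);
}

// Score every stored document against the query and keep the k best
function search(queryVector, index, k) {
  return index
    .map(({ text, vector }) => ({ text, score: dot(queryVector, vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}

// Toy index: in a real system these vectors come from the embedding model
const index = [
  { text: "A fish is an aquatic animal.", vector: normalize([0.9, 0.1, 0.2]) },
  { text: "A bird is a feathered creature.", vector: normalize([0.2, 0.9, 0.1]) },
  { text: "A lion is a wild animal.", vector: normalize([0.1, 0.2, 0.9]) },
];
const queryVector = normalize([0.8, 0.2, 0.1]); // hypothetical query embedding

const results = search(queryVector, index, 2);
results.forEach((r) => console.log(r.score.toFixed(3), r.text));
```

With these toy values, the fish document ranks first and the bird document second.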

Here is some code that illustrates this process (the knn search option it uses requires Elasticsearch 8.x):

// Import libraries
const { Configuration, OpenAIApi } = require("openai");
const { Client } = require("@elastic/elasticsearch");
const client = new Client({
  node: "http://localhost:9200",
  auth: {
    username: "elastic",
    password: process.env.ELASTIC_SEARCH,
  },
});

// Test the connection (version 8 of the client returns promises and does not accept callbacks)
client
  .ping()
  .then(() => console.log("Connection successful"))
  .catch((err) => console.error("Connection failed:", err));

// Create instances
const OPENAI_API_KEY = process.env.OPENAI_API_KEY;
const configuration = new Configuration({
  apiKey: OPENAI_API_KEY,
});
const openai = new OpenAIApi(configuration);

const createIndices = async () => {
  await client.indices.create({
    index: "pets",
    body: {
      mappings: {
        properties: {
          // The document field stores the text of the animal
          document: {
            type: "text",
          },
          // The embedding field stores the vector representation of the animal
          embedding: {
            type: "dense_vector",
            dims: 1536,
            index: true,
            similarity: "cosine",
          },
        },
      },
    },
  });
};

const indexDocuments = async () => {
  // Get embeddings for documents
  const documents = [
    "A cat is a domesticated animal that likes to sleep.",
    "A dog is a loyal companion that likes to play.",
    "A bird is a feathered creature that likes to fly.",
    "A fish is an aquatic animal that likes to swim.",
    "A lion is a wild animal that likes to hunt.",
  ]; // sample documents
  for (const document of documents) {
    const input = { input: document, model: "text-embedding-ada-002" };
    const response = await openai.createEmbedding(input);
    const embedding = response.data.data[0].embedding; // extract embedding array
    // Store document and embedding in Elasticsearch
    await client.index({
      index: "pets",
      body: {
        document: document,
        embedding: embedding,
      },
    });
  }
  // Make the newly indexed documents visible to search before querying
  await client.indices.refresh({ index: "pets" });
};

const query = async () => {
  // Get embedding for query
  const queryText = "What animal likes water?"; // sample query
  const input = { input: queryText, model: "text-embedding-ada-002" };
  const response = await openai.createEmbedding(input);
  const queryEmbedding = response.data.data[0].embedding; // extract query embedding array

  // Search documents by query embedding
  const result = await client.search({
    index: "pets",
    body: {
      knn: {
        field: "embedding",
        query_vector: queryEmbedding,
        k: 5, // Return the top 5 nearest neighbors
        num_candidates: 10,
      },
    },
  });
  const hits = result.hits.hits; // get the matched documents
  hits.forEach((hit) => {
    console.log(hit._source.document); // print the document text
    console.log(hit._score); // print the similarity score
  });
};

// Run the steps in order: each one depends on the previous one having finished
const main = async () => {
  await createIndices();
  await indexDocuments();
  await query();
};

main().catch((e) => {
  console.error(e.message);
  process.exit(1);
});

The output of this code might look something like this:

A fish is an aquatic animal that likes to swim.
1.9999999
A bird is a feathered creature that likes to fly.
1.0000001
A dog is a loyal companion that likes to play.
0.99999994
A cat is a domesticated animal that likes to sleep.
0.9999999
A lion is a wild animal that likes to hunt.
0.9999998

As you can see, the fish document is the most relevant one for the query, followed by the bird document. The other documents have lower similarity scores and are less relevant.
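
When there are many documents to index, one request per document can be slow. The embeddings endpoint also accepts an array of strings as input, so a batch variant could look like the sketch below. The embedBatch name is hypothetical, the client is assumed to be an OpenAIApi instance from openai@3, and the function takes it as a parameter so it can be tried with a stub.

```javascript
// Embed several texts in one request and pair each text with its vector.
// `openai` is expected to be an OpenAIApi instance from openai@3 (assumption).
async function embedBatch(openai, texts) {
  const response = await openai.createEmbedding({
    model: "text-embedding-ada-002",
    input: texts, // an array of strings instead of a single string
  });
  // Each returned item carries the index of the input it belongs to;
  // sort by it so the pairing does not rely on response order.
  return response.data.data
    .slice()
    .sort((a, b) => a.index - b.index)
    .map((item) => ({
      document: texts[item.index],
      embedding: item.embedding,
    }));
}
```

Each returned object has the same document and embedding fields used in the Elasticsearch mapping above, so it could be passed to client.index as the body.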

More examples and use cases for text-embedding-ada-002 can be found here: https://rajaosama.me/blogs/blazing-fast-search-with-embedding-model-ada

Thank you.