
Faithful Olaleru


Reverse Engineering a Chatbot Project: Exploring the Integration of ChatGPT, LangChain.js, and Pinecone


βš™οΈ INTRODUCTION

The use of artificial intelligence (AI) and natural language processing (NLP) has revolutionized the way we interact with technology. Chatbots, in particular, have become a popular tool for businesses to provide customer support, answer inquiries, and even generate leads. However, building a chatbot that can understand natural language and respond appropriately is a complex task.

In this article, we will reverse engineer a chatbot application that was originally built to give legal advice, by building our own chatbot that does something else, e.g. a personal assistant. The original project was built with ChatGPT, a state-of-the-art large language model (LLM) developed by OpenAI; LangChain.js, a JavaScript framework for building LLM-powered applications; and Pinecone, a managed vector database. While reverse engineering this project, we will also uncover the inner workings of how these technologies interact, shed light on their capabilities, and explore the possibilities of leveraging their combined potential. So, let's dive in!

🧰 WHAT YOU NEED

  • npm & Node.js installed on your PC
  • OpenAI API Key
  • Pinecone API Key, Environment name & Index name (an example .env layout follows this list)
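
These values typically live in a .env file at the project root, loaded via dotenv. The variable names below are only an assumption for illustration; use whatever names your own code reads:

OPENAI_API_KEY=your-openai-api-key
PINECONE_API_KEY=your-pinecone-api-key
PINECONE_ENVIRONMENT=your-pinecone-environment
PINECONE_INDEX_NAME=your-index-name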

πŸ›  SETUP

We will use Next.js just like the original project. So create a Next.js app with TypeScript & Tailwind. If you're not sure how to, check here.

Next, we edit our tsconfig.json file so it matches the original project's configuration. Don't forget to set the target to "es2020".
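
Here is a minimal sketch of the kind of tsconfig.json such a Next.js + TypeScript project tends to use. Treat it as an assumption and adapt it to your own setup; the important bits are the "es2020" target and the "@/*" path alias that the imports later in this article rely on:

{
  "compilerOptions": {
    "target": "es2020",
    "lib": ["dom", "dom.iterable", "esnext"],
    "allowJs": true,
    "skipLibCheck": true,
    "strict": true,
    "noEmit": true,
    "esModuleInterop": true,
    "module": "esnext",
    "moduleResolution": "node",
    "resolveJsonModule": true,
    "isolatedModules": true,
    "jsx": "preserve",
    "baseUrl": ".",
    "paths": {
      "@/*": ["./src/*"]
    }
  },
  "include": ["next-env.d.ts", "**/*.ts", "**/*.tsx"],
  "exclude": ["node_modules"]
}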

Now let’s install our node dependencies and dev-dependencies (an example install command follows the list):

  • langchain
  • @pinecone-database/pinecone
  • dotenv
  • pdf-parse
  • tsx (as a dev dependency)
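
Assuming you use npm, the commands would look something like this:

npm install langchain @pinecone-database/pinecone dotenv pdf-parse
npm install --save-dev tsx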

Let me now describe the process. We want to build a bot that can provide precise and dynamic responses based on our own data set. For this project, PDF documents serve as ChatGPT's source of knowledge: rather than retraining the model, we supply it with relevant context at query time. We submit the data from the PDFs to Pinecone, where it is kept as vectors, i.e. lists of floating-point values. When we ask the chatbot a question, the app asks Pinecone to return the information most related to our inquiry and hands it to ChatGPT, which then streams a response to us, and the cycle continues.

πŸ“² INGEST DATA

We begin by ingesting our data so that it is accessible to our LLM. Create an ingest-data.ts file in the /src directory. Use DirectoryLoader from LangChain to get all PDFs in a directory (in our case, the docs directory). PDFLoader would only load one PDF file at a time, but with DirectoryLoader we can pick up all PDFs at once. We convert them to raw documents whose pageContent is the text extracted from the PDF. Then we break the bulk text into smaller chunks and upload them to Pinecone using OpenAIEmbeddings, a wrapper around OpenAI's embedding model, which converts text into numerical (vector) form. Your file should look like below:

import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { OpenAIEmbeddings } from "langchain/embeddings";
import { PineconeStore } from "langchain/vectorstores";
import { pinecone } from "@/utils/pinecone-client";
import { CustomPDFLoader } from "@/utils/customPDFLoader";
import { PINECONE_INDEX_NAME, PINECONE_NAME_SPACE } from "@/config/pinecone";
import { DirectoryLoader } from "langchain/document_loaders";

/* Name of directory to retrieve your files from */
const filePath = "src/docs";

export const run = async () => {
    try {
        /* load raw docs from all the files in the directory */
        const directoryLoader = new DirectoryLoader(filePath, {
            ".pdf": (path) => new CustomPDFLoader(path),
        });

        // const loader = new PDFLoader(filePath);
        const rawDocs = await directoryLoader.load();

        /* Split text into chunks */
        const textSplitter = new RecursiveCharacterTextSplitter({
            chunkSize: 1000,
            chunkOverlap: 200,
        });

        const docs = await textSplitter.splitDocuments(rawDocs);
        console.log("split docs", docs);

        console.log("creating vector store...");
        /*create and store the embeddings in the vectorStore*/
        const embeddings = new OpenAIEmbeddings();
        const index = pinecone.Index(PINECONE_INDEX_NAME); //change to your own index name

        //embed the PDF documents
        await PineconeStore.fromDocuments(docs, embeddings, {
            pineconeIndex: index,
            namespace: PINECONE_NAME_SPACE,
            textKey: "text",
        });
    } catch (error) {
        console.log("error", error);
        throw new Error("Failed to ingest your data");
    }
};

(async () => {
    await run();
    console.log("ingestion complete");
})();

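Note that ingest-data.ts imports a shared Pinecone client from @/utils/pinecone-client and the index/namespace constants from @/config/pinecone. Here is a minimal sketch of what they might contain, assuming the older v0.x PineconeClient API and the environment variable names from the example .env above; adapt both to your own setup:

// src/utils/pinecone-client.ts
import { PineconeClient } from "@pinecone-database/pinecone";

if (!process.env.PINECONE_API_KEY || !process.env.PINECONE_ENVIRONMENT) {
    throw new Error("Missing Pinecone API key or environment in .env");
}

async function initPinecone() {
    const client = new PineconeClient();
    await client.init({
        apiKey: process.env.PINECONE_API_KEY ?? "",
        environment: process.env.PINECONE_ENVIRONMENT ?? "",
    });
    return client;
}

export const pinecone = await initPinecone();

// src/config/pinecone.ts
export const PINECONE_INDEX_NAME = process.env.PINECONE_INDEX_NAME ?? "";
export const PINECONE_NAME_SPACE = "pdf-docs"; // any namespace label you like

With your PDFs in src/docs and your keys in .env, the script can then be run with the tsx dev dependency, preloading dotenv so the environment variables are available:

npx tsx -r dotenv/config src/ingest-data.ts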

Install the pdf-parse dependency at this point if you haven't already, as it will read the contents of all of our PDF files. Check whether it works by importing it normally. If you encounter issues with pdf-parse like we did, use the workaround from customPDFLoader.ts, which imports the pdf-parse/lib/pdf-parse.js file in node_modules directly. If you opt for pdf-parse.js, remember to add a module declaration for it.

Create a file named pdf-parse.d.ts that looks just like below:

declare module 'pdf-parse/lib/pdf-parse.js' {
  import pdf from 'pdf-parse';

  export default pdf;
}


You can also try another pdf parser if you so wish. All you have to do is replace CustomPDFLoader with your new implementation.
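
For reference, here is a minimal sketch of what a CustomPDFLoader built on that workaround could look like. This is an assumption based on the description above rather than the original file, and the langchain import paths may differ between versions:

// src/utils/customPDFLoader.ts
import { Document } from "langchain/document";
import { BaseDocumentLoader } from "langchain/document_loaders";
import { readFile } from "fs/promises";

export class CustomPDFLoader extends BaseDocumentLoader {
    constructor(public filePath: string) {
        super();
    }

    public async load(): Promise<Document[]> {
        // import the inner file directly to avoid pdf-parse's debug-mode entry point
        const { default: pdf } = await import("pdf-parse/lib/pdf-parse.js");
        const buffer = await readFile(this.filePath);
        const parsed = await pdf(buffer);
        return [
            new Document({
                pageContent: parsed.text,
                metadata: { source: this.filePath, pdf_numpages: parsed.numpages },
            }),
        ];
    }
}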

β›“ CREATING CHAINS

Then we build our chains, beginning with an LLM chain. A chain is a sequence of calls to primitives, which can include other chains or LLM chains. An LLM chain is a special type of chain that combines an LLM with a Prompt Template. Think of the prompt as the final message that gets sent to the LLM; the Prompt Template formats user input to create that prompt. Here is an illustration of an LLM chain:

import { OpenAI } from "langchain/llms/openai";
import { PromptTemplate } from "langchain/prompts";
import { LLMChain } from "langchain/chains";

const model = new OpenAI({ temperature: 0.9 });
const template = "Ask me any question about {X}?";
const prompt = new PromptTemplate({
  template: template,
  inputVariables: ["X"],
});

const chain = new LLMChain({ llm: model, prompt: prompt });

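To actually run it, you call the chain with a value for each input variable; LLMChain returns its result under the text key. A quick hypothetical usage, to be run inside an async function:

// "X" must match the inputVariables entry in the prompt template above
const response = await chain.call({ X: "TypeScript" });
console.log(response.text);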

Create a file for our chains, makechain.ts. It'll look like below:

import { OpenAIChat } from "langchain/llms";
import { LLMChain, ChatVectorDBQAChain, loadQAChain } from "langchain/chains";
import { PineconeStore } from "langchain/vectorstores";
import { PromptTemplate } from "langchain/prompts";
import { CallbackManager } from "langchain/callbacks";

const CONDENSE_PROMPT =
    PromptTemplate.fromTemplate(`Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question.

Chat History:
{chat_history}
Follow Up Input: {question}
Standalone question:`);

const QA_PROMPT = PromptTemplate.fromTemplate(
  `You are an AI assistant providing helpful advice. You are given the following extracted parts of a long document and a question. Provide a conversational answer based on the context provided.
You should only provide hyperlinks that reference the context below. Do NOT make up hyperlinks.
If you can't find the answer in the context below, just say "Hmm, I'm not sure." Don't try to make up an answer.
If the question is not related to the context, politely respond that you are tuned to only answer questions that are related to the context.

Question: {question}
=========
{context}
=========
Answer in Markdown:`,
);

export const makeChain = (
    vectorstore: PineconeStore,
    onTokenStream?: (token: string) => void
) => {
    const questionGenerator = new LLMChain({
        llm: new OpenAIChat({ temperature: 0 }),
        prompt: CONDENSE_PROMPT,
    });
    const docChain = loadQAChain(
        new OpenAIChat({
            temperature: 0,
            modelName: "gpt-3.5-turbo", //swap in "gpt-4" here if you have access to it
            streaming: Boolean(onTokenStream),
            callbackManager: onTokenStream
                ? CallbackManager.fromHandlers({
                      async handleLLMNewToken(token) {
                          onTokenStream(token);
                          console.log(token);
                      },
                  })
                : undefined,
        }),
        { prompt: QA_PROMPT }
    );

    return new ChatVectorDBQAChain({
        vectorstore,
        combineDocumentsChain: docChain,
        questionGeneratorChain: questionGenerator,
        returnSourceDocuments: true,
        k: 2, //number of source documents to return
    });
};


As an instance of our LLM, we use OpenAIChat from LangChain. With the temperature at zero, responses stick closely to the ingested data and what can be derived from it. With a temperature higher than zero, OpenAIChat can give us more imaginative responses.

For processing unstructured text data, use loadQAChain. In its most basic form, the implementation injects all of the documents into the prompt. It accepts an LLM along with a params object. Here, we explicitly specify the ChatGPT model that will be used and provide our question template in the params object. Under the hood, it builds an LLM chain using our LLM and the prompt template. The LLM chain is then used to generate a documents chain, which is returned.

The last chain that gets called to generate a response is ChatVectorDBQAChain. Think of it as a composite chain that wires the other chains together. For instance, it contains the StuffDocumentsChain and the question generator chain (which is an LLMChain). It manages the entire procedure, from restructuring your query to generating your answer. It functions similarly to ConversationalRetrievalQAChain, except that it uses a vector store in place of a base retriever and accepts k, the number of source documents it should return along with your response.

πŸ“Ÿ SERVER-SENT EVENTS

Server-sent events (SSE) are our final important consideration. Working with them is very similar to working with WebSockets. The only significant distinction is that SSE is one-way: the server can emit or stream events to clients, but clients can only listen and cannot stream back to the server.

For the implementation of server-sent events, we will utilize @microsoft/fetch-event-source. Here is an illustration of how to emit on the server side and how to listen on the client side.

const express = require("express");
const cors = require("cors");

const app = express();
app.use(cors());

const PORT = 5000;

const getTime = () => new Date().toLocaleTimeString();

app.post("/api/chat", function (req, res) {
  res.writeHead(200, {
    Connection: "keep-alive",
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache",
  });
  const interval = setInterval(() => {
    res.write(`data: {"time": "${getTime()}"}\n\n`);
  }, 5000);

  // stop emitting when the client disconnects
  req.on("close", () => clearInterval(interval));
});

app.listen(PORT, function () {
  console.log(`Server is running on port ${PORT}`);
});

and for client side:

import { fetchEventSource } from "@microsoft/fetch-event-source";

await fetchEventSource("http://localhost:5000/api/chat", {
  method: "POST",
  headers: {
    Accept: "text/event-stream",
  },
  onmessage(event) {
    console.log(event.data);
    const parsedData = JSON.parse(event.data);
    setData((data) => [...data, parsedData]);
  },
  onerror(err) {
    console.log("There was an error from server", err);
  },
});

Now, to do the equivalent in Next.js, put the following code in your API route.

import type { NextApiRequest, NextApiResponse } from 'next';
import { OpenAIEmbeddings } from 'langchain/embeddings';
import { PineconeStore } from 'langchain/vectorstores';
import { makeChain } from '@/utils/makechain';
import { pinecone } from '@/utils/pinecone-client';
import { PINECONE_INDEX_NAME, PINECONE_NAME_SPACE } from '@/config/pinecone';

export default async function handler(
  req: NextApiRequest,
  res: NextApiResponse,
) {
  const { question, history } = req.body;

  if (!question) {
    return res.status(400).json({ message: 'No question in the request' });
  }
  // OpenAI recommends replacing newlines with spaces for best results
  const sanitizedQuestion = question.trim().replaceAll('\n', ' ');

  const index = pinecone.Index(PINECONE_INDEX_NAME);

  /* create vectorstore*/
  const vectorStore = await PineconeStore.fromExistingIndex(
    new OpenAIEmbeddings({}),
    {
      pineconeIndex: index,
      textKey: 'text',
      namespace: PINECONE_NAME_SPACE,
    },
  );

  res.writeHead(200, {
    'Content-Type': 'text/event-stream',
    'Cache-Control': 'no-cache, no-transform',
    Connection: 'keep-alive',
  });

  const sendData = (data: string) => {
    res.write(`data: ${data}\n\n`);
  };

  sendData(JSON.stringify({ data: '' }));

  //create chain
  const chain = makeChain(vectorStore, (token: string) => {
    sendData(JSON.stringify({ data: token }));
  });

  try {
    //Ask a question
    const response = await chain.call({
      question: sanitizedQuestion,
      chat_history: history || [],
    });

    console.log('response', response);
    sendData(JSON.stringify({ sourceDocs: response.sourceDocuments }));
  } catch (error) {
    console.log('error', error);
  } finally {
    sendData('[DONE]');
    res.end();
  }
}


We first obtain our PineconeStore using the index name already created in the Pinecone dashboard. Anything we pass to sendData is simply pushed to the client. To pose the question, we import and call the ChatVectorDBQAChain that we established before. After that, the source documents are streamed to the client along with your response.

It will appear as follows on the client side:

const ctrl = new AbortController();

try {
    fetchEventSource("/api/chat", {
        method: "POST",
        headers: {
            "Content-Type": "application/json",
        },
        body: JSON.stringify({
            question,
            history,
        }),
        signal: ctrl.signal,
        onmessage: (event) => {
            if (event.data === "[DONE]") {
                // Logic for end of streaming
            } else {
                const data = JSON.parse(event.data);
                if (data.sourceDocs) {
                    // set message state to include documents
                } else {
                    // set message state to include only data
                }
            }
        },
    });
} catch (error) {
    console.log("error", error);
}

This is the basic outline of what it will look like. For the complete code, consult the source project.

Lastly, let's discuss chat history. Your history is blank when you first ask a question. Once the streaming response to your initial question is complete, that question (and its answer) is added to the chat history. When you ask your next question, that chat history is used as context for generating the new standalone question.
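
On the client, a minimal way to keep that history is to store it as an array of [question, answer] pairs in state and append a pair whenever a stream finishes. A sketch, assuming React state and hypothetical names:

import { useState } from "react";

export default function Chat() {
    // history as [question, answer] pairs, sent back with every request
    const [history, setHistory] = useState<[string, string][]>([]);

    // call this when the "[DONE]" event arrives and the answer is fully streamed
    const appendExchange = (question: string, answer: string) => {
        setHistory((prev) => [...prev, [question, answer]]);
    };

    // ...the fetchEventSource call from above goes here, passing `history` in the body
    return null;
}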

πŸ“‹ CONCLUSION

The reverse engineering project using Next.js, LangChain, ChatGPT, and Pinecone is now complete. This approach can also be used to create a wide range of chatbots that perform different tasks depending on your prompts, the OpenAI temperature setting, and the contents of the uploaded documents. You are welcome to reach out if you have any questions.

HAPPY CODING!!!πŸš€πŸš€

(Article was written with input from ChatGPT)

πŸ”— HELPFUL LINKS

Top comments (1)

Serge van den Oever

Thanks for the clear overview! Three questions:

  1. The value for k will result in always returning k documents, even if they are not relevant to the question. How can I filter out documents that are not relevant enough (distance value too large), so that with a k of 10, some questions only embed 3 document chunks in the prompt?
  2. How can I see the cost (tokens) used for a question? It is returned by OpenAI, but only the texts are currently extracted.
  3. How can I prevent the history from growing large in a long conversation?

Looking forward to your guidance.