Tiago Souto

AI Series Part IV: Creating a RAG chatbot with LangChain (NextJS)

In this post, we'll write some more code and build a simple chat app that lets us ask questions while limiting the LLM's answers to a specific topic of our choice. But before we start coding, we need to define a few concepts that will help us understand the whole picture.

What is RAG?

Retrieval-Augmented Generation (RAG) is a technique for interacting with LLMs in which a source of truth supplies the base knowledge the LLM uses to respond to a user prompt.

This approach was proposed in May 2020 to solve issues that fine-tuned models face, such as limited long-term memory, lack of accuracy for specific outputs, time-consuming training, and high costs. But it was only widely adopted in 2023, when it became consolidated as one of the most used techniques for working with LLMs.

How does it work?

I won't go too deep into this explanation; I'll stick to what's important to know from a practical perspective.
RAG basically needs four things to work: a prompt, a store, a retriever, and an LLM.

  • Prompt: generally, it's textual data provided as input by a user or as instructions by the application
  • Store: a vector store where the embedded source of truth data (RAG knowledge) is stored
  • Retriever: it's a method used to find relevant pieces of data in the store
  • LLM: it's used for interpreting the user prompt, following the application instructions, reading the retrieved context, and generating a response

And the workflow goes as follows (there's also a short code sketch right after the list):

  1. A data source is selected to be used (PDFs, DOCX, Web Pages, Video, Audio, Image, etc)
  2. The data source is split into smaller pieces of documents ("chunks")
  3. An embedding model is used to embed (convert to a vector representation) each document chunk
  4. The embedded data is stored in a vector store
  5. A user sends a prompt
  6. The application may provide further instructions about how the LLM should behave as a "system message"
  7. The user prompt is embedded
  8. The embedded user prompt is used to search the vector store for semantically similar chunks of documents
  9. The most similar documents are retrieved and converted back to textual representation
  10. The user prompt, system prompt, and document chunks are sent to a generative LLM
  11. The LLM generates a response in natural language that'll be sent back to the user
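
To make these steps more concrete, here's a rough sketch of the ingestion and query halves of this flow. It jumps ahead to LangChain (introduced in the next section) with OpenAI and an in-memory vector store; the names and parameters here are only illustrative, and the app we'll actually build later uses HNSWLib instead.

import {ChatOpenAI, OpenAIEmbeddings} from '@langchain/openai'
import {RecursiveCharacterTextSplitter} from 'langchain/text_splitter'
import {MemoryVectorStore} from 'langchain/vectorstores/memory'

// Steps 1-4: split the source text into chunks, embed them, and store the vectors
async function ingest(text: string) {
  const splitter = new RecursiveCharacterTextSplitter({chunkSize: 1000, chunkOverlap: 100})
  const chunks = await splitter.createDocuments([text])
  return MemoryVectorStore.fromDocuments(chunks, new OpenAIEmbeddings())
}

// Steps 5-11: embed the question, retrieve similar chunks, and generate an answer
async function ask(store: MemoryVectorStore, question: string) {
  const docs = await store.similaritySearch(question, 4)
  const context = docs.map((doc) => doc.pageContent).join('\n\n')
  const llm = new ChatOpenAI({modelName: 'gpt-3.5-turbo', temperature: 0})
  const response = await llm.invoke(
    `Answer the question using only this context:\n${context}\n\nQuestion: ${question}`
  )
  return response.content
}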

There are many different use cases where RAG can be the best choice for working with LLMs, such as questions & answers, virtual assistants, content generation, and many others.

What's LangChain?

LangChain is a framework originally written in Python - but it also has a JavaScript version - that helps with the development of solutions using AI APIs in many ways. It has integrations with the most widely used LLM APIs and vector stores, and it handles document splitting, retrievers, file loaders, embeddings, prompt templates, and more.

It also has a feature called a "chain" that links calls together so the result of one LLM call can be passed to another. That's very helpful when you have to handle multiple LLM calls in a single request. For example, you can ask an LLM to summarize some content, then take its response and send it to another LLM, asking it to generate a nice response for the user that includes the summarized data. Or you can split the user request across different LLMs to take advantage of their specializations, or even to reduce costs.
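
To make that example concrete, here's a minimal sketch of a summarize-then-respond chain using LangChain's expression language (LCEL). The prompts and model name are placeholders for illustration only; they're not part of the app we'll build below.

import {StringOutputParser} from '@langchain/core/output_parsers'
import {ChatPromptTemplate} from '@langchain/core/prompts'
import {RunnableSequence} from '@langchain/core/runnables'
import {ChatOpenAI} from '@langchain/openai'

const llm = new ChatOpenAI({modelName: 'gpt-3.5-turbo', temperature: 0})

const summarizePrompt = ChatPromptTemplate.fromTemplate(
  'Summarize the following content in two sentences:\n\n{content}'
)
const replyPrompt = ChatPromptTemplate.fromTemplate(
  'Using this summary, write a friendly answer for the user:\n\n{summary}'
)

// Each step's output feeds the next: prompt -> LLM -> text -> prompt -> LLM -> text
const chain = RunnableSequence.from([
  summarizePrompt,
  llm,
  new StringOutputParser(),
  (summary: string) => ({summary}),
  replyPrompt,
  llm,
  new StringOutputParser(),
])

const answer = await chain.invoke({content: 'Some long article text...'})

The summary produced by the first LLM call is injected into the second prompt automatically, which is exactly the kind of linking the "chain" concept is about.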

There are many different use cases for chains; it's really a great feature, and I'd say LangChain is a must-have for most LLM projects.

Now that the basic concepts are covered, let's move to the code.

The resource you won't find on Google

I'd like to bring your attention to this because I found it to be very important. When working with open-source projects, it's quite common for the documentation to miss some features or details of the API, and LangChain is no different. People are doing incredible work with it, but sometimes it's hard to find methods, attribute definitions, and to explore the possibilities just by following the official documentation, especially the JS one. It covers a lot, but not everything. So I highly recommend keeping the API reference (https://api.js.langchain.com/index.html) at hand when you work with it. There are many cases where you'll try to find details in the official docs or by googling and won't find any relevant resource, but you can find it in the API reference. So keep that in mind. And thanks to the LangChain team for providing it, this is tremendously helpful. If you ever lose this link, you can get it from the references in their GitHub repo: https://github.com/langchain-ai/langchainjs?tab=readme-ov-file#-documentation

Building a RAG Chat

We'll reuse the same base project we built in the OpenAI chat app post. If you missed it, I recommend taking a look to understand the project details and follow the OpenAI API setup. You can also download the code from this GitHub repo: https://github.com/soutot/ai-series/tree/main/nextjs-chat-base
First, create a new directory and name it nextjs-langchain-openai. Then initialize the project just like we did before.
Once you get it up and running, we can start by installing the LangChain packages:

pnpm install @langchain/community @langchain/core @langchain/openai langchain

The main package is langchain, but we'll also need @langchain/community to use some packages developed by the community, and @langchain/openai for specific integrations with the OpenAI API. @langchain/core is the base package required by all packages other than the main one. You can find more details in the official docs: https://js.langchain.com/docs/get_started/installation

Now we need to install our vectorstore dependency. For this sample, we'll use HNSWLib as it's very simple to set up. More details here: https://js.langchain.com/docs/integrations/vectorstores/hnswlib

pnpm install hnswlib-node

Then, update your next.config.mjs as follows:

/** @type {import('next').NextConfig} */
const nextConfig = {
  webpack(config) {
    config.externals = [...(config.externals || []), 'hnswlib-node']
    config.resolve.alias['fs'] = false
    return config
  },
};

export default nextConfig;

This is needed to prevent NextJS build errors when importing hnswlib-node and using fs.

With everything we need in place, we'll now add a new endpoint to upload our file and create the embeddings.

Inside the api directory, create an embed/route.ts file. We'll start by importing the packages we need:

import {HNSWLib} from '@langchain/community/vectorstores/hnswlib'
import {OpenAIEmbeddings} from '@langchain/openai'
import {RecursiveCharacterTextSplitter} from 'langchain/text_splitter'
import {NextResponse} from 'next/server'

Then, we'll create our POST method and read the file it receives from the form data:

export async function POST(request: Request) {
  const data = await request.formData()
  const file: File | null = data.get('file') as unknown as File
  if (!file) {
    return NextResponse.json({message: 'Missing file input', success: false})
  }

  const fileContent = await file.text()

We'll now initialize the text splitter. We're going to use RecursiveCharacterTextSplitter, as it's the recommended splitter to start with. You can find more splitters in the official docs: https://js.langchain.com/docs/modules/data_connection/document_transformers/

const textSplitter = new RecursiveCharacterTextSplitter({
  chunkSize: 1000,
  chunkOverlap: 100,
  separators: ['\n'],
})

This is how we split the file content into chunks, using the createDocuments method, which expects an array of texts:

const splitDocs = await textSplitter.createDocuments([fileContent])

Then we initialize the embedding model. In this case, we'll use OpenAIEmbeddings, which by default uses text-embedding-ada-002. You can use different models, even ones other than OpenAI's. More details in the official docs: https://js.langchain.com/docs/integrations/text_embedding/openai

const embeddings = new OpenAIEmbeddings({
  openAIApiKey: process.env.OPENAI_API_KEY,
})

And store the embedded documents in the HNSW vector store:

const vectorStore = await HNSWLib.fromDocuments(splitDocs, embeddings)
await vectorStore.save('vectorstore/rag-store.index')
return new NextResponse(JSON.stringify({success: true}), {
  status: 200,
  headers: {'content-type': 'application/json'},
})

Okay, the embedding processing is done. We can now generate the vector representation of our data source and store it in a vector store for later use.
Next, we have to update the frontend to send the document to be embedded.
Open page.tsx and add this new function to perform the upload process:

async function uploadFile(file: File) {
  try {
    const formData = new FormData()
    formData.append('file', file)
    const response = await fetch('/api/embed', {
      method: 'POST',
      body: formData,
    })
    if (response.ok) {
      console.log('Embedding successful!')
    } else {
      const errorResponse = await response.text()
      throw new Error(`Embedding failed: ${errorResponse}`)
    }
  } catch (error) {
    throw new Error(`Error during embedding: ${error}`)
  }
}

Now, go to the handleFileSelected method and call this function, passing the selected file:

const handleFileSelected = async (event?: ChangeEvent<HTMLInputElement>) => {
    if (!event) return clearFile()
    setIsUploading(true)
    const {files} = event.currentTarget
    if (!files?.length) {
      setIsUploading(false) // reset the loading state if no file was selected
      return
    }
    const selectedFile = files[0]
    await uploadFile(selectedFile)
    setFile(selectedFile)
    setIsUploading(false)
    event.target.value = '' // clear input as we handle the file selection in state
  }

Cool, we're done here. Now, as the last part, let's update our main API route to retrieve a response based on the user prompt.
First, let's import all dependencies:

import {HNSWLib} from '@langchain/community/vectorstores/hnswlib'
import {BaseMessage} from '@langchain/core/messages'
import {ChatPromptTemplate} from '@langchain/core/prompts'
import {ChatOpenAI, OpenAIEmbeddings} from '@langchain/openai'
import {LangChainStream, StreamingTextResponse} from 'ai'
import {ConversationalRetrievalQAChain} from 'langchain/chains'
import {ChatMessageHistory, ConversationTokenBufferMemory} from 'langchain/memory'
import {NextResponse} from 'next/server'
import {z} from 'zod'

You can find more details of each dependency in the official docs: https://api.js.langchain.com/ and https://js.langchain.com/docs

Now, let's create a prompt template that'll be used to instruct the LLM how it should behave:

const QA_PROMPT_TEMPLATE = `You are a good assistant that answers questions. Your knowledge is strictly limited to the following piece of context. Use it to answer the question at the end.
  If the answer can't be found in the context, just say you don't know. *DO NOT* try to make up an answer.
  If the question is not related to the context, politely respond that you are tuned to only answer questions that are related to the context.
  Give a response in the same language as the question.

  Context: """{context}"""
  Question: """{question}"""
  Helpful answer in markdown:`

Let's create the POST method and read the user prompt

export async function POST(request: Request) {
  const body = await request.json()
  const bodySchema = z.object({
    prompt: z.string(),
  })

  const {prompt} = bodySchema.parse(body)

Now let's prepare the retriever to read the data from the vector store

try {
    const embeddings = new OpenAIEmbeddings({
      openAIApiKey: process.env.OPENAI_API_KEY,
    })

    const vectorStore = await HNSWLib.load('vectorstore/rag-store.index', embeddings)
    const retriever = vectorStore.asRetriever()
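One note: calling asRetriever() with no arguments uses the store's defaults (the top 4 most similar chunks, if I'm not mistaken). If you want the LLM to see more or less context, you can pass the number of documents to retrieve:

// Retrieve the 6 most similar chunks instead of the default
const retriever = vectorStore.asRetriever(6)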

We now initialize the LLM that'll be used to generate the response. Pay attention to temperature: 0. For RAG, we want the LLM to stick closely to what's in the document, so lower temperatures are better because they prevent the model from introducing terms or concepts that differ from what's retrieved from the store.

const {stream, handlers} = LangChainStream()

const llm = new ChatOpenAI({
  temperature: 0,
  openAIApiKey: process.env.OPENAI_API_KEY,
  streaming: true,
  modelName: 'gpt-3.5-turbo',
  callbacks: [handlers],
})

And, finally, perform the LLM request and send back the streaming response

const chain = ConversationalRetrievalQAChain.fromLLM(llm, retriever, {
  returnSourceDocuments: true,
  qaChainOptions: {
    type: 'stuff',
    prompt: ChatPromptTemplate.fromTemplate(QA_PROMPT_TEMPLATE),
  },
})
// No await here: the response tokens are streamed to the client through the
// LangChainStream handlers while the chain runs
chain.invoke({question: prompt, chat_history: ''})
return new StreamingTextResponse(stream)

Note that we're passing chat_history: '' since we're not handling it yet, so for now the LLM won't have the context of previously sent messages. This is good enough for our testing purposes; I'll walk through how to work with history and memory in a future post.
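
Just to give an idea of where this is going, here's a hypothetical sketch of what passing the history could look like, assuming the frontend also sent a messages array of previous turns in the request body (it doesn't in our current setup):

// Hypothetical: `messages` would come from the request body, e.g. [{role: 'user', content: '...'}, ...]
const chatHistory = messages
  .map((message: {role: string; content: string}) =>
    `${message.role === 'user' ? 'Human' : 'Assistant'}: ${message.content}`
  )
  .join('\n')

// ConversationalRetrievalQAChain accepts the history as a plain string
chain.invoke({question: prompt, chat_history: chatHistory})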

Now, just run
pnpm run dev
and, if everything's correct, you'll be able to see the app running.

Attach a file so it'll be uploaded to the backend and the vector store will be created from the embedded document. Then ask a question that can be answered by the document content you've just uploaded.

In the example below, I used Part I of this AI series as the document:

(Screenshot: the chat app answering a question based on the uploaded Part I post)

Well, that's just the beginning. There are so many things that can be done, even by changing this simple example.

Hope someone finds it helpful.

See you in the next part.

GitHub code repository: https://github.com/soutot/ai-series/tree/main/nextjs-chat-rag
