Pavan Belagatti

Posted on May 8, 2024 • Originally published at Medium

Creating AI Apps Using RAG & LangChain: A Step-by-Step Developer Guide!

#machinelearning #ai #tutorial #datascience

Today, Large language models (LLMs) have emerged as one of the biggest building blocks of modern AI/ML applications. Gone are the days when AI was considered more of a fiction rather than a reality. Every organization is embracing the power of these LLMs to build their personalized applications. The advantages these LLMs provide are enormous and hence it is obvious that the demand for such applications is more.

Companies such as Google, Meta, OpenAI, Anthropic, etc to name a few, have tremendously contributed to the growth of Generative AI. But to build LLM-powered applications, LLMs are just not enough, you need to have some tools, a framework and an approach to make sure the applications are robust and work as expected.

In this article, we are going to discuss one such framework known as retrieval augmented generation (RAG) along with some tools and a framework called LangChain.

What is Retrieval Augmented Generation?

Large language models are great but they too have limitations such as creating fake, biased, made-up responses that are inaccurate and these are referred to as LLM hallucinations. Such responses generated by these LLMs hurt the applications authenticity and reputation. To mitigate such unwanted responses from the LLMs, there are some techniques that have gained popularity. One such approach is retrieval augmented generation (RAG).

RAG is where the LLM applications are augmented with some external knowledge base to mitigate the effects of hallucination. This way, for any user query, the system goes through the knowledge base to search for the relevant information and finds the most accurate information. There will be no room for hallucination since the custom knowledge source is already present.

See the above image for example, the PDF is our external knowledge base that is stored in a vector database in the form of vector embeddings (vector data). Basically, the PDF document gets split into small chunks of words and these words are then assigned with numerical numbers known as vector embeddings. You need an embedding model to convert text, image, audio, video, into embeddings.

The user query goes through the same LLM to convert it into an embedding and then through the vector database to find the most relevant document. Once the relevant document is found, it is then added with more context through the LLM and finally the response is generated. This way, RAG has become the bread and butter of most of the LLM-powered applications to retrieve the most accurate if not relevant responses. Well, there are some notable AI frameworks such as LangChain and LlamaIndex that help these LLM applications to be robust by providing all the toolkit required. Let’s understand LangChain since we will be using LangChain in our tutorial.

What is LangChain?

LangChain is an open-source AI framework developed by Harrison Chase to help developers to create robust AI applications by provisioning all the components required. LangChain is equipped with memory capabilities, integrations with vector databases, tools to connect with external data sources, logic and APIs. This makes LangChain a powerful framework for building LLM-powered applications.

Image credits: Upstash

LangChain consists of modules such as Model I/O, Retrieval, Chains and Agents, each having their own strengths to help developers build seamless AI applications. Model I/O module handles prompts, LLMs interaction, chat models and output parsers. The retrieval module handles everything related to data management from loading to modifying to text splitters to embedding the data using embedding models. Then comes the Chain module and as the name suggests, it basically interlinks all the tasks together to make sure the tasks happen in a sequential fashion.

The agents act as the brain of the system that handles the decision making. They determine the sequence of actions to take to complete the task. The agent is capable of choosing the tools required for the task. LangChain has many agent toolkit libraries that can be used to build powerful LLM powered applications.

You can install LangChain using the following pip command

pip install langchain

What is SingleStore?

SingleStore is a modern cloud-based relational and distributed database management system that specializes in high-performance, real-time data processing. SingleStore is not just for OLAP and OLTP workloads, but one can also build real-time GenAI applications seamlessly.

SingleStore started to support vector search and storage back in 2017 itself. It has some amazing integrations with today’s popular AI frameworks such as Langchain, LlamaIndex etc. Supports both SQL and Python and all the data types and this makes it the only database any organization can have instead of having different types of databases for different types of workloads.

RAG with LangChain and SingleStore: Hands-on Tutorial!

Let’s build a simple AI application that can fetch the contextually relevant information from our own custom data for any given user query.

Once you sign up, you need to create a workspace. It is easy and free, so do it.

Once you create your workspace, create a database with any name of your wish.

As you can see from the above screenshot, you can create the database from ‘Create Database’ tab on the right side.

Now, let’s go to ‘Develop’ to use our Notebooks feature [just like Jupyter Notebooks]

Create a new Notebook and name it as you wish.

Before doing anything, select your workspace and database from the dropdown on the Notebook.

Now, start adding all the below shown code snippets into your Notebook you just created as shown below.

Install the required libraries & dependencies

!pip install langchain --quiet
!pip install --upgrade openai==0.28.1 --quiet
!pip install pdf2image --quiet
!pip install pdfminer.six --quiet
!pip install singlestoredb --quiet
!pip install tiktoken --quiet
!pip install --upgrade unstructured==0.10.14 --quiet

Import the libraries

from langchain.document_loaders import PyPDFLoader
from langchain.chat_models import ChatOpenAI
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains import RetrievalQA, ConversationalRetrievalChain
import os

Load your custom document

from langchain.document_loaders import OnlinePDFLoader
loader = OnlinePDFLoader("example.pdf")
data = loader.load()

I am using this publicly available pdf about world tourism barometer.

[If you like to use the same, mention it in the place of example.pdf with the complete url]

Using LangChain framework to split the document into chunks

from langchain.text_splitter import RecursiveCharacterTextSplitter

print(f"You have {len(data)} document(s) in your data")
print(f"There are {len(data[0].page_content)} characters in your document")

text_splitter = RecursiveCharacterTextSplitter(chunk_size = 2000, chunk_overlap = 0)
texts = text_splitter.split_documents(data)

print(f"You have {len(texts)} pages")

Use OpenAI API to generate embeddings for the document chunks

import os
import getpass
os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key: ")

When you run the above command, it prompts you to add your OpenAI api key.

Let’s store our document chunks into SingleStore database table

Action required: Make sure you have selected the workspace and the database where you want to store your data.

from langchain.embeddings import OpenAIEmbeddings

embedding = OpenAIEmbeddings()

#from langchain.vectorstores.singlestoredb as s2
from langchain.vectorstores import SingleStoreDB
#from langchain.vectorstores.utils import DistanceStrategy

#s2.ORDERING_DIRECTIVE["DOT_PRODUCT"] = s2.ORDERING_DIRECTIVE[DistanceStrategy.DOT_PRODUCT]

docsearch = SingleStoreDB.from_documents(
    texts,
    embedding,
    table_name = "tourism_pdf",
    #distance_strategy = "DOT_PRODUCT"
)

You can change the table name as per your wish.

Let us check the text chunks and associated embeddings stored inside our database.

select * from tour_pdf limit 1;

Ask a query against your custom data (the pdf that you loaded) using just similarity search to retrieve the top k closest content.

query = "Global inflation is expected to fall or rise in 2023?"
docs = docsearch.similarity_search(query)
print(docs[0].page_content)

The answer you should see is a big paragraph which is less accurate and not so efficient.

Here is the augmented response to the user query

import openai

prompt = f"The user asked: {query}. The most similar text from the document is: {docs[0].page_content}"

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": prompt}
    ]
)
print(response['choices'][0]['message']['content'])

The response you receive will be to the point and highly efficient with more context.

Let’s test when knowledge base (custom documents like pdf) is not provided

from langchain.llms import OpenAI
llm = OpenAI(temperature=0.8)

llm.predict("your query?")

In this case, it won’t provide the information and might say that it doesn’t have enough data to answer your question/query. This way, the RAG approach mitigates the hallucination effects of LLMs and increases the efficiency.

Finally, you can go to your database and verify if the provided pdf is stored chunkwise. You should see the data as below.

Hope you understood how we utilized the RAG approach combined with LangChain framework and SingleStore to store and retrieve data efficiently. If you like to try the above tutorial, you need a free SingleStore account, OpenAI api key and a publicly available pdf.

Try the tutorial and let me know what you think:)