
RAG Application using AWS Bedrock and LangChain

Hello, good folks!!
In this part of the RAG application series, we will leverage Mistral's new Mistral Large model through AWS Bedrock and the LangChain framework to query over our PDFs.
In the previous article of the series, we learned to build a RAG application using AWS Bedrock and LlamaIndex. To learn more about what RAG is, please refer to the article below.

Learn to Build a Basic RAG Application | by Somil Gupta | Apr, 2024 | AWS in Plain English

End-to-end Guide Using AWS Bedrock and LlamaIndex to Query Over Your Own PDFs


Let's get the learning started.


The implementation of this application involves three components:

1. Create a Vector Store

Created by the Author using Excalidraw.

Load -> Transform -> Embed


We will be using the FAISS vector store, which is built on the Facebook AI Similarity Search (FAISS) library. There are many other excellent vector store options you can use, such as ChromaDB or LanceDB.

2. Query the Vector Store and Retrieve the 'Most Similar' Documents

Created by the Author using Excalidraw.

The way to handle this is to embed the unstructured query at query time and retrieve the embedding vectors that are 'most similar' to the embedded query. A vector store takes care of storing the embedded data and performing the vector search for you.
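As a rough sketch of what this looks like with LangChain (assuming a FAISS index like the one we build later in this post, here called vector_store), the lookup is a single call:

# Sketch only: `vector_store` is a LangChain FAISS index built with the same
# embedding model used at ingestion time (we create both later in this post).
relevant_docs = vector_store.similarity_search(
    "What is the meaning of life?", k=3)
for doc in relevant_docs:
    print(doc.page_content[:200])  # preview each retrieved chunk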

3. Frame the response using LLM and 'Enhanced Context'

Created by the Author using Excalidraw.

Response Generation Using an LLM (Large Language Model): Once the relevant documents are retrieved from the vector store, a large language model uses the information from these documents to generate a coherent and contextually appropriate response.
These three steps outline the application we are going to build next.


First and foremost, we will set up the AWS SDK for Python (Boto3) and the AWS CLI. If you have not installed them before:

(base) ➜  ~ pip3 install boto3
(base) ➜  ~ pip3 install awscli

(base) ➜  ~ aws configure

In this example, we'll use the Amazon Titan Embeddings model to generate embeddings. You can use any model that generates embeddings.

import boto3

# Load the Bedrock client using Boto3.
bedrock = boto3.client(service_name='bedrock-runtime')

from langchain_community.embeddings.bedrock import BedrockEmbeddings
titan_embeddings = BedrockEmbeddings(model_id="amazon.titan-embed-text-v1",
                                     client=bedrock)
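As an optional sanity check, you can embed a short test string and inspect the result; amazon.titan-embed-text-v1 returns 1536-dimensional vectors:

# Optional check: embed a test string and inspect the vector size.
vector = titan_embeddings.embed_query("Hello, Bedrock!")
print(len(vector))  # amazon.titan-embed-text-v1 produces 1536-dimensional vectors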

Now, we will set up the Vector Store to store and retrieve embeddings. We have our PDF stored in the "data" folder of the root directory.

  • In this case, we'll split our documents into chunks of 1000 characters with 200 characters of overlap between chunks. The overlap helps mitigate the possibility of separating a statement from the important context related to it. 
  • We will leverage RecursiveCharacterTextSplitter from LangChain, which will recursively split the document using common separators like new lines until each chunk is the appropriate size.
  • We can embed and store all of our document splits in a single command using the FAISS vector store and titan embedding model.
# Vector Store for Vector Embeddings
from langchain_community.vectorstores.faiss import FAISS

# Imports for Data Ingestion
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders.pdf import PyPDFDirectoryLoader

# Load the PDFs from the directory
def data_ingestion():
    loader = PyPDFDirectoryLoader("data")
    documents = loader.load()
    # Split the text into chunks
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000,
                                                   chunk_overlap=200)
    docs = text_splitter.split_documents(documents)
    return docs

# Vector Store for Vector Embeddings
def setup_vector_store(documents):
    # Create a vector store using the documents and the embeddings
    vector_store = FAISS.from_documents(
        documents,
        titan_embeddings,
    )
    # Save the vector store locally
    vector_store.save_local("faiss_index")
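If you want to build the index once, outside of the UI we'll add later, a minimal way to wire these two functions together (assuming your PDFs are in the data folder) could look like this:

# One-off index build: ingest the PDFs, embed the chunks, and persist the index.
if __name__ == "__main__":
    docs = data_ingestion()
    setup_vector_store(docs)
    print(f"Indexed {len(docs)} chunks into ./faiss_index")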

The next step is to import and load the LLM via Bedrock.

# Import Bedrock for LLM
from langchain_community.llms.bedrock import Bedrock

# Load the LLM from the Bedrock
def load_llm():
    llm = Bedrock(model_id="mistral.mistral-large-2402-v1:0", 
                    client=bedrock, model_kwargs={"max_tokens": 512})
    return llm
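As an optional smoke test (assuming your AWS account has access to the Mistral Large model enabled in Bedrock), you can invoke the LLM directly before wiring up retrieval:

# Optional smoke test: call the model without any retrieved context.
llm = load_llm()
print(llm.invoke("In one sentence, what is retrieval-augmented generation?"))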

We will be using LangChain PromptTemplate to create the prompt template for our LLM. We will produce an answer using a prompt that includes the question and the retrieved data (context).

from langchain.prompts import PromptTemplate

# Create a prompt template
prompt_template = """Use the following pieces of context to answer the 
question at the end. Please follow the following rules:
1. If the answer is not within the context knowledge, kindly state 
that you do not know, rather than attempting to fabricate a response.
2. If you find the answer, please craft a detailed and concise response 
to the question at the end. Aim for a summary of max 250 words, ensuring
 that your explanation is thorough.

{context}

Question: {question}
Helpful Answer:"""

PROMPT = PromptTemplate(template=prompt_template, 
                            input_variables=["context", "question"])
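To see the text the model will actually receive, you can render the template with placeholder values (purely illustrative):

# Illustrative only: fill the template with a toy context and question.
print(PROMPT.format(
    context="Chapter 2 discusses the nature of the self and one's duty.",
    question="What does Chapter 2 focus on?"))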

Now, let's write the actual application logic. We want to create a simple application that takes a user question, searches for documents relevant to that question, passes the retrieved documents and initial question to a model, and returns an answer.
We need to expose the vector store through the LangChain retriever interface and load RetrievalQA from LangChain, which provides a simple chain for combining retrieval with the LLM.

from langchain.chains.retrieval_qa.base import RetrievalQA

# Create a RetrievalQA chain and invoke the LLM with the user's query
def get_response(llm, vector_store, query):
    retrieval_qa = RetrievalQA.from_chain_type(
        llm=llm,
        chain_type="stuff",
        retriever=vector_store.as_retriever(
            search_type="similarity", search_kwargs={"k": 3}
        ),
        chain_type_kwargs={"prompt": PROMPT},
        return_source_documents=True,
    )
    # Run the chain on the query and return the generated answer text
    response = retrieval_qa.invoke({"query": query})
    return response["result"]
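Assuming the FAISS index from the earlier step has already been saved to faiss_index, you can exercise the pieces end to end like this (a quick sketch, separate from the Streamlit app below):

# End-to-end sketch: load the persisted index, load the LLM, and ask a question.
faiss_index = FAISS.load_local("faiss_index",
                               embeddings=titan_embeddings,
                               allow_dangerous_deserialization=True)
llm = load_llm()
print(get_response(llm, faiss_index, "What is the meaning of life?"))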

Let's put it all together. This tutorial uses Streamlit to create a UI that interacts with our RAG pipeline.

  • We will provide a simple button in the sidebar to create and update a vector store and store it in the local storage.
  • Whenever a user enters a query, we will first get the faiss_index from our local storage and then query our LLM using the retrieved context.
import os
import streamlit as st

def streamlit_ui():
    st.set_page_config("My Gita RAG")
    st.header("RAG implementation using AWS Bedrock and Langchain")

    user_question = st.text_input(
        "Ask me anything from My Gita e.g. What is the meaning of life?")

    with st.sidebar:
        st.title("Update Or Create Vector Embeddings")

        if st.button("Update Vector Store"):
            with st.spinner("Processing..."):
                docs = data_ingestion()
                setup_vector_store(docs)
                st.success("Done")

    if st.button("Generate Response") or user_question:
        # First check if the vector store exists
        if not os.path.exists("faiss_index"):
            st.error("Please create the vector store first from the sidebar.")
            return
        if not user_question:
            st.error("Please enter a question.")
            return
        with st.spinner("Processing..."):
            faiss_index = FAISS.load_local("faiss_index",
                                           embeddings=titan_embeddings,
                                           allow_dangerous_deserialization=True)
            llm = load_llm()
            st.write(get_response(llm, faiss_index, user_question))
            st.success("Done")
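Finally, add an entry point so the script can be launched with Streamlit (assuming everything lives in a single file, e.g. app.py, which you would start with streamlit run app.py):

# Entry point for the Streamlit app.
if __name__ == "__main__":
    streamlit_ui()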

This is how our Streamlit application will look.

Screenshot by the Author.

The complete code for the application is available on my GitHub: somilg050.

Screenshot by the Author.

You can play around with the code by customizing the prompt and changing the parameters to the LLM.

In conclusion, we have created an application that takes a question, retrieves relevant documents, constructs a prompt, passes that to a model, and parses the output.


Thanks for reading the tutorial. I hope you learned something new today. If you want to read more stories like this, I invite you to follow me.

Till then, Sayonara! I wish you the best in your learning journey.
