Guilherme Pereira Alves

How a history-aware retriever works

The history-aware retriever discussed in this post is the one returned by the create_history_aware_retriever function from the LangChain package. This function takes the following inputs:

  • An LLM (a language model that receives a prompt and returns an answer);
  • A vector store retriever (a component that receives a query and returns a list of relevant documents);
  • A prompt (a chat prompt template with placeholders for the latest user input and for the chat history, i.e. the list of message interactions, typically between a human and an AI, supplied at invocation time).

When invoked, the history-aware retriever takes a user query and a chat history as input and outputs a list of relevant documents, selected based on the query combined with the context provided by the chat history.

At the end, I summarize its workflow.

Setting it up

from langchain.chains import create_history_aware_retriever
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_chroma import Chroma
from dotenv import load_dotenv
import bs4

load_dotenv() # To get OPENAI_API_KEY
def create_vectorstore_retriever():
    """
    Returns a vector store retriever based on the text of a specific web page.
    """
    URL = r'https://lilianweng.github.io/posts/2023-06-23-agent/'
    loader = WebBaseLoader(
        web_paths=(URL,),
        bs_kwargs=dict(
            parse_only=bs4.SoupStrainer(class_=("post-content", "post-title", "post-header"))
        ))
    docs = loader.load()
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0, add_start_index=True)
    splits = text_splitter.split_documents(docs)
    vectorstore = Chroma.from_documents(documents=splits, embedding=OpenAIEmbeddings())
    return vectorstore.as_retriever()
def create_prompt():
    """
    Returns a prompt instructed to produce a rephrased question based on the user's
    last question, but referencing previous messages (chat history).
    """
    system_instruction = (
        "Given a chat history and the latest user question "
        "which might reference context in the chat history, formulate a standalone question "
        "which can be understood without the chat history. Do NOT answer the question, "
        "just reformulate it if needed and otherwise return it as is."
    )

    prompt = ChatPromptTemplate.from_messages([
        ("system", system_instruction),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}")])
    return prompt
llm = ChatOpenAI(model='gpt-4o-mini')
vectorstore_retriever = create_vectorstore_retriever()
prompt = create_prompt()
history_aware_retriever = create_history_aware_retriever(
    llm,
    vectorstore_retriever,
    prompt
)

Using it

Here, a question is asked without any chat history, so the retriever simply returns the documents relevant to the question itself.

chat_history = []

docs = history_aware_retriever.invoke({'input': 'what is planning?', 'chat_history': chat_history})
for i, doc in enumerate(docs):
    print(f'Chunk {i+1}:')
    print(doc.page_content)
    print()
Chunk 1:
Planning is essentially in order to optimize believability at the moment vs in time.
Prompt template: {Intro of an agent X}. Here is X's plan today in broad strokes: 1)
Relationships between agents and observations of one agent by another are all taken into consideration for planning and reacting.
Environment information is present in a tree structure.

Chunk 2:
language. Essentially, the planning step is outsourced to an external tool, assuming the availability of domain-specific PDDL and a suitable planner which is common in certain robotic setups but not in many other domains.

Chunk 3:
Another quite distinct approach, LLM+P (Liu et al. 2023), involves relying on an external classical planner to do long-horizon planning. This approach utilizes the Planning Domain Definition Language (PDDL) as an intermediate interface to describe the planning problem. In this process, LLM (1) translates the problem into “Problem PDDL”, then (2) requests a classical planner to generate a PDDL plan based on an existing “Domain PDDL”, and finally (3) translates the PDDL plan back into natural

Chunk 4:
Planning

Subgoal and decomposition: The agent breaks down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks.
Reflection and refinement: The agent can do self-criticism and self-reflection over past actions, learn from mistakes and refine them for future steps, thereby improving the quality of final results.


Memory

Now, based on the chat history, the retriever knows that the human wants to know about task decomposition as well as planning, so it responds with chunks of text that reference both themes.

chat_history = [
    ('human', 'when I ask about planning I want to know about Task Decomposition too.')]

docs = history_aware_retriever.invoke({'input': 'what is planning?', 'chat_history': chat_history})
for i, doc in enumerate(docs):
    print(f'Chunk {i+1}:')
    print(doc.page_content)
    print()
Chunk 1:
Task decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.

Chunk 2:
Fig. 1. Overview of a LLM-powered autonomous agent system.
Component One: Planning#
A complicated task usually involves many steps. An agent needs to know what they are and plan ahead.
Task Decomposition#

Chunk 3:
Planning

Subgoal and decomposition: The agent breaks down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks.
Reflection and refinement: The agent can do self-criticism and self-reflection over past actions, learn from mistakes and refine them for future steps, thereby improving the quality of final results.


Memory

Chunk 4:
Challenges in long-term planning and task decomposition: Planning over a lengthy history and effectively exploring the solution space remain challenging. LLMs struggle to adjust plans when faced with unexpected errors, making them less robust compared to humans who learn from trial and error.

Now the question depends entirely on the chat history, and we can see that the retriever responds with chunks of text that reference the correct concept.

chat_history = [
    ('human', 'What is ReAct?'),
    ('ai', 'ReAct integrates reasoning and acting within LLM by extending the action space to be a combination of task-specific discrete actions and the language space')]

docs = history_aware_retriever.invoke({'input': 'It is a way of doing what?', 'chat_history': chat_history})
for i, doc in enumerate(docs):
    print(f'Chunk {i+1}:')
    print(doc.page_content)
    print()
Chunk 1:
ReAct (Yao et al. 2023) integrates reasoning and acting within LLM by extending the action space to be a combination of task-specific discrete actions and the language space. The former enables LLM to interact with the environment (e.g. use Wikipedia search API), while the latter prompting LLM to generate reasoning traces in natural language.
The ReAct prompt template incorporates explicit steps for LLM to think, roughly formatted as:
Thought: ...
Action: ...
Observation: ...

Chunk 2:
Fig. 2. Examples of reasoning trajectories for knowledge-intensive tasks (e.g. HotpotQA, FEVER) and decision-making tasks (e.g. AlfWorld Env, WebShop). (Image source: Yao et al. 2023).
In both experiments on knowledge-intensive tasks and decision-making tasks, ReAct works better than the Act-only baseline where Thought: … step is removed.

Chunk 3:
The LLM is provided with a list of tool names, descriptions of their utility, and details about the expected input/output.
It is then instructed to answer a user-given prompt using the tools provided when necessary. The instruction suggests the model to follow the ReAct format - Thought, Action, Action Input, Observation.

Chunk 4:
Case Studies#
Scientific Discovery Agent#
ChemCrow (Bran et al. 2023) is a domain-specific example in which LLM is augmented with 13 expert-designed tools to accomplish tasks across organic synthesis, drug discovery, and materials design. The workflow, implemented in LangChain, reflects what was previously described in the ReAct and MRKLs and combines CoT reasoning with tools relevant to the tasks:

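
Under the hood, the rephrasing is just the prompt piped into the LLM. If you are curious about the standalone question the LLM produces before retrieval, you can run that step on its own; this is a small inspection sketch that reuses the prompt, llm, and chat_history objects defined above.

from langchain_core.output_parsers import StrOutputParser

# Run only the rephrasing step to inspect the standalone question the LLM
# generates from the follow-up question and the chat history above.
rephrase_chain = prompt | llm | StrOutputParser()
print(rephrase_chain.invoke({'input': 'It is a way of doing what?', 'chat_history': chat_history}))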




Conclusion

In conclusion, the history-aware retriever works as follows when .invoke({'input': '...', 'chat_history': '...'}) is called:

  • It fills the input and chat_history placeholders in the prompt with the specified values, creating a ready-to-use prompt that essentially says "Take this chat history and this last input, and rephrase the last input in a way that anyone can understand it without seeing the chat history".
  • It sends the new prompt to the LLM and receives a rephrased input.
  • It then sends the rephrased input to the vector store retriever and receives a list of documents relevant to this rephrased input.
  • Finally, it returns this list of relevant documents (see the sketch after this list).
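
The sketch below is a simplified reconstruction of the chain that create_history_aware_retriever assembles, not the library's exact source code; it reuses the llm, prompt, and vectorstore_retriever objects defined earlier. Note the shortcut in the first branch: when the chat history is empty, there is nothing to rephrase, so the input goes straight to the retriever and the LLM is never called.

from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableBranch

# Simplified sketch of the chain built by create_history_aware_retriever.
manual_history_aware_retriever = RunnableBranch(
    # Empty chat history: skip the LLM and send the raw input to the retriever.
    (lambda x: not x.get('chat_history'),
     (lambda x: x['input']) | vectorstore_retriever),
    # Otherwise: rephrase the input with the LLM, then retrieve.
    prompt | llm | StrOutputParser() | vectorstore_retriever,
)

docs = manual_history_aware_retriever.invoke(
    {'input': 'what is planning?', 'chat_history': []})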

Note: the embedding model used to transform text into vectors is the one specified when Chroma.from_documents is called; in this post, OpenAIEmbeddings. When no embedding is passed, Chroma falls back to its own default embedding function.
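
For illustration, the two options look like this (the commented line is a sketch; which default model Chroma uses depends on the installed chromadb version):

# Explicit embedding model, as used in this post:
vectorstore = Chroma.from_documents(documents=splits, embedding=OpenAIEmbeddings())

# Omitting the embedding argument would make Chroma fall back to its
# default embedding function:
# vectorstore = Chroma.from_documents(documents=splits)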
