Multi-Modal Agentic RAG using LangChain

#langchain #agents #openai #rag

Explore the world of agents…

Many of you have probably used RAG (Retrieval-Augmented Generation) to chat with your documents, and perhaps tried image generation as well. With function calling now built into most GenAI model providers, these capabilities can be combined behind a single interface, and the range of possible applications is wide.

Agents

Agents are small, task-specific components that help you reach a particular goal. Combined, they become far more capable: one program can chat with an LLM, call an external API, translate text, generate images, and so on. Today let’s explore agents that help us:

Chat with any PDF
Get recent papers from paperswithcode
Use Dall-E-3 to generate artistic abstract images
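
The three agents above can be thought of as named tools that a model picks between through function calling. Here is a minimal, library-free sketch of that dispatch step. The tool names, the stub functions, and the `dispatch` helper are all hypothetical and only stand in for the real LangChain tools in the repository; the model's choice is simulated with a hand-written JSON string rather than a live OpenAI call.

```python
import json
from typing import Callable, Dict

# Stub tools: each one stands in for an agent from the list above.
# The real versions would query ChromaDB, the paperswithcode API, and Dall-E-3.
def chat_with_pdf(question: str) -> str:
    return f"[pdf answer for: {question}]"

def recent_papers(topic: str) -> str:
    return f"[recent papers about: {topic}]"

def generate_image(prompt: str) -> str:
    return f"[image url for: {prompt}]"

# Registry mapping the tool name the model emits to a Python callable.
TOOLS: Dict[str, Callable[..., str]] = {
    "chat_with_pdf": chat_with_pdf,
    "recent_papers": recent_papers,
    "generate_image": generate_image,
}

def dispatch(tool_call_json: str) -> str:
    """Run the tool named in a payload shaped like {"name": ..., "arguments": {...}}."""
    call = json.loads(tool_call_json)
    name = call["name"]
    if name not in TOOLS:
        raise ValueError(f"Unknown tool: {name}")
    return TOOLS[name](**call["arguments"])

# Simulated model output: the model decided the image tool fits the request.
simulated_call = '{"name": "generate_image", "arguments": {"prompt": "abstract sunrise"}}'
print(dispatch(simulated_call))
```

In the actual app, LangChain handles this routing for you; the sketch only shows why each tool needs a clear name and well-defined arguments.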

Code

The code is available on GitHub with well-structured getting started steps.

I will be giving a brief overview of it here:

The technology stack that we will be using is: LangChain, OpenAI, paperswithcode API, ChromaDB, and Dall-E-3.

Upload your PDF document to the docs folder and explore the default agents. If you are interested, create your own agents by adding them to the services folder and creating a corresponding tool in the tools folder.
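
To make the services/tools split concrete, here is a rough sketch of what a new pair might look like. Everything in it is an assumption for illustration: the file names in the comments, the `count_words` service, and the `run_tool` helper are hypothetical, and the tool description follows the OpenAI function-calling format rather than whatever wrapper the repository actually uses.

```python
from typing import Any, Dict

# Hypothetical services/word_count.py: the plain function that does the work.
def count_words(text: str) -> int:
    """Return the number of whitespace-separated words in text."""
    return len(text.split())

# Hypothetical tools/word_count_tool.py: the description the model reads
# when deciding whether to call the service, in OpenAI's tool format.
WORD_COUNT_TOOL: Dict[str, Any] = {
    "type": "function",
    "function": {
        "name": "count_words",
        "description": "Count the words in a piece of text.",
        "parameters": {
            "type": "object",
            "properties": {
                "text": {"type": "string", "description": "Text to count."}
            },
            "required": ["text"],
        },
    },
}

def run_tool(arguments: Dict[str, Any]) -> int:
    """Call the service with the arguments the model supplied."""
    return count_words(**arguments)

print(run_tool({"text": "agents call tools"}))
```

Keeping the service free of any model-specific code means it can be tested on its own, while the tool file carries only the name, description, and argument schema.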

The app can run in either Gradio or Streamlit, with Streamlit as the default. I found it difficult to display images and text together in Gradio, hence the switch to Streamlit.

Restrictions

The app is restricted to answering only from the document. If you ask about something outside it, the app won’t query the outside world to fetch a response (which is usually what we want when dealing with sensitive documents).
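
One common way to get this behaviour is to combine a strict prompt with a relevance check on the retrieved chunks. The sketch below is an assumption about how such a guard could look, not the repository's actual implementation: `build_prompt`, `REFUSAL`, the prompt wording, and the `min_score` threshold are all hypothetical. It also assumes higher scores mean closer matches, so the distances ChromaDB returns would need converting first.

```python
from typing import List, Optional, Tuple

REFUSAL = "I can only answer questions about the uploaded document."

PROMPT_TEMPLATE = (
    "Answer the question using only the context below. "
    "If the context does not contain the answer, say you do not know.\n\n"
    "Context:\n{context}\n\nQuestion: {question}"
)

def build_prompt(
    question: str,
    retrieved: List[Tuple[str, float]],
    min_score: float = 0.5,
) -> Tuple[Optional[str], Optional[str]]:
    """Return (prompt, None) when relevant chunks exist, else (None, REFUSAL).

    retrieved holds (chunk_text, similarity_score) pairs; higher is more similar.
    """
    relevant = [text for text, score in retrieved if score >= min_score]
    if not relevant:
        # Nothing in the document matches well enough, so skip the model call.
        return None, REFUSAL
    context = "\n---\n".join(relevant)
    return PROMPT_TEMPLATE.format(context=context, question=question), None

# An off-topic question: the only retrieved chunk scores below the threshold.
prompt, refusal = build_prompt("Who won the match?", [("Agents use tools.", 0.1)])
print(refusal)
```

The early return keeps an unrelated question from ever reaching the model, which fits the sensitive-document use case.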

Results

Following are some of the responses from the app. Note: the image prompt was taken from Google.

Feel free to play around and extend it with your own agents!

Add-On

It is always good to know the full trace of your query, so I suggest trying LangSmith from LangChain. It shows how your query passes through the different stages, along with the cost associated with each of them.
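
LangSmith tracing is typically switched on through environment variables set before the LangChain code runs. The snippet below is a small sketch of that setup. The `enable_langsmith_tracing` helper and the project name are hypothetical, and the variable names are the `LANGCHAIN_`-prefixed ones I am aware of; newer releases may prefer `LANGSMITH_`-prefixed names, so check the LangSmith docs for your version.

```python
import os

def enable_langsmith_tracing(project: str) -> bool:
    """Turn on LangSmith tracing via environment variables.

    Returns False and changes nothing when no API key is present.
    """
    if not os.environ.get("LANGCHAIN_API_KEY"):
        return False
    os.environ["LANGCHAIN_TRACING_V2"] = "true"
    os.environ["LANGCHAIN_PROJECT"] = project  # groups runs under one project
    return True

# The project name here is only an example.
if enable_langsmith_tracing("multi-modal-agentic-rag"):
    print("LangSmith tracing enabled.")
else:
    print("LANGCHAIN_API_KEY not set; tracing left off.")
```

Call this once at startup, before any chains or agents are created, so every run in the session is traced.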
