DEV Community

Sachin Uplaonkar


Multi-Modal Agentic RAG using LangChain

Explore the world of agents…

Generated by Dall-E-3

Many of you have probably used RAG (Retrieval-Augmented Generation) to chat with your documents, and have tried generating images as well. Now that most GenAI model providers support function calling, these capabilities can be combined in a single application, and the possibilities are wide open.

Agents

Agents are small, task-focused components that each help you achieve a specific goal. Combined, they become far more powerful: a single program can chat with an LLM, call an external API, translate text, generate images, and so on. Today, let's explore agents that help us:

  1. Chat with any PDF
  2. Get recent papers from Papers with Code
  3. Use Dall-E-3 to generate artistic abstract images
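Function calling is what ties these together: the model does not run anything itself. It replies with the name of a tool plus JSON-encoded arguments, and the application executes the matching function. The following framework-free sketch shows only that dispatch step. The tool names and stub functions are hypothetical stand-ins for the three agents above, not code from the repository, and it assumes the model's reply has already been parsed into a tool name and an arguments string (the OpenAI API returns function arguments as a JSON string).

```python
import json
from typing import Callable, Dict


# Hypothetical stand-ins for the three agents; real versions would query
# the vector store, the Papers with Code API, and Dall-E-3 respectively.
def chat_with_pdf(question: str) -> str:
    return f"[answer from PDF for: {question}]"


def latest_papers(topic: str) -> str:
    return f"[recent papers on: {topic}]"


def generate_image(prompt: str) -> str:
    return f"[image generated for: {prompt}]"


# Registry mapping the tool names the model may choose to Python callables.
TOOLS: Dict[str, Callable[..., str]] = {
    "chat_with_pdf": chat_with_pdf,
    "latest_papers": latest_papers,
    "generate_image": generate_image,
}


def dispatch(tool_name: str, arguments_json: str) -> str:
    """Run the tool the model picked, using the JSON arguments it produced."""
    if tool_name not in TOOLS:
        raise ValueError(f"Unknown tool: {tool_name}")
    kwargs = json.loads(arguments_json)  # e.g. '{"topic": "RAG"}' -> {"topic": "RAG"}
    return TOOLS[tool_name](**kwargs)


# Example: the model decided to call latest_papers with topic="RAG".
print(dispatch("latest_papers", '{"topic": "RAG"}'))
```

In LangChain, the agent executor runs this loop for you: it calls the model, executes the chosen tool, and feeds the result back until the model produces a final answer.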

Code

The code is available on GitHub, with clear getting-started steps.

Here is a brief overview:

The technology stack we will use: LangChain, OpenAI, the Papers with Code API, ChromaDB, and Dall-E-3.
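The Papers with Code agent comes down to an HTTP query followed by a little post-processing. Here is a minimal standard-library sketch of that idea. The endpoint, the `q` and `items_per_page` parameters, and the `results`/`title`/`published` response fields are assumptions about the Papers with Code REST API, so verify them against its current documentation; the helper names are hypothetical.

```python
import json
from typing import List
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint; check the current Papers with Code API docs before relying on it.
BASE_URL = "https://paperswithcode.com/api/v1/papers/"


def build_query_url(topic: str, page_size: int = 10) -> str:
    """Build the search URL for papers matching a topic (assumed query params)."""
    return f"{BASE_URL}?{urlencode({'q': topic, 'items_per_page': page_size})}"


def latest_titles(response: dict, k: int = 3) -> List[str]:
    """Return the k most recently published titles from a parsed response.

    Assumes a shape like {"results": [{"title": ..., "published": "YYYY-MM-DD"}, ...]}.
    Papers without a publication date are skipped.
    """
    papers = [p for p in response.get("results", []) if p.get("published")]
    # ISO-8601 dates sort correctly as strings; newest first.
    papers.sort(key=lambda p: p["published"], reverse=True)
    return [p["title"] for p in papers[:k]]


def fetch_latest(topic: str, k: int = 3) -> List[str]:
    """Network call: fetch papers for a topic and return the newest titles."""
    with urlopen(build_query_url(topic), timeout=10) as resp:
        return latest_titles(json.load(resp), k)
```

Keeping the parsing (`latest_titles`) separate from the network call (`fetch_latest`) makes the ranking logic easy to test offline.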

Put your PDF document in the docs folder and try out the default agents. To create your own agent, add it to the services folder and create a corresponding tool in the tools folder.
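Splitting each agent into a service (plain logic) and a tool (a thin, described wrapper) keeps the business logic testable and lets the agent choose tools by their descriptions. The sketch below illustrates that split with a hypothetical word-count agent and a hypothetical `register_tool` decorator; the file paths in the comments only suggest where each piece might live and are not taken from the repository.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class ToolSpec:
    """What the agent sees: a name, a description for the LLM, and a callable."""
    name: str
    description: str
    func: Callable[[str], str]


REGISTRY: Dict[str, ToolSpec] = {}


def register_tool(name: str, description: str):
    """Hypothetical decorator: record a function as a tool in REGISTRY."""
    def wrap(func: Callable[[str], str]) -> Callable[[str], str]:
        REGISTRY[name] = ToolSpec(name, description, func)
        return func
    return wrap


# services/word_count.py (hypothetical): plain logic, no agent-specific code.
def count_words(text: str) -> int:
    return len(text.split())


# tools/word_count_tool.py (hypothetical): thin wrapper the agent can call.
@register_tool(
    name="word_count",
    description="Counts the words in a piece of text. Input: the text.",
)
def word_count_tool(text: str) -> str:
    return str(count_words(text))  # tools exchange strings with the LLM


def describe_tools() -> str:
    """Tool list that could be placed in the agent's prompt."""
    return "\n".join(f"{t.name}: {t.description}" for t in REGISTRY.values())
```

LangChain's `@tool` decorator (from `langchain_core.tools`) fills a similar role to `register_tool` here, taking the tool's description from the function's docstring by default.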

The app can run with either Gradio or Streamlit, and Streamlit is the default. I found it difficult to display images alongside text in Gradio, hence the switch to Streamlit.

Restrictions

The app is restricted to answering only from the document. If you ask something outside its scope, the app won't query the outside world for an answer, which is usually what we want when dealing with sensitive documents.
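One common way to enforce this kind of restriction is to check retrieval relevance before answering, refuse when nothing in the document is close enough, and tell the model to answer only from the supplied context. The sketch below is an assumption about how such a guard could work, not the repository's implementation: it uses a toy bag-of-words cosine similarity in place of ChromaDB's embeddings, and the `threshold` of 0.2 and helper names are arbitrary, hypothetical choices.

```python
import math
import re
from collections import Counter
from typing import List, Optional

REFUSAL = "I can only answer questions about the uploaded document."

# Prompt wording that keeps the LLM inside the retrieved context.
PROMPT_TEMPLATE = (
    "Answer the question using ONLY the context below. "
    "If the answer is not in the context, say you don't know.\n\n"
    "Context:\n{context}\n\nQuestion: {question}"
)


def _vector(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (a stand-in for real embeddings)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_grounded_prompt(
    question: str, chunks: List[str], k: int = 2, threshold: float = 0.2
) -> Optional[str]:
    """Return a prompt for the LLM, or None if no chunk is relevant enough."""
    q = _vector(question)
    scored = sorted(((cosine(q, _vector(c)), c) for c in chunks), reverse=True)
    relevant = [c for score, c in scored[:k] if score >= threshold]
    if not relevant:
        return None  # the caller replies with REFUSAL instead of looking elsewhere
    return PROMPT_TEMPLATE.format(context="\n---\n".join(relevant), question=question)
```

Word-count similarity is easily swayed by common words such as "the"; real embeddings handle this far better, but the refuse-below-threshold logic stays the same.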

Results

Below are some responses from the app. Note: the image prompt was taken from Google.

Capabilities

Chat with documents

Get the latest paper on <your topic> from Papers with Code

Image Generation

Applied Restrictions

Feel free to play around with it and extend it with your own agents!

Add-On

It always helps to see the full trace of a query, so I suggest trying LangSmith from LangChain. It shows how your query passes through each stage, along with the cost associated with each one.
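Tracing is switched on through environment variables rather than code changes. Here is a minimal sketch, assuming the `LANGCHAIN_TRACING_V2`, `LANGCHAIN_API_KEY`, and `LANGCHAIN_PROJECT` variable names from the LangSmith documentation (newer releases may also accept `LANGSMITH_*` equivalents, so check the docs for your version). The helper function and project name are hypothetical.

```python
import os


def enable_langsmith_tracing(project: str) -> None:
    """Turn on LangSmith tracing for LangChain runs started after this call."""
    # The API key comes from your LangSmith settings; keep it out of source code.
    if "LANGCHAIN_API_KEY" not in os.environ:
        raise RuntimeError("Set LANGCHAIN_API_KEY (from your LangSmith settings) first.")
    os.environ["LANGCHAIN_TRACING_V2"] = "true"
    os.environ["LANGCHAIN_PROJECT"] = project  # traces are grouped under this name


# Call once at startup, before building any chains or agents, e.g.:
# enable_langsmith_tracing("multimodal-agentic-rag")
```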
