
Austin Vance for Focused


Chat With Your PDFs: Part 1 - An End to End LangChain Tutorial For Building A Custom RAG with OpenAI.

A common use case for AI chatbots is ingesting PDF documents so that users can ask questions about them, inspect them, and learn from them. In this tutorial we will start with a completely blank project and build an end-to-end chat application that lets users chat about the Epic Games v. Apple lawsuit.

There's a lot of content packed into this one video, so please ask questions in the comments and I will do my best to help you get past any hurdles.

In Part 1 you will learn how to:

  • Create a new app using @LangChain's LangServe
  • Ingest PDFs using @unstructuredio
  • Chunk documents with @LangChain's SemanticChunker
  • Embed chunks using @OpenAI's embeddings API
  • Store embedded chunks in PGVector, a vector database
  • Build an LCEL chain for LangServe that uses PGVector as a retriever
  • Use the LangServe playground to test our RAG
  • Stream output, including document sources, to a future front end
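To make the flow above concrete before watching the video, here is a minimal, dependency-free sketch of the same steps: chunk, embed, store, retrieve, and assemble a prompt that carries its sources. Everything in it is a stand-in and an assumption on my part: `chunk_text` is a naive sentence grouper, not SemanticChunker; `embed` is a bag-of-words counter, not OpenAI's embeddings API; `ToyVectorStore` is an in-memory list, not PGVector; and the sample sentences are placeholders, not excerpts from the court documents.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, max_words: int = 40) -> list[str]:
    """Group whole sentences into chunks of roughly max_words words.

    Naive stand-in for a semantic chunker (assumption, not SemanticChunker).
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks, current, count = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks


def embed(text: str) -> Counter:
    """Bag-of-words vector; a toy replacement for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """In-memory stand-in for a vector database such as PGVector."""

    def __init__(self) -> None:
        self.rows: list[tuple[str, str, Counter]] = []  # (chunk, source, vector)

    def add(self, chunks: list[str], source: str) -> None:
        for chunk in chunks:
            self.rows.append((chunk, source, embed(chunk)))

    def search(self, query: str, k: int = 2) -> list[tuple[str, str]]:
        q = embed(query)
        ranked = sorted(self.rows, key=lambda row: cosine(q, row[2]), reverse=True)
        return [(chunk, source) for chunk, source, _ in ranked[:k]]


def build_prompt(question: str, hits: list[tuple[str, str]]) -> str:
    """Put retrieved chunks, tagged with their source file, in front of the question."""
    context = "\n".join(f"[{source}] {chunk}" for chunk, source in hits)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Placeholder documents (hypothetical file names and text).
store = ToyVectorStore()
store.add(chunk_text("The court ruled on anti-steering rules. "
                     "Apple may not block links to outside payment options.", max_words=8),
          "ruling.pdf")
store.add(chunk_text("Epic Games develops Fortnite. "
                     "The game was removed from the App Store in 2020.", max_words=8),
          "background.pdf")

question = "Why was Fortnite removed from the App Store?"
hits = store.search(question, k=1)
prompt = build_prompt(question, hits)
```

In the real project the retrieved chunks and their metadata would come from the vector database, and the prompt would be passed to the LLM inside the chain; the sketch stops at the prompt so it runs without any API key.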

In Part 2 we will focus on:

  • Creating a front end with TypeScript, React, and Tailwind
  • Displaying sources of information along with the LLM output
  • Streaming to the front end with Server-Sent Events
  • Deploying the backend application to @DigitalOcean and to @LangChain's hosted LangServe platform to compare the two
  • Deploying the front end to @DigitalOcean's App Platform
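Since Part 2 leans on Server-Sent Events, here is a rough sketch of what an SSE stream looks like on the wire: each message is an optional `event:` line, one `data:` line per line of payload, and a blank line to end the message. The event names `token`, `sources`, and `end` and the JSON payload shapes are assumptions made up for this illustration; they are not necessarily what LangServe emits.

```python
import json
from typing import Iterable, Iterator, Optional


def format_sse(data: str, event: Optional[str] = None) -> str:
    """Frame one Server-Sent Events message as text."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    # Every line of the payload needs its own "data:" field.
    for line in data.split("\n"):
        lines.append(f"data: {line}")
    # A blank line terminates the message.
    return "\n".join(lines) + "\n\n"


def stream_answer(tokens: Iterable[str], sources: list[str]) -> Iterator[str]:
    """Yield LLM tokens first, then the document sources, then an end marker."""
    for token in tokens:
        yield format_sse(json.dumps({"token": token}), event="token")
    yield format_sse(json.dumps({"sources": sources}), event="sources")
    yield format_sse("", event="end")


# Hypothetical tokens and source file name, just to show the framing.
frames = list(stream_answer(["Hello", " world"], ["ruling.pdf"]))
```

A browser client would read these frames with `EventSource` or a streaming `fetch` and append each token to the chat window as it arrives.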

In Part 3 we will focus on:

  • Adding memory to the @LangChain chain with PostgreSQL
  • Adding multi-query retrieval to the chain for better breadth of search
  • Adding sessions to the chat history
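As a preview of the ideas in Part 3, the sketch below shows the two mechanisms in plain Python: chat history keyed by a session ID, and multi-query retrieval, where several rewordings of a question are searched and the results are merged without duplicates. It is an illustration under assumptions, not the code from the series: `SessionHistory` keeps messages in a dictionary rather than PostgreSQL, the query variants are written by hand rather than generated by an LLM, and `fake_search` is a hypothetical stand-in for the vector store retriever.

```python
from typing import Callable


class SessionHistory:
    """Per-session chat history held in memory (a database would replace the dict)."""

    def __init__(self) -> None:
        self._messages: dict[str, list[tuple[str, str]]] = {}

    def append(self, session_id: str, role: str, content: str) -> None:
        self._messages.setdefault(session_id, []).append((role, content))

    def get(self, session_id: str) -> list[tuple[str, str]]:
        # Return a copy so callers cannot mutate stored history by accident.
        return list(self._messages.get(session_id, []))


def multi_query_search(
    queries: list[str],
    search: Callable[[str, int], list[str]],
    k: int = 3,
) -> list[str]:
    """Run every query variant and merge the hits, keeping first-seen order."""
    seen: set[str] = set()
    merged: list[str] = []
    for query in queries:
        for doc in search(query, k):
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged


# Hypothetical index: maps a query string straight to chunk IDs.
index = {
    "apple commission": ["chunk-a", "chunk-b"],
    "app store fee": ["chunk-b", "chunk-c"],
}


def fake_search(query: str, k: int) -> list[str]:
    return index.get(query, [])[:k]


merged = multi_query_search(["apple commission", "app store fee"], fake_search)

history = SessionHistory()
history.append("session-1", "user", "What fee does Apple charge?")
history.append("session-2", "user", "Who filed the lawsuit?")
```

The point of merging is breadth: two phrasings of the same question can surface different chunks, and the chain sees the union of them.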

GitHub repo

https://github.com/focused-labs/pdf_rag

Top comments (5)

Sohan Venkatesh

Can multiple PDFs be stored here?

EvilDrHam

Part 1: I get the error TypeError: expected string or bytes-like object, got 'list' on the line: text_splitter = SemanticChunker(embeddings=embeddings)

EvilDrHam

Correction: I got the error on: chunks = text_splitter.create_documents(docs)

EvilDrHam

Solved! If I use the parameter use_multithreading=True, it returns a list. :-)
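For anyone hitting the same TypeError, here is one plausible reading of it, offered as a guess rather than a diagnosis of the thread above: that exact message is what Python's `re` module raises when it is handed a list instead of a string, which would happen if a regex-based splitter received a list of lists rather than a flat list of strings. The sketch reproduces the error with the standard library only and shows a flattening step; `split_sentences` and `flatten` are hypothetical helpers, not LangChain functions.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Regex-based sentence split; raises TypeError if text is not a string."""
    return re.split(r"(?<=[.?!])\s+", text)


def flatten(items: list) -> list:
    """Recursively flatten nested lists into one flat list."""
    flat = []
    for item in items:
        if isinstance(item, list):
            flat.extend(flatten(item))
        else:
            flat.append(item)
    return flat


# A list of lists, as a loader might return if each file yields its own list.
nested = [["First page. More text."], ["Second page."]]

try:
    [split_sentences(text) for text in nested]  # each "text" is a list here
    error = None
except TypeError as exc:
    error = str(exc)  # the message begins "expected string or bytes-like object"

# After flattening, every element is a string and the split succeeds.
pieces = [split_sentences(text) for text in flatten(nested)]
```

If your loader returns document objects rather than strings, it is also worth checking whether the splitter method you call expects raw text or documents, since passing the wrong one can fail in a similar way.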

ricgene

I've watched Part 1, Austin. Awesome, thank you!