Vector databases like Pinecone are a good place to store the custom data you want to use in your next AI application.
In this blog post I will be using the Pinecone vector database, which is easy to use and cloud native.
I will also be using the OpenAI APIs as my LLM.
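Under the hood, a vector database stores each piece of text as an embedding vector and answers a query by finding the nearest stored vectors. Here is a minimal sketch of that idea in plain Python (the toy 3-dimensional vectors and the `cosine_similarity` helper are illustrative only, not Pinecone's API; real OpenAI embeddings have 1536 dimensions):

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity = dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "index": text mapped to 3-dimensional embedding vectors
index = {
    "doc about attention": [0.9, 0.1, 0.0],
    "doc about cooking":   [0.0, 0.2, 0.9],
}

query = [0.8, 0.2, 0.1]  # embedding of the user's question
best = max(index, key=lambda k: cosine_similarity(query, index[k]))
print(best)  # → doc about attention
```

Pinecone does this at scale with approximate nearest-neighbor search, so you never compare against every vector yourself.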
First, get your Pinecone API key - https://app.pinecone.io/organizations/-/projects
You will also need an OpenAI API key to call the OpenAI models.
Install the Python libraries below (langchain-pinecone and langchain-openai pull in the Pinecone and OpenAI clients):
!pip install --upgrade --quiet langchain langchain-community
!pip install --upgrade --quiet langchain-pinecone langchain-openai
!pip install pypdf tiktoken
from langchain_community.document_loaders import PyPDFDirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings, OpenAI
from langchain.chains import RetrievalQA
import os
Create a folder in the current workspace and upload any PDF files with your data into it. This will be the custom data your chat agent answers questions from.
!mkdir pdfs
loader = PyPDFDirectoryLoader("pdfs")
data = loader.load()
Split the loaded data into smaller chunks before inserting it into the vector database:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
text_chunks = text_splitter.split_documents(data)
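Chunking keeps a small overlap between neighboring chunks so context is not lost at the cut points. The idea can be sketched with a simple character-window splitter (a toy stand-in I wrote for illustration, not LangChain's actual `RecursiveCharacterTextSplitter` algorithm, which prefers splitting on paragraph and sentence boundaries):

```python
def split_text(text, chunk_size=20, chunk_overlap=5):
    # Slide a window of chunk_size characters, stepping forward by
    # (chunk_size - chunk_overlap) so neighboring chunks share context.
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks

text = "Attention is all you need for sequence modeling."
chunks = split_text(text)
print(chunks)
# Each chunk is at most 20 characters, and each one repeats the
# last 5 characters of the previous chunk.
```

Smaller chunks give more precise retrieval; larger chunks give the LLM more surrounding context. The 500/50 values above are a reasonable starting point, not a tuned setting.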
Set your API keys:
os.environ["OPENAI_API_KEY"] = "<Key>"
os.environ["PINECONE_API_KEY"] = "<Key>"
Use OpenAIEmbeddings to embed your texts:
embeddings = OpenAIEmbeddings()
Use the LangChain PineconeVectorStore module to store the data in Pinecone.
Before that, make sure you have created an index in your Pinecone project; I will write into a namespace called "roshan".
from langchain_pinecone import PineconeVectorStore
index_name = "vectorone"
docsearch = PineconeVectorStore.from_documents(text_chunks, embeddings, index_name=index_name, namespace="roshan")
Now if you check your Pinecone console, you should see the data.
You can then run queries to ask questions about your uploaded PDF data:
query = "what is Scaled Dot-Product Attention?"
docs = docsearch.similarity_search(query)  # raw similarity search over the index
llm = OpenAI(temperature=0)
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=docsearch.as_retriever())
qa.run(query)
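chain_type="stuff" simply "stuffs" all retrieved chunks into a single prompt for the LLM. Conceptually it works like the sketch below (the `build_stuff_prompt` helper and its wording are my own illustration, not LangChain's exact template):

```python
def build_stuff_prompt(question, retrieved_chunks):
    # Concatenate every retrieved chunk into one context block,
    # then append the user's question.
    context = "\n\n".join(retrieved_chunks)
    return (
        "Use the following context to answer the question.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

chunks = [
    "Scaled Dot-Product Attention computes softmax(QK^T / sqrt(d_k)) V.",
    "Multi-head attention runs several attention heads in parallel.",
]
prompt = build_stuff_prompt("what is Scaled Dot-Product Attention?", chunks)
print(prompt)
```

Because every chunk goes into one prompt, "stuff" only works while the retrieved text fits in the model's context window; LangChain offers other chain types (such as "map_reduce") for larger result sets.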
You can build a small command-line chatbot to see how it works:
import sys

while True:
    user_input = input("Input Prompt: ")
    if user_input == "exit":
        sys.exit()
    if user_input == "":
        continue
    result = qa.run(user_input)
    print(result)