In the fourth part of this blog series, we'll take a look at the most popular adapters available at llamahub.ai. LlamaHub is a library of data connectors that translate raw data from various sources into documents that can be easily accessed and transformed by LLMs like ChatGPT and Google Bard.
This is an open-source project, and the adapters/connectors created here are gradually being integrated into projects like Langchain and Llama-Index. If you are building a custom document bot for your firm, or even for personal use, I think you'll find this super useful.
Now, let's get started!
For the sake of simplicity in this walkthrough, we'll access the LlamaHub adapters through Llama-Index.
Adapters Assemble
First up, here's how you can load any loader with Llama-Index:

```python
from llama_index import download_loader

Loader = download_loader("<NAME_OF_THE_LOADER>")
```
The `download_loader` helper method will load the mentioned loader along with all of its dependencies. To check the name of the loader you want to use, visit this documentation. With that out of the way, let's have a look at some of the important data loaders in Llama-Index.
SimpleDirectoryReader:
This is a universal data connector available in llama-index that should be able to handle pretty much any file type you throw at it. Under the hood it uses various third-party libraries like `PyPDF2` for PDFs, `python-pptx` for PPTs, and so on. The general idea behind this data connector is to read a directory of mixed file types in its entirety, or to read single files, no matter what kind of files they are.
```python
SimpleDirectoryReader = download_loader("SimpleDirectoryReader")
documents = SimpleDirectoryReader(input_dir='foo/').load_data()
```
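To illustrate the core idea (this is a simplified sketch, not llama-index's actual implementation — `load_dir` and the reader map are hypothetical names), dispatching on file extension over a directory might look like this in plain Python:

```python
from pathlib import Path

# Hypothetical sketch: map file extensions to reader functions.
# The real SimpleDirectoryReader delegates PDFs, PPTs, etc. to
# third-party libraries instead of reading raw text.
READERS = {
    ".txt": lambda p: p.read_text(encoding="utf-8"),
    ".md": lambda p: p.read_text(encoding="utf-8"),
}

def load_dir(input_dir: str) -> list:
    """Read every supported file in a directory into a list of strings."""
    docs = []
    for path in sorted(Path(input_dir).iterdir()):
        reader = READERS.get(path.suffix.lower())
        if reader is not None:
            docs.append(reader(path))
    return docs
```

Unsupported extensions are simply skipped; the real loader instead ships readers for dozens of formats.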
SimpleWebPageReader:
This is the web equivalent of the `SimpleDirectoryReader`. It can read any webpage using BeautifulSoup4.
```python
SimpleWebPageReader = download_loader("SimpleWebPageReader")
documents = SimpleWebPageReader().load_data(urls=['https://foo.com/index'])
```
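The reader's core job — turning HTML into plain text — can be sketched with just the standard library (BeautifulSoup4 does this far more robustly; `TextExtractor` and `html_to_text` below are illustrative names, not part of llama-index):

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect text nodes, skipping <script> and <style> contents."""
    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        # Keep only visible, non-empty text
        if not self._skip and data.strip():
            self.chunks.append(data.strip())

def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.chunks)
```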
YoutubeTranscriptReader:
This loader will attempt to transcribe any YouTube video that you provide. It first searches for available captions using `youtube-transcript-api`, and if captions are not available, it uses OpenAI's Whisper API to transcribe the video.
```python
YoutubeTranscriptReader = download_loader("YoutubeTranscriptReader")
documents = YoutubeTranscriptReader().load_data(ytlinks=['https://youtu.be/5MuIMqhT8DM'])
```
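A loader like this has to pull a video ID out of the common YouTube URL shapes before it can fetch captions. A simplified helper for that step, using only the standard library (`youtube_video_id` is a hypothetical name, not the loader's actual code):

```python
from typing import Optional
from urllib.parse import urlparse, parse_qs

def youtube_video_id(url: str) -> Optional[str]:
    """Pull the video ID out of youtu.be or youtube.com links."""
    parsed = urlparse(url)
    if parsed.netloc == "youtu.be":
        # Short links: https://youtu.be/<id>
        return parsed.path.lstrip("/") or None
    if "youtube.com" in parsed.netloc:
        # Watch links: https://www.youtube.com/watch?v=<id>
        return parse_qs(parsed.query).get("v", [None])[0]
    return None
```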
The above are some of the loaders that I think have the most use cases, but there are many more, and you can find the complete list here. If you want to load the documents into langchain's `Document` format, use `<Loader>.load_langchain_documents()` instead of `load_data()`; it will output the documents in the langchain-supported format.
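To show the shape of that conversion, here is a rough sketch using a stand-in dataclass. The `LCDocument` class below is hypothetical, mimicking the `page_content`/`metadata` fields of langchain's real `Document`; the actual `load_langchain_documents()` returns genuine langchain objects:

```python
from dataclasses import dataclass, field

@dataclass
class LCDocument:
    """Hypothetical stand-in for langchain's Document class."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def to_langchain_format(texts_with_meta: list) -> list:
    """Wrap (text, metadata) pairs the way a loader's
    load_langchain_documents() conceptually would."""
    return [LCDocument(page_content=t, metadata=m) for t, m in texts_with_meta]
```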
Once you load the documents, it's a cakewalk from there. You can visit my earlier blogs in this series to see how to use them as context for the ChatGPT API.
Now, that's about LlamaHub. See ya in the next one!