Rohan Sharma
RAG Simplified!! 🐣

Hii Hiiiii! 👋

Are you stuck between AI and AI?? Me too! But we have to go with the flow, or we won't be able to make a lasting impact!

This blog is about one such AI thing that is making a promising impact in the tech world. It doesn't matter whether you are a beginner or an expert; if you work in tech or have an interest in it, you should know about this.

In this blog, I'll be covering Retrieval-Augmented Generation (RAG) in detail and creating a quick prompt model using an exceptional framework, LLMWare.

Let's start... 3️⃣... 2️⃣... 1️⃣... 🤓


What is RAG??

Let's start with the basics so that you can easily understand RAG.

So, first of all, what is AI? AI, or Artificial Intelligence, is nothing but the science and engineering of making intelligent machines.

Inside AI, there are many subsets. Take a look at the diagram below:


Subsets of AI


Now, let's discuss another field of this chaos: Machine Learning (ML). As per the diagram above, it should be clear that ML is a subset of AI. ML is focused on building computer systems that learn from data. In other words, ML is the part of AI that trains a piece of software, called a model, to make useful predictions or generate content from data.

Fun Fact: An LLM is a type of artificial intelligence (AI) program built on machine learning. LLMs are trained on huge sets of data, hence the name "large."



AI
β”œβ”€β”€ ML (Machine Learning)
β”‚   β”œβ”€β”€ LLM (Large Language Models)
β”‚   └── RAG (Retrieval-Augmented Generation)




But What is RAG⁉️

RAG, or Retrieval-Augmented Generation, is a groundbreaking AI framework (just as Next.js is a framework for JavaScript) for improving the quality of LLM-generated responses by grounding the model on external sources of knowledge.


Working of RAG


RAG extends the already powerful capabilities of LLMs to specific domains or an organization's internal knowledge base, all without the need to retrain the model. It is a cost-effective approach to improving LLM output so it remains relevant, accurate, and useful in various contexts.
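To make the flow concrete, here is a minimal sketch of the retrieve-then-generate loop in plain Python. This is not LLMWare's implementation: retrieve() and build_prompt() are hypothetical stand-ins (a real system would use embeddings, a vector database, and an actual LLM call), but the shape of the pipeline is the same.


def retrieve(query, knowledge_base, top_k=2):
    # hypothetical retriever: rank passages by naive keyword overlap
    # (a real RAG system would use embeddings + a vector database)
    words = [w.strip("?.,!").lower() for w in query.split()]
    scored = [(sum(w in doc.lower() for w in words), doc) for doc in knowledge_base]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored if score > 0][:top_k]

def build_prompt(query, passages):
    # the "augmentation" step: ground the model on the retrieved context
    context = "\n".join(passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

knowledge_base = [
    "Invoice #0001: Total Amount $22,500.00, payable within 30 days.",
    "Japan recorded a trade surplus of 62.4 billion yen for September.",
]

query = "What is the total amount of the invoice?"
prompt = build_prompt(query, retrieve(query, knowledge_base))
print(prompt)  # this augmented prompt is what actually gets sent to the LLM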

I hope you are now somewhat clear on the RAG concept. To make it clearer, let's jump to the example part, where we will create a simple project to test prompt-based RAG models using LLMWare as the framework.

If you don't know about LLMWare, please read the article below. It's the only prerequisite for building the project! 😝


Let's Prompt a Model with LLMWare.ai 🤖

llmware provides a unified framework for building LLM-based applications (e.g., RAG, Agents), using small, specialized models that can be deployed privately, integrated with enterprise knowledge sources safely and securely, and cost-effectively tuned and adapted for any business process.

In this example, we will illustrate:

  1. Discovery - how to discover models in the llmware ModelCatalog.
  2. Load Model - how to load a selected model from the catalog.
  3. Prompt - how to create a basic prompt and run an inference with the model.
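
Before the full script, here is a condensed preview of those three steps, using the same llmware calls that the complete example below is built on (ModelCatalog discovery, Prompt().load_model(), and prompt_main()):


from llmware.models import ModelCatalog
from llmware.prompts import Prompt

#   1. Discovery - browse the generative models in the ModelCatalog
for model in ModelCatalog().list_generative_models():
    print(model["model_name"])

#   2. Load Model - load a selected model from the catalog by name
prompter = Prompt().load_model("bling-answer-tool")

#   3. Prompt - run an inference, grounding the model on a context passage
response = prompter.prompt_main("What is the total amount of the invoice?",
                                context="Total Amount $22,500.00",
                                prompt_name="default_with_context")
print(response["llm_response"])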


So let's start 🟩:

1️⃣ Install llmware as explained above, or simply run this command in the terminal:



 pip3 install llmware


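Note: the default model in this example is a GGUF model requiring no additional imports, but the comments in the script below mention that the PyTorch models may need separate installs of torch and transformers. If you plan to try those, this should cover it:

 pip3 install torch transformers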

2️⃣ In case you don't have any test questions to test this project, you can use the ones below:



def hello_world_questions():

    """ This is a set of useful test questions to do a 'hello world' but there is nothing special about the
    questions - please feel free to edit and ask your own queries with your own context passages.

    --if you are using one of the llmware models, please take note that the models have been trained to answer
    based on the information provided, so if you ask a question without passing any context passage, then
    don't be surprised if the model responds with 'Not Found.' """

    test_list = [

    {"query": "What is the total amount of the invoice?",
     "answer": "$22,500.00",
     "context": "Services Vendor Inc. \n100 Elm Street Pleasantville, NY \nTO Alpha Inc. 5900 1st Street "
                "Los Angeles, CA \nDescription Front End Engineering Service $5000.00 \n Back End Engineering"
                " Service $7500.00 \n Quality Assurance Manager $10,000.00 \n Total Amount $22,500.00 \n"
                "Make all checks payable to Services Vendor Inc. Payment is due within 30 days."
                "If you have any questions concerning this invoice, contact Bia Hermes. "
                "THANK YOU FOR YOUR BUSINESS!  INVOICE INVOICE # 0001 DATE 01/01/2022 FOR Alpha Project P.O. # 1000"},

    {"query": "What was the amount of the trade surplus?",
     "answer": "62.4 billion yen ($416.6 million)",
     "context": "Japan’s September trade balance swings into surplus, surprising expectations"
                "Japan recorded a trade surplus of 62.4 billion yen ($416.6 million) for September, "
                "beating expectations from economists polled by Reuters for a trade deficit of 42.5 "
                "billion yen. Data from Japan’s customs agency revealed that exports in September "
                "increased 4.3% year on year, while imports slid 16.3% compared to the same period "
                "last year. According to FactSet, exports to Asia fell for the ninth straight month, "
                "which reflected ongoing China weakness. Exports were supported by shipments to "
                "Western markets, FactSet added. β€” Lim Hui Jie"}
    ]

    return test_list


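Each entry in the list is just a dictionary with query, answer, and context keys, so extending it with your own test case is easy. For example (a completely made-up passage, just to show the shape):


# hypothetical extra test case - any context passage of your own works
extra_question = {
    "query": "Who is the project lead?",
    "answer": "Jane Doe",
    "context": "Project Phoenix kickoff notes: Jane Doe will serve as project "
               "lead, with a target launch date in Q3."
}

test_list = hello_world_questions()
test_list.append(extra_question)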

3️⃣ Make a Python file, let's say fast_start_rag.py, and paste the hello_world_questions() function from step 2 along with the code below:



import time
from llmware.prompts import Prompt
from llmware.models import ModelCatalog

def fast_start_prompting(model_name):

    """ This is the main example script - it loads the question list, loads the model, and executes the prompts. """

    t0 = time.time()

    # load in the 'hello world' test questions above
    test_list = hello_world_questions()

    print(f"\n > Loading Model: {model_name}...")

    prompter = Prompt().load_model(model_name)

    t1 = time.time()
    print(f"\n > Model {model_name} load time: {t1-t0} seconds")

    for i, entries in enumerate(test_list):
        print(f"\n{i+1}. Query: {entries['query']}")

        #   run the prompt
        output = prompter.prompt_main(entries["query"],
                                      context=entries["context"],
                                      prompt_name="default_with_context",
                                      temperature=0.30)

        #   'output' is a dictionary with two keys - 'llm_response' and 'usage'
        #   --'llm_response' is the output from the model
        #   --'usage' is a dictionary with the usage stats

        llm_response = output["llm_response"].strip("\n")
        print(f"LLM Response: {llm_response}")

        #   note: the 'gold answer' is the answer we provided above in the hello_world question list
        print(f"Gold Answer: {entries['answer']}")

        print(f"LLM Usage: {output['usage']}")

    t2 = time.time()
    print(f"\nTotal processing time: {t2-t1} seconds")

    return 0


if __name__ == "__main__":

    #   Step 1 - we will pick a model from the ModelCatalog

    #   A few useful methods to discover and display a list of available models...

    #   all generative models
    llm_models = ModelCatalog().list_generative_models()

    #   if you only want to see the local models
    llm_local_models = ModelCatalog().list_generative_local_models()

    #   to see only the open source models
    llm_open_source_models = ModelCatalog().list_open_source_models()

    #   we will print out the local models
    for i, models in enumerate(llm_local_models):
        print("models: ", i, models["model_name"], models["model_family"])

    #   for purposes of demo, try a few selected models from the list

    #   each of these pytorch models is ~1b parameters and will run reasonably fast and accurately on CPU
    #   --per note above, may require separate pip3 install of: torch and transformers
    pytorch_generative_models = ["llmware/bling-1b-0.1", "llmware/bling-tiny-llama-v0", "llmware/bling-falcon-1b-0.1"]

    #   bling-answer-tool is 1b parameters quantized
    #   bling-phi-3-gguf is 3.8b parameters quantized
    #   dragon-yi-6b-gguf is 6b parameters quantized
    gguf_generative_models = ["bling-answer-tool", "bling-phi-3-gguf","llmware/dragon-yi-6b-gguf"]

    #   by default, we will select a gguf model requiring no additional imports
    model_name = gguf_generative_models[0]

    #   to swap in a GPT-4 openai model - uncomment these two lines (and add 'import os' at the top)
    #   model_name = "gpt-4"
    #   os.environ["USER_MANAGED_OPENAI_API_KEY"] = "<insert-your-openai-key>"

    fast_start_prompting(model_name)



4️⃣ Move to the terminal again and run this command to start the application:



python fast_start_rag.py



Output 📃


Output of the example test case. The Gold Answer is included just for checking the model's responses against the expected answers.


Although the code is self-explanatory (check the comments), you might be wondering what just happened! You may have many questions. But wait! I have an explanation, especially for visual learners. Kindly go through this video once: Prompt Models (Ex. 3): Fast Start to RAG (2024). And if you want to learn more, go through the playlist:

Fast Start to RAG (2024 updates) - YouTube

Learn how to master the basics of RAG in this easy-to-follow, step-by-step series of tutorials



Moving to the End...

Retrieval-Augmented Generation (RAG) is the process of optimizing the output of a large language model, so it references an authoritative knowledge base outside of its training data sources before generating a response.

understood

If you still have any questions, drop them in the comment section. Alternatively, you can join the LLMWare Official Discord Channel by following this link: https://discord.com/invite/fCztJQeV7J

Thank you! You're the most beautiful person! Keep learning, keep hustling. Have a good day!! 💐

Star LLMWare.ai ⭐

Top comments (26)

Rohan Sharma

Share your thoughts and doubts here.

Also, don't forget to Star the awesome LLMWare repo

Best Codes

Nice article! I've written my own code for this, running models locally. I used the nomic-embed-text-v1.5 model, found here:
huggingface.co/nomic-ai/nomic-embe...

I wrote a Python script where my folder was indexed (converted to a text-embedding vector database by the model), then GPT4o (or a locally running model) could use tool calling to input something specific and get relevant parts of the output. For large folders, it was a bit slow sometimes, but it worked great!

Basically, the point was to let an AI chat model be able to summarize gigantic files or entire folders on my computer for me.

I think I'm going to open source my project soon, since I used all open-source models (GPT4o is optional) to create it.

Rohan Sharma

That's so great... The downloads are 553,239. You're really amazing. I suggest you join the llmware Discord; you'll get a lot of great stuff there! The power of llmware is based more on SSMs (small specialized models); you can read the documentation or Intro to llmware for more details!

Also, making your project open source is great thinking if you want to maximize its reach!

Best Codes

Oh, just to be clear — I did not create that model, I only used it! 😅

I've tried a few things like LLMWare, but I usually prefer just to make my own thing, so I know how everything works. Of course, I use libraries for lots of my AI things, but mostly just the Hugging Face transformers library and a couple others.

Rohan Sharma

Oh, just to be clear — I did not create that model, I only used it! 😅
Sorry, I misunderstood! Never mind, you have the capability to build one.

LLMWare is on Hugging Face too, though 😉

Best Codes

I'll check it out if I come across it :D

Rohan Sharma

Great!

Luanna Meyer

Yes, I've tried AI-powered medical virtual assistants, and the impact has been incredible! Virtual assistants like Kodexia can automate everything from appointment scheduling to patient inquiries, saving time and reducing administrative burden. Kodexia adapts quickly to different medical environments, offering personalized responses while maintaining data security. It's been a game changer for clinics aiming to boost efficiency and provide real-time support without sacrificing patient care quality. Definitely worth trying if you're looking to enhance operational efficiency!

Rohan Sharma

That looks cool! Will try it in the future 😉

Luanna Meyer

Thanks, Rohan! You should definitely try out Kodexia's AI-powered chatbot in the future. It's a game-changer for streamlining customer support, automating responses, and improving overall efficiency. Let me know if you need more details on how it could help your business!

Rohan Sharma

Hey Luanna, thank you! I was looking for the Kodexia docs but was unable to find them. Could you please share the link, if possible? Thank you once again for bringing this up!

Luanna Meyer

kodexia.ai/
Here is the link to Kodexia, the AI-powered chatbot.
There are two buttons; the first one is "Get Your Free Chatbot".
Click that button, fill out the form, and get your free access from the company.

Rohan Sharma

Oh okay!! Cool

ArnavK-09

dam dam bro great

Rohan Sharma

Haha.. Thanks

Priya Yadav

✨✨💯💯

Rohan Sharma

😉

Niharika Goulikar

Nice article!

Rohan Sharma

Thanks for reading!

s-vamshi

Great article and consistency too 😉

Rohan Sharma

Thanks for the read, Vamshi!

Harshika

Nice information

Rohan Sharma

I'm glad that you liked it!

SuRaj KuMar

Very informative.....!!!!

Rohan Sharma

You're welcome... Thanks for reading!

Winzod AI

Hey folks, came across this post and thought it might be helpful for you! Check out this comprehensive guide to RAG in AI - Rag In AI.