I recently came across RAG (Retrieval Augmented Generation). At first it looked like fancy tech jargon, but after learning more about it I found it to be a very interesting and clever technique for making an LLM perform well on your custom data without hallucinating, and without going through the process of fine-tuning the model.
Hallucination is a behavior exhibited by Large Language Models where the model generates a response that is incorrect, making up the information entirely. You might have observed this with ChatGPT: when asked for very specific details about some library, it may end up fabricating documentation and GitHub links for ways of doing things that the library doesn't support. For instance, when I searched for how to extend the Express request object type in TypeScript, it told me about a made-up extend method that the library doesn't even have.
There are a couple of techniques to mitigate hallucinations when dealing with LLMs; for the sake of this post, we will look into two of them.
- Fine Tuning
- Retrieval Augmented Generation (RAG)
When fine-tuning, you take a pre-trained model and re-train additional layers (or just the last layers) on the target data, keeping the model's pre-trained weights so its general understanding can be adapted to the custom data set. However, this is computationally more expensive, time-consuming, and requires more expertise in the field of ML. The typical steps are:
- Select a pre-trained model
- Prepare your custom data
- Tune hyperparameters
- Fine-tune model using transfer learning
- Evaluate the model
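To make the "re-train the last layers, keep the pre-trained weights" idea concrete, here is a minimal sketch of transfer learning in plain Python, assuming a toy setup: a "pretrained" feature extractor whose weights stay frozen, plus a new linear head trained on a tiny made-up data set with gradient descent. In practice you would do this with a real framework and a real pre-trained model; everything here is illustrative.

```python
# Frozen "pretrained" layer: a fixed projection from 2 inputs to 3 features.
FROZEN_W = [[0.5, -0.2], [0.1, 0.9], [-0.4, 0.3]]

def extract_features(x):
    # The pretrained layer is never updated during fine-tuning.
    return [sum(w * xi for w, xi in zip(row, x)) for row in FROZEN_W]

# Trainable head: one weight per feature, starting at zero.
head = [0.0, 0.0, 0.0]

# Tiny custom data set: (input, target) pairs.
data = [([1.0, 2.0], 1.0), ([2.0, 1.0], 0.0)]

def predict(x):
    return sum(w * f for w, f in zip(head, extract_features(x)))

def mse():
    return sum((predict(x) - y) ** 2 for x, y in data) / len(data)

loss_before = mse()
lr = 0.05
for _ in range(200):
    for x, y in data:
        feats = extract_features(x)
        err = predict(x) - y
        # Gradient updates touch only the head's weights,
        # never FROZEN_W -- that's the transfer-learning part.
        for i in range(len(head)):
            head[i] -= lr * err * feats[i]
loss_after = mse()
print(f"loss before: {loss_before:.4f}, loss after: {loss_after:.4f}")
```

The key point is that only `head` receives gradient updates; the pre-trained weights are reused as-is, which is why fine-tuning is cheaper than training from scratch but still requires a full training loop over your data.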
RAG takes a different approach: it augments the generative model's output by supplementing the process with a retrieval system. Since LLMs are already trained on a huge corpus of text, they have a very good understanding of the contextual and semantic meaning of words; what they lack is the additional information from a custom knowledge base needed to come up with an educated response.
This is where RAG comes in: it prevents hallucinations by providing the LLM with the essential information it needs to answer the query. Instead of passing the user query directly to the LLM, the query first goes through a retrieval system that fetches relevant documents, which are then augmented on top of the user's prompt and passed to the LLM. This makes it much more dynamic and flexible: you can provide the model with external information it wasn't aware of before, and get a much better response.
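The retrieve-then-augment flow above can be sketched in a few lines of Python. The documents, query, and prompt template here are all made up, and the relevance scoring is plain word overlap to keep the example self-contained; real retrieval systems use embedding similarity instead.

```python
import re

# A tiny made-up knowledge base.
docs = [
    "Invoices are processed within 3 business days.",
    "Refunds require the original order number.",
    "Our office is closed on public holidays.",
]

def tokens(text):
    # Lowercase word tokens, ignoring punctuation.
    return set(re.findall(r"[a-z]+", text.lower()))

def retrieve(query, documents, top_k=2):
    """Rank documents by word overlap with the query (a stand-in
    for semantic similarity)."""
    q = tokens(query)
    return sorted(documents, key=lambda d: len(q & tokens(d)), reverse=True)[:top_k]

def build_prompt(query, documents):
    """Augment the user's prompt with the retrieved context."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

# The augmented prompt is what actually gets sent to the LLM.
print(build_prompt("What do I need to request a refund for my order?", docs))
```

Note that the LLM itself never changes; only the prompt does, which is exactly why new documents can be added to `docs` at any time without any retraining.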
This approach also has the benefit that you can update your repository of documents without having to retrain the entire model, as you would with fine-tuning. You can also employ modern semantic search techniques and a Vector DB in the retrieval system to get the best results from your knowledge base, which are then passed to the LLM. You can see a high-level overview of a RAG system in the diagram below.
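To illustrate the semantic-search step, here is a toy in-memory "vector store" using cosine similarity. A real setup would use an embedding model and a dedicated Vector DB; the documents and their 3-dimensional vectors below are entirely made up for the sketch.

```python
import math

# Each entry pairs a document with a (made-up) embedding vector.
store = [
    ("Refund policy: refunds are issued within 30 days.", [0.9, 0.1, 0.0]),
    ("Shipping times: orders arrive in 5-7 business days.", [0.1, 0.8, 0.2]),
    ("Careers: we are hiring engineers.", [0.0, 0.2, 0.9]),
]

def cosine(a, b):
    # Cosine similarity: dot product over the product of magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def search(query_vec, k=1):
    """Return the k documents whose vectors are closest to the query."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

# Pretend this vector came from embedding "how do I return an item?".
print(search([0.85, 0.2, 0.05]))
```

Unlike keyword matching, this finds documents that are close in meaning even when they share no exact words with the query, which is why semantic search improves the retrieval half of a RAG system.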
If you have any questions or feedback, feel free to reach out!