Bhumika Biyani

Understanding Hallucinations in Large Language Models (LLMs)

Ever felt like your chat with an AI assistant took a strange turn? It states false information as if it were accurate, leaving you wondering whether you are in a simulation.

**Picture this:** You're lost in a digital labyrinth (Bhool Bhulaeeya), where AI models hold tons of data, and their answers spin around that data and even beyond it, weaving illusions and blurring the boundaries of reality.

Welcome to the mysterious world of LLM hallucinations.
LLMs sometimes produce content that deviates from facts or contextual logic, which can lead to serious consequences.

For example, I gave the prompt: “Just draw a pie chart for me to demonstrate how accurate a large language model is in terms of accuracy.”
The following was the output:

[Image: the pie chart the model generated in response]

These deviations are known as hallucinations: unexpected, nonsensical, or even fabricated output. The model invents things that aren't true.

Causes of Hallucinations

  1. Data Quality: LLMs are trained on vast and diverse datasets, often compiled from sources like the internet, books, and articles. This diversity introduces a wide range of writing styles, opinions, and information. However, it also brings in noise, errors, biases, and inconsistencies inherent in real-world data. LLMs may encounter situations where the training data is incomplete or inaccurate. If the model generalizes from such data, it can produce hallucinations by making assumptions or extrapolating information that may not be accurate.

For example, a very popular discussion on whether eggs should be considered a vegetarian product exists on a well-known community platform. Every person has a different view, along with facts supporting it. If our model is trained on that internet data, then whenever we ask it the same question it will produce different answers, which are totally unreliable.
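
To see why conflicting training data leads to unreliable answers, here is a tiny toy simulation in Python. The `forum_answers` list and the sampling step are purely illustrative assumptions, not a real training pipeline; the point is only that when the data itself disagrees, a model that learns its distribution will disagree with itself too.

```python
# Toy illustration (not a real training pipeline): when the training data
# itself disagrees, a model that learns its distribution answers inconsistently.
import random
from collections import Counter

# Hypothetical forum answers to "should egg be considered a veg product?"
forum_answers = ["veg", "non-veg", "veg", "non-veg", "depends", "non-veg", "veg"]

counts = Counter(forum_answers)
print(counts)  # Counter({'veg': 3, 'non-veg': 3, 'depends': 1}) -- no single ground truth

# Sampling from this learned distribution gives different answers on different runs,
# which is roughly what a model trained on conflicting data ends up doing.
labels = list(counts.keys())
weights = list(counts.values())
print(random.choices(labels, weights=weights, k=3))
```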

  2. Generation Method:

LLMs use different methods for generating text, such as beam search, sampling, maximum likelihood estimation, or reinforcement learning. These methods have their own biases and trade-offs.
The generation process involves balancing factors such as fluency and diversity, coherence and creativity, and accuracy and novelty.

For example, using beam search may favor more common words for fluency, but it might sacrifice specificity.

If you don’t know what beam search is, here is the context:
Beam search is commonly used for autocompleting suggestions. It works by exploring multiple possible continuations of the input text and selecting the one with the highest probability.
For efficiency, it narrows down the search to a “beam” or a set number of top candidates.

Put simply, if you start typing “How to make a”, the system might suggest “How to make a cake,” since “cake” is a common word in that context. If you were looking for a specific type of cake, like “red velvet cake,” beam search might prioritize the more common “cake” over the specific request.

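To make the autocomplete example concrete, below is a minimal beam search sketch in Python. The next-token probability table is invented for illustration (it is not a real language model), but the pruning logic is the standard idea: keep only the top-scoring partial sequences at each step.

```python
# Minimal beam search sketch over a made-up next-token probability table.
import math

# Hypothetical next-token probabilities (illustrative, not from a real model).
NEXT_TOKEN_PROBS = {
    "a": {"cake": 0.6, "red": 0.2, "pizza": 0.2},
    "red": {"velvet": 0.9, "sauce": 0.1},
    "velvet": {"cake": 1.0},
    "cake": {"<end>": 1.0},
    "pizza": {"<end>": 1.0},
    "sauce": {"<end>": 1.0},
}

def beam_search(start_token, beam_width=2, max_steps=4):
    # Each beam is a (token sequence, cumulative log-probability) pair.
    beams = [([start_token], 0.0)]
    for _ in range(max_steps):
        candidates = []
        for tokens, logp in beams:
            last = tokens[-1]
            if last == "<end>" or last not in NEXT_TOKEN_PROBS:
                candidates.append((tokens, logp))  # finished beam carries over
                continue
            for nxt, p in NEXT_TOKEN_PROBS[last].items():
                candidates.append((tokens + [nxt], logp + math.log(p)))
        # Prune: keep only the `beam_width` highest-scoring partial sequences.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams

for tokens, logp in beam_search("a"):
    text = " ".join(t for t in tokens if t != "<end>")
    print(f"{text!r}  prob={math.exp(logp):.2f}")
# The common "a cake" outranks the more specific "a red velvet cake",
# and with beam_width=1 the specific phrase never survives pruning at all.
```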

  3. Input Context:

The context given to the model through input prompts guides its understanding and influences the generated output.
If the input context is ambiguous, inconsistent, or contradictory, the model may struggle to understand the user’s intent. This can result in hallucinations as the model attempts to reconcile conflicting information or makes assumptions based on unclear context.

For example, you ask your model to “Book a flight from New York to Los Angeles for two passengers, first class, departing on June 15th and returning on June 10th.”
The input context in this example contains contradictory information. The return date is set before the departure date, creating confusion for the model.
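
One cheap guardrail is to catch contradictions like this before the prompt ever reaches the model. The helper below is a hypothetical sketch (the function name and the year are mine, not from any real flight-booking API); it just shows the idea of validating the request first.

```python
# Hypothetical pre-flight check: reject contradictory trip details before
# they reach the LLM, instead of letting the model guess what you meant.
from datetime import date

def validate_trip(departure: date, return_date: date) -> None:
    if return_date < departure:
        raise ValueError(
            f"Return date {return_date} is before departure {departure}; "
            "fix the prompt instead of asking the model to resolve it."
        )

# The example above: departing June 15th but "returning" June 10th.
validate_trip(date(2024, 6, 15), date(2024, 6, 10))  # raises ValueError
```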

What can we do to prevent it?

  1. Provide Clear and Specific Prompts: Precision and detail in input prompts increase the likelihood of accurate and relevant outputs. Giving the LLM multiple examples of the desired output format or context helps it recognize patterns more effectively, reducing hallucinations.
  2. Use Active Mitigation Strategies: Adjust the settings that control LLM behavior, such as the temperature parameter, which controls randomness. Use a lower temperature for more conservative responses and a higher one for more creative responses, like adjusting the seasoning in a recipe: too much or too little affects the taste. (A minimal sketch follows this list.)
  3. Fine-Tuning: Specialized training on specific domains improves accuracy. It’s like tuning an instrument for better performance.
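
To see what the temperature knob actually does, here is a small Python sketch of temperature-scaled softmax, which is the standard way temperature is applied during sampling. The logits are made-up numbers standing in for a model's scores over three candidate tokens.

```python
# Sketch of temperature scaling: divide logits by the temperature, then softmax.
import math

def softmax_with_temperature(logits, temperature):
    scaled = [x / temperature for x in logits]
    exps = [math.exp(s) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens

# Low temperature -> sharper distribution -> more conservative, repeatable picks.
print([round(p, 3) for p in softmax_with_temperature(logits, 0.5)])
# High temperature -> flatter distribution -> more randomness / "creativity".
print([round(p, 3) for p in softmax_with_temperature(logits, 1.5)])
```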

Here is what one of our favorite LLMs, Gemini, has to say on the topic:

[Image: Gemini's response on LLM hallucinations]

Hallucination: A Wrong Term to Use for LLMs

The term “hallucination” is commonly used when people talk about how language models sometimes make things up or share incorrect information. However, many researchers say the term can be misleading: it makes it sound like these models imagine things the way humans do, but that’s not exactly the case. The real issue comes from factors like the data they were trained on, biases, and how they predict things based on probability.

So, instead of thinking of it as the model hallucinating, it’s more about understanding these factors that can lead to inaccurate outputs. It’s like the model making an educated guess that might not always be right, rather than having vivid, imaginative experiences.

So, we’ve delved into the curious case of LLM hallucinations. Fascinating, yes, but also a cause for raised eyebrows.

As much as I can explore this topic, the real magic lies in the collective view of different perspectives. So, I throw the mic over to you! What are your thoughts on LLM hallucinations? Do they worry you, or do you find them strangely exciting?

Share your questions, your concerns, your wildest theories! Let’s have a conversation that goes beyond ones and zeros.

After all, understanding these models means understanding ourselves a little better, don’t you think?
