Lina Lam

Posted on Jul 31

What is LLM Observability and Monitoring?

#llm #llmobservability #rag #beginners

Building with LLMs in production (and doing it well) is incredibly difficult. You have probably heard the term LLM Observability. But what is it? How does it differ from traditional observability? What is being observed? We have the answers.

The TL;DR

LLM Observability is complete visibility into every layer of an LLM-based software system: the application, the prompt, and the response. LLM Observability comes hand-in-hand with LLM Monitoring. While monitoring tracks application performance metrics, observability is more investigative: it helps you understand why the system behaved the way it did.

	LLM Observability	LLM Monitoring
Purpose	Event logging	Collect metrics
Key Aspects	Trace the flow of requests to understand system dependencies and interactions	Track application performance metrics, such as usage, cost, latency, error rates
Example	Correlate different types of data to understand issues and complex behaviours	Set up thresholds for unexpected behaviors
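To make the split in the table concrete, here is a minimal sketch in Python. It assumes each LLM call is logged as a small record; the `LLMCallRecord` fields, the sample values, and the threshold numbers are all made up for illustration and do not come from any particular tool. The monitoring part aggregates metrics and checks thresholds; the observability part starts when you drill into the individual requests behind an alert.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class LLMCallRecord:
    """One logged LLM request. Field names are illustrative assumptions."""
    request_id: str
    latency_ms: float
    cost_usd: float
    error: bool


def summarize(records):
    """Monitoring: aggregate metrics over a window of logged calls."""
    return {
        "requests": len(records),
        "avg_latency_ms": mean(r.latency_ms for r in records),
        "total_cost_usd": sum(r.cost_usd for r in records),
        "error_rate": sum(r.error for r in records) / len(records),
    }


def check_thresholds(summary, max_error_rate=0.05, max_avg_latency_ms=2000.0):
    """Return alert messages for metrics that exceed their (hypothetical) limits."""
    alerts = []
    if summary["error_rate"] > max_error_rate:
        alerts.append("error rate above threshold")
    if summary["avg_latency_ms"] > max_avg_latency_ms:
        alerts.append("average latency above threshold")
    return alerts


# Made-up sample data for the sketch.
records = [
    LLMCallRecord("req-1", 800.0, 0.002, False),
    LLMCallRecord("req-2", 1200.0, 0.003, False),
    LLMCallRecord("req-3", 4600.0, 0.004, True),
]

summary = summarize(records)
alerts = check_thresholds(summary)

# Observability: go from the aggregate alert back to the individual requests.
failing_ids = [r.request_id for r in records if r.error]
```

With this sample data both thresholds trip, and `failing_ids` points at the single request worth investigating.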

What's the difference between LLM and Traditional Observability?

Traditional development is typically transactional. Developers observe how the application handles an HTTP request/response, a database query, or a published message. In contrast, LLMs are much more complex.

Here's a comparison of the logs:

Traditional	LLMs
Simple, isolated interactions	Indefinitely nested interactions, creating a complex tree structure
Clear start and end points	Encompass multiple interactions
Small body size (low KBs of data)	Massive payloads (potentially GBs)
Predictable behavior (easy to evaluate)	Lack of predictability (difficult to evaluate)
Primarily text-based logs and numerical metrics	Multi-modal data (text, image, audio, video)
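The "nested interactions" row is easier to picture as a data structure. Below is a rough sketch of a trace as a tree of spans. The `Span` class, the span names, and the durations are assumptions made up for this example, not the schema of any specific observability product.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    """One step in a trace, such as an LLM call, a retrieval, or a tool call."""
    name: str
    duration_ms: float
    children: list["Span"] = field(default_factory=list)


def count_spans(span):
    """Total number of spans in the tree rooted at `span`."""
    return 1 + sum(count_spans(child) for child in span.children)


def depth(span):
    """Number of levels in the tree rooted at `span`."""
    return 1 + max((depth(child) for child in span.children), default=0)


def render(span, indent=0):
    """Return the tree as indented lines, one line per span."""
    lines = [f"{'  ' * indent}{span.name} ({span.duration_ms:.0f} ms)"]
    for child in span.children:
        lines.extend(render(child, indent + 1))
    return lines


# A hypothetical trace for a single user question.
trace = Span("handle_user_question", 3100, [
    Span("retrieve_documents", 400),
    Span("agent_loop", 2600, [
        Span("llm_call: draft answer", 1500),
        Span("llm_call: self-check", 1100),
    ]),
])

print("\n".join(render(trace)))
```

A traditional request log would be a single flat line here. The LLM trace already has five spans across three levels for one question, and an agent can keep nesting further.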

Issues with LLMs

Hallucination: An LLM's objective is to predict the next token, not to be accurate. This means that responses are not guaranteed to be grounded in facts.

Complex use cases: LLM-based software systems require an increasing number of LLM calls to execute a complex task (e.g. an agentic workflow). Reflexion is a technique engineers use to get LLMs to analyze their own results. However, it requires multiple calls nested inside multiple spans just to check for hallucinations.
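Here is a simplified generate, critique, retry loop in the spirit of Reflexion, to show how quickly calls multiply. It is not the published Reflexion algorithm. `fake_llm` is a hypothetical stand-in for a real model call, and its canned replies exist only so the sketch runs offline.

```python
def fake_llm(prompt, call_log):
    """Hypothetical stand-in for a model call; logs each prompt so calls can be counted."""
    call_log.append(prompt)
    if prompt.startswith("CRITIQUE"):
        # Pretend the critic only accepts answers that cite a source.
        return "OK" if "[source]" in prompt else "Missing citation"
    if "Missing citation" in prompt:
        return "Paris is the capital of France [source]"
    return "Paris is the capital of France"


def answer_with_self_check(question, max_rounds=3):
    """Generate an answer, ask the model to critique it, and retry until accepted."""
    call_log = []
    feedback = ""
    answer = ""
    for _ in range(max_rounds):
        answer = fake_llm(f"ANSWER: {question}\nFeedback: {feedback}", call_log)
        feedback = fake_llm(f"CRITIQUE: {answer}", call_log)
        if feedback == "OK":
            break
    return answer, len(call_log)


answer, total_calls = answer_with_self_check("What is the capital of France?")
```

In this toy run, one user question turns into four model calls (two rounds of answer plus critique). Each of those calls is a span you would want to see in a trace.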

Proprietary data: Managing proprietary data is tricky. You need it to answer specific customer questions, but it can accidentally find its way into the responses.

Quality of response: Is the response in the wrong tone? Is the amount of detail appropriate for what your users asked?

Cost (the elephant in the room): As usage goes up and your LLM setup becomes more complicated (e.g. by adding Reflexion), the cost can easily add up.
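A quick back-of-the-envelope calculation shows how this adds up. The prices, token counts, and traffic numbers below are hypothetical placeholders, not any provider's real pricing, and the sketch assumes every call in a chain uses the same number of tokens.

```python
def call_cost_usd(input_tokens, output_tokens, input_price_per_1m, output_price_per_1m):
    """Cost of one call, given per-million-token prices for input and output."""
    return (input_tokens * input_price_per_1m
            + output_tokens * output_price_per_1m) / 1_000_000


# Hypothetical prices in USD per 1M tokens (assumptions for illustration only).
INPUT_PRICE = 3.00
OUTPUT_PRICE = 15.00

# Assumed workload: 1,000 input tokens and 500 output tokens per call.
single_call = call_cost_usd(1_000, 500, INPUT_PRICE, OUTPUT_PRICE)

requests_per_day = 10_000
calls_per_request = 4  # e.g. two rounds of answer + self-check

daily_simple = single_call * requests_per_day
daily_with_checks = daily_simple * calls_per_request

print(f"one call:            ${single_call:.4f}")
print(f"daily, single call:  ${daily_simple:.2f}")
print(f"daily, with checks:  ${daily_with_checks:.2f}")
```

Under these assumptions, going from one call per request to four takes the daily bill from roughly $105 to roughly $420 without any growth in traffic.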

Third-party models: Their APIs can change, and new models or guardrails can be added, causing your LLM app to behave differently than before.

Limited competitive advantage: LLMs are hard to train and maintain. Chances are that you are using the same model as your competitor. Your differentiator becomes your prompt engineering and proprietary data.

What LLM Observability Tools Have In Common

Developers working on LLM applications need effective tools to understand and address bugs and exceptions, and to prevent regressions. They require unique visibility into the functioning of these applications, including:

Real-time monitoring of AI models
Detailed error tracking and reporting
Insights into user interactions and feedback
Performance metrics and trend analysis
Multi-metric correlations
Tools for prompt iterations and experimentation

The author

Aparna Dhinakaran is the Co-Founder and Chief Product Officer at Arize AI, a leader in machine learning observability. She is recognized in Forbes 30 Under 30 and led ML engineering at Uber, Apple, and TubeMogul (Adobe).

What we've learned

At Helicone AI, we've seen the complexities of productizing LLMs first-hand. Effective observability is key to navigating these challenges, and we strive to help our customers produce reliable and high-quality LLM applications, making the observability process easier and faster.

What are your thoughts?
