DEV Community

# observability

Gaining deep insights into system behavior through metrics, logs, and traces.

Posts

Monitoring Costs Are Out of Control — Here's How to Fix It
2 min read
A Review of 100 Public AI Incident Postmortems. Here Are the 6 Mistakes That Keep Showing Up.
9 min read
Debugging an LLM Bug at 3 AM: The Runbook I Wish I'd Had
9 min read
The 7 Most Expensive LLM Production Incidents of 2025–2026 (Each One Had a Fixable Signal Nobody Watched)
11 min read
Structure-Driven Engineering Organization Theory #8 — Conditions for a Structure-Driven Organization
9 min read
Claude Went Down Twice in 48 Hours Last Week. If You Noticed, Your Fallback Failed.
8 min read
The Production Readiness Checklist for LLM Apps Nobody Gave You (18 Items)
5 min read
The 5 RAG Failure Modes Nobody Talks About (and How to Detect Them Before Users Do)
8 min read
8 Ways Your LLM App Is Silently Failing Right Now (and What to Instrument for Each)
6 min read
OpenTelemetry GenAI Semantic Conventions: Your LLM Traces Should Look Like This in 2026
5 min read
LLM-as-Judge: The Eval Technique That Looks Cheap Until It Grades Its Own Bias Back to You
7 min read
Datadog Sees the HTTP 200. It Cannot See the Hallucination.
4 min read
Your LLM Gateway Is a Blind Spot. Here's How to Instrument It After the LiteLLM Incident.
5 min read
Langfuse vs LangSmith vs Phoenix vs Braintrust: The Honest 2026 Comparison
5 min read
The Failure Mode Your Observability Stack Cannot See
6 min read