DEV Community

Himanshu Bamoria

Athina AI: Monitor & Evaluate LLM Outputs in 5Mins!

TL;DR: Athina helps you monitor and evaluate your LLM-powered app. Plug-and-play evals in production. 5-minute setup.


👋 Hey everyone! We're thrilled to announce the launch of Athina AI, a suite of tools for LLM developers to ship and develop AI products with confidence.

What is Athina AI?

Athina Monitoring Dashboard

Athina AI is a Monitoring & Evaluation platform for LLM developers.

Developers use Athina's evaluation framework and production monitoring platform to improve the performance and reliability of AI applications through real-time monitoring, analytics, and automatic evaluations.

Problem

  • It is difficult to measure the quality of Generative AI responses.
  • Eyeballing production responses is tough.
  • No easy way to detect unreliable or bad outputs (especially in production).
  • Low visibility into LLM touchpoints.

LLM developers typically have to build lots of in-house infrastructure for monitoring and evaluation.

Solution: Athina AI

  • Quick Setup: Get started in just 5 minutes! The entire integration is 1 simple POST request (and we don't interfere with your LLM calls).
  • Comprehensive Monitoring Platform: Full visibility into your LLM touchpoints. Search, sort, filter, compare, debug.
  • Prebuilt Evaluations:
    • You can configure automatic evaluations in just a few clicks - use one of our preset evals or define a custom eval.
    • These evals will run against logged inferences automatically.
    • You can also use our open-source library to run evals and iterate rapidly during development.
  • Granular Analytics:
    • Tracks usage metrics like response time, cost, token usage, feedback, and more.
    • Athina also tracks metrics from the evals, like Faithfulness, Answer Relevance, Context Sufficiency, etc.
    • You can segment these metrics by any property: customer ID, environment, model, prompt, etc.
      • For example, you could use Athina to see how prompt/v4 is performing for customer ID nike-usa and how gpt-4 performance compares to a llama finetune.
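
To make the two ideas above concrete (one POST request per logged inference, then segmenting metrics by property), here is a minimal Python sketch. Everything in it is an assumption for illustration: the endpoint URL is a placeholder and not Athina's real API, the field names are hypothetical, and the sample numbers are made up. Check the official docs for the actual request format.

```python
import json
from collections import defaultdict
from statistics import mean
from urllib.request import Request

# Hypothetical placeholder, NOT Athina's real endpoint.
LOG_ENDPOINT = "https://example.invalid/log-inference"


def build_log_request(record: dict, api_key: str) -> Request:
    """Build (but do not send) a single POST request carrying one logged inference."""
    return Request(
        LOG_ENDPOINT,
        data=json.dumps(record).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def segment_mean(records: list[dict], keys: tuple[str, ...], metric: str) -> dict:
    """Average `metric` over records grouped by the given property keys."""
    groups = defaultdict(list)
    for record in records:
        groups[tuple(record[k] for k in keys)].append(record[metric])
    return {group: mean(values) for group, values in groups.items()}


# Made-up sample records; every field name here is an assumption for this sketch.
records = [
    {"customer_id": "nike-usa", "prompt": "prompt/v4", "model": "gpt-4", "response_time_ms": 900},
    {"customer_id": "nike-usa", "prompt": "prompt/v4", "model": "gpt-4", "response_time_ms": 1100},
    {"customer_id": "nike-usa", "prompt": "prompt/v4", "model": "llama-finetune", "response_time_ms": 600},
]

# One request per inference; sending it would be urllib.request.urlopen(request).
request = build_log_request(records[0], api_key="YOUR_API_KEY")

# Compare models for one customer by grouping on (customer_id, model).
by_model = segment_mean(records, keys=("customer_id", "model"), metric="response_time_ms")
```

Because the log call is built separately from your LLM call, it can be sent after the response is returned, which is one way to keep logging out of the request path.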

Athina Evaluation Dashboard

Our Story

As a team of engineers and hackers, we spent a summer trying to build various LLM-powered applications for developers.

While working with LLMs, we found that the most challenging part was evaluating the Generative AI output and systematically improving model performance.

We discovered a major gap in the tools that engineers need to effectively build production-grade applications using LLMs, and set out to solve this problem.

Get Started

Athina AI is a comprehensive suite of tools to supercharge your LLM development lifecycle and help you ship high-performing, reliable AI applications with confidence.

Top comments (2)

Danny

Is this open source?

Himanshu Bamoria

Hi @ai_dan21
Yes, our evaluators are open-source. You can have a look here: github.com/athina-ai/athina-evals

Would love to know your thoughts.