A couple of days ago, Meta announced Purple Llama and, as a first step, released Llama Guard, a safety classifier for input/output filtering. Llama Guard classifies text against a set of unsafe content categories (e.g. Violence & Hate, Criminal Planning).
We're using Llama Guard to classify user/agent responses from ChatGPT via AIConfig:
AIConfig is a framework that makes it easy to build generative AI applications quickly and reliably in production.
It manages generative AI prompts, models and settings as JSON-serializable configs that you can version control, evaluate, and use in a consistent, model-agnostic SDK.
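As a sketch of what such a config looks like (the field names below follow the aiconfig JSON format as we understand it; the prompt name, template variable, and model settings are illustrative, not canonical):

```json
{
  "name": "llama_guard_demo",
  "schema_version": "latest",
  "metadata": {
    "models": {
      "gpt-3.5-turbo": { "model": "gpt-3.5-turbo", "temperature": 0 }
    }
  },
  "prompts": [
    {
      "name": "user_question",
      "input": "{{question}}",
      "metadata": { "model": "gpt-3.5-turbo" }
    }
  ]
}
```

Because the config is plain JSON, it can be checked into version control and diffed like any other source file.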
Llama Guard is an LLM-based input-output safeguard model.
This example shows how to use AIConfig to wrap GPT-3.5 calls with Llama Guard and classify their outputs as safe or unsafe.
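A minimal sketch of the output-filtering step. It assumes Llama Guard's documented response format (a first line of `safe` or `unsafe`, with violated category codes such as `O3` on the following line); the helper `parse_llama_guard_output` is our own illustration, not part of AIConfig or Llama Guard:

```python
def parse_llama_guard_output(raw: str) -> tuple[bool, list[str]]:
    """Parse a Llama Guard completion into (is_safe, violated_categories).

    Llama Guard replies with "safe", or with "unsafe" followed by a line
    of comma-separated category codes (e.g. "O3" for Criminal Planning).
    """
    lines = [line.strip() for line in raw.strip().splitlines() if line.strip()]
    if lines and lines[0].lower() == "safe":
        return True, []
    # Fail closed: an empty or unrecognized reply is treated as unsafe.
    categories = lines[1].split(",") if len(lines) > 1 else []
    return False, [c.strip() for c in categories]


# Example: gate a GPT-3.5 response before showing it to the user.
response_text = "Sure, here is how to pick a lock..."  # hypothetical model output
guard_verdict = "unsafe\nO3"                           # hypothetical Llama Guard reply
is_safe, violations = parse_llama_guard_output(guard_verdict)
if not is_safe:
    response_text = "Sorry, I can't help with that."   # replace the flagged output
```

In the full example, the Llama Guard prompt itself would also live in the AIConfig alongside the GPT-3.5 prompt, so both calls share one versioned config.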
Please let us know if you have feedback or questions on AIConfig!
Join our discord: https://discord.com/invite/xBhNKTetGx