Sam Stoelinga

Introducing KubeAI: Open AI Inference Operator

We recently launched KubeAI. The goal of KubeAI is to get LLMs, embedding models, and speech-to-text models running on Kubernetes with ease.

KubeAI provides an OpenAI-compatible API endpoint, which makes it work out of the box with most software built against the OpenAI APIs.
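Because the endpoint speaks the OpenAI protocol, existing clients should work largely unmodified. Here is a minimal sketch using the official OpenAI Python client; the base URL and model name below are assumptions, so substitute whatever your deployment actually exposes:

```python
# Minimal sketch: point the official OpenAI Python client at a KubeAI
# deployment. The base_url and model name are illustrative assumptions.
from openai import OpenAI

client = OpenAI(
    base_url="http://kubeai/openai/v1",  # hypothetical in-cluster service URL
    api_key="not-used",  # the client requires a value, even if the server ignores it
)

completion = client.chat.completions.create(
    model="llama-3.1-8b-instruct",  # hypothetical model name registered in KubeAI
    messages=[{"role": "user", "content": "Hello from Kubernetes!"}],
)
print(completion.choices[0].message.content)
```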

Repo on GitHub: substratusai/kubeai

When it comes to LLMs, KubeAI directly operates vLLM and Ollama servers in isolated Pods, configured and optimized on a model-by-model basis. You get metrics-based autoscaling out of the box, including scale-from-zero. When you hear scale-from-zero in Kubernetes-land, you probably think of Knative and Istio, but not in KubeAI! We made an early design decision to avoid any external dependencies (Kubernetes is complicated enough as-is).
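To make the model-by-model configuration concrete, here is a sketch of what declaring a model could look like as a Kubernetes custom resource. The kind, field names, and values below are illustrative assumptions rather than a statement of the project's exact CRD schema; check the repo for the real manifest format:

```yaml
# Illustrative sketch only -- field names and values are assumptions;
# see the repo for the actual custom resource schema.
apiVersion: kubeai.org/v1
kind: Model
metadata:
  name: llama-3.1-8b-instruct
spec:
  features: [TextGeneration]
  url: hf://meta-llama/Meta-Llama-3.1-8B-Instruct  # where to pull weights from
  engine: VLLM            # or Ollama, chosen per model
  resourceProfile: nvidia-gpu-l4:1
  minReplicas: 0          # scale-from-zero: no Pods until traffic arrives
  maxReplicas: 3          # ceiling for metrics-based autoscaling
```

In a setup like this, setting minReplicas to 0 would be how scale-from-zero gets expressed: the operator brings up a Pod when the first request for the model arrives and scales back down when it goes idle.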

We are hoping to release more functionality soon. Next up: model caching, metrics, and a dashboard.

If you need help or have feedback, reach out directly, here in the comments, or via the channels listed in the repo. We are currently making it our priority to assist the project’s early adopters. So far, users have seen success in use cases ranging from processing large-scale batches in the cloud to running lightweight inference at the edge.

Top comments (1)

Nick Stogner

Co-author here, happy to answer any questions!