KubeAI
by Community
AI Inference Operator for Kubernetes. The easiest way to serve ML models in production. Supports VLMs, LLMs, embeddings, and speech-to-text.
OSS
Added 1 June 2026
Overview
KubeAI is an open-source Kubernetes operator that deploys and serves ML models including VLMs, LLMs, embeddings, and speech-to-text. It automates model serving on Kubernetes clusters using a custom resource definition and handles scaling, resource allocation, and inference requests.
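As a rough illustration of the custom-resource approach described above, the sketch below builds a manifest for a model as a plain Python dictionary and prints it as JSON, which `kubectl apply -f -` accepts alongside YAML. The API group `kubeai.org/v1`, the `Model` kind, and the spec fields (`features`, `url`, `engine`, `resourceProfile`, `minReplicas`, `maxReplicas`) are assumptions here and should be checked against the current KubeAI documentation. The model name, URL, and resource profile are hypothetical placeholders, and `build_model_manifest` is a helper invented for this example.

```python
import json


def build_model_manifest(name, url, engine, features, resource_profile,
                         min_replicas=0, max_replicas=3):
    """Return a dict shaped like a KubeAI Model custom resource.

    All field names are assumptions for illustration; verify them
    against the KubeAI documentation before applying to a cluster.
    """
    if min_replicas < 0 or max_replicas < min_replicas:
        raise ValueError("replica bounds must satisfy 0 <= min <= max")
    return {
        "apiVersion": "kubeai.org/v1",  # assumed API group and version
        "kind": "Model",                # assumed resource kind
        "metadata": {"name": name},
        "spec": {
            "features": list(features),           # e.g. text generation
            "url": url,                           # where the weights come from
            "engine": engine,                     # serving backend
            "resourceProfile": resource_profile,  # hardware request
            "minReplicas": min_replicas,          # lower scaling bound
            "maxReplicas": max_replicas,          # upper scaling bound
        },
    }


# Hypothetical values, chosen only to show the shape of the resource.
manifest = build_model_manifest(
    name="example-llm",
    url="hf://example-org/example-model",
    engine="VLLM",
    features=["TextGeneration"],
    resource_profile="nvidia-gpu-l4:1",
)

# JSON is valid input for kubectl, so no YAML library is needed.
print(json.dumps(manifest, indent=2))
```

Saving the printed output to a file and running `kubectl apply -f` on it would be the usual next step, assuming the operator and its custom resource definition are already installed in the cluster.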
Best for
Teams already running Kubernetes who want a straightforward way to serve multiple model types in production.
Use cases
- Deploy and serve large language models on existing Kubernetes infrastructure
- Run embedding models for vector search pipelines in production
- Serve speech-to-text models alongside other AI workloads in a unified cluster
Notes
1,201 stars on GitHub. Last updated 2026-06-01. Licensed Apache-2.0.
Pros
- Simplifies ML model deployment with native Kubernetes integration
- Supports a wide range of model types from a single operator
- Active open-source community with over 1,200 GitHub stars
Cons
- Requires existing Kubernetes expertise and cluster management
- Limited to models that fit the operator’s supported formats
- Community-driven project may have slower feature updates than commercial alternatives
Indexed from awesome-llmops and enriched against its public facts.
Pairs with
Other entries in the index that connect to this one.
Kubeflow
Community
Machine Learning Toolkit for Kubernetes
vLLM
Community
A high-throughput and memory-efficient inference and serving engine for LLMs
Argo Workflows
Community
Workflow Engine for Kubernetes