O Open Source Observability medium

OpenModelZ

by Community

Autoscale LLM (vLLM, SGLang, LMDeploy) inferences on Kubernetes (and others)

Added 1 June 2026

#cluster-manager #hacktoberfest #inference #llm #llmops #mlops

Overview

OpenModelZ is an open-source tool that autoscales LLM inference deployments (vLLM, SGLang, LMDeploy) on Kubernetes and other platforms. It monitors inference workloads and adjusts resources to maintain performance while minimizing over-provisioning.
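To make "workload-driven" concrete, here is a minimal sketch of a proportional scaling rule: divide the observed load by a per-replica target and clamp the result. This is not taken from OpenModelZ's code base; it mirrors the general idea behind the formula documented for the Kubernetes Horizontal Pod Autoscaler. The function name, the choice of in-flight requests as the metric, and the default bounds are all assumptions made for illustration.

```python
import math


def desired_replicas(total_in_flight: int, target_per_replica: int,
                     min_replicas: int = 1, max_replicas: int = 8) -> int:
    """Hypothetical helper: replicas needed to keep load per replica at the target.

    total_in_flight    -- requests currently being served across all replicas
    target_per_replica -- how many concurrent requests one replica should handle
    """
    if target_per_replica <= 0:
        raise ValueError("target_per_replica must be positive")
    # Proportional rule: round up so capacity never falls below the load.
    wanted = math.ceil(total_in_flight / target_per_replica)
    # Clamp to the configured bounds to avoid runaway scale-up or scale-to-nothing.
    return max(min_replicas, min(max_replicas, wanted))


if __name__ == "__main__":
    for load in (0, 10, 45, 500):
        print(load, desired_replicas(load, target_per_replica=10))
```

A real autoscaler would also smooth the metric over a time window and apply cooldowns so that short bursts do not cause replicas to flap; those details are left out here.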

Best for

Teams operating LLM inference services on Kubernetes who need workload-driven autoscaling.

Use cases

Automatically scale vLLM deployments based on request load
Optimize GPU utilization for SGLang inference servers
Manage dynamic inference capacity for LMDeploy on Kubernetes

Notes

283 stars on GitHub. Last updated 2023-11-03. Licensed Apache-2.0.

Pros

Open source and community-driven, under the permissive Apache-2.0 license
Written in Go for efficient resource usage and fast startup
Supports multiple popular LLM serving frameworks

Cons

Relatively small community (283 stars) may limit support and contributions
Requires Kubernetes knowledge to deploy and configure
May not handle complex, multi-model inference scenarios out of the box

Indexed from awesome-llmops and enriched against its public facts.

Pairs with

Other entries in the index that connect to this one.

Uses (2 entries)

O OSS Framework medium

vLLM

Community

A high-throughput and memory-efficient inference and serving engine for LLMs

★ 81,619 updated 1mo ago

O OSS Framework medium

SGLang

Community

SGLang is a high-performance serving framework for large language models and multimodal models.

★ 28,885 updated 1mo ago

Pairs with (1 entry)

O OSS Obs medium

LiteLLM 🚅

Community

Python SDK, Proxy Server (AI Gateway) to call 100+ LLM APIs in OpenAI (or native) format, with cost tracking, guardrails, loadbalancing and logging. [Bedrock, Azure, OpenAI, Vertex

★ 48,950 updated 1mo ago
