O Open Source Observability medium

LLMKube

by Community

Kubernetes operator for local LLM inference with llama.cpp, vLLM, TGI, and mlx-server — multi-GPU NVIDIA + Apple Silicon Metal, autoscaling, air-gapped, production-ready

Visit Community View repo Submit your build →

OSS

LLMKube

Added 1 June 2026

#ai #apple-silicon #autoscaling #edge-computing #gguf #gpu #homelab #inference

Overview

LLMKube is a Kubernetes operator for running LLM inference workloads locally using llama.cpp, vLLM, TGI, and mlx-server. It supports multi-GPU configurations on NVIDIA and Apple Silicon Metal, provides autoscaling, and can operate in air-gapped environments.

Best for

Best for
Teams needing a Kubernetes-native way to self-host LLM inference with flexible GPU support

Use cases

Deploy and scale local LLM inference on a private Kubernetes cluster
Run production LLM workloads with multiple GPU types (NVIDIA and Apple Silicon)
Manage LLM serving in air-gapped or restricted-network environments

Notes

118 stars on GitHub. Last updated 2026-06-01. Licensed Apache-2.0.

Use cases

Deploy and scale local LLM inference on a private Kubernetes cluster
Run production LLM workloads with multiple GPU types (NVIDIA and Apple Silicon)
Manage LLM serving in air-gapped or restricted-network environments

Pros

Supports multiple inference engines (llama.cpp, vLLM, TGI, mlx-server)
Works with both NVIDIA and Apple Silicon Metal GPUs
Designed for air-gapped, production-ready deployment

Cons

Community project with only 118 stars
Written in Go, limiting contributor base
Requires Kubernetes expertise to operate

Indexed from awesome-llmops and enriched against its public facts.

Pros

Supports multiple inference engines (llama.cpp, vLLM, TGI, mlx-server)
Works with both NVIDIA and Apple Silicon Metal GPUs
Designed for air-gapped, production-ready deployment

Cons

Community project with only 118 stars
Written in Go, limiting contributor base
Requires Kubernetes expertise to operate

Pairs with

Other entries in the index that connect to this one. Click through to see the chain.

Uses2entries

O OSS Framework medium

llama.cpp

Community

LLM inference in C/C++

★ 114,160 updated 1mo ago

O OSS Framework medium

vLLM

Community

A high-throughput and memory-efficient inference and serving engine for LLMs

★ 81,619 updated 1mo ago

Free 27-page guide

Get the free Developer’s Field Guide

A 27-page field guide to the AI coding workflow with Claude. Claude Code, MCP servers, the prompt patterns that work, and what to delegate. Free.

Enter your work email. We send it straight over, plus a few short notes worth knowing. Unsubscribe any time.

← Back to Open Source Submit your own entry →