Open source alternatives to vLLM
13 open-source alternatives in the index, ranked by GitHub stars and freshness.
ollama
Community
Get up and running with Kimi-K2.5, GLM-5, MiniMax, DeepSeek, gpt-oss, Qwen, Gemma and other models.
Best for: Developers building local-first applications or prototyping with open-source LLMs without cloud costs
llama.cpp
Community
LLM inference in C/C++
Best for: Developers building privacy-first or offline-capable applications with constrained hardware
SGLang
Community
SGLang is a high-performance serving framework for large language models and multimodal models.
Best for: Teams building production LLM services who need performance-optimized serving infrastructure
TensorRT-LLM
Community
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NV
Best for: Teams deploying LLMs at scale on NVIDIA infrastructure who need maximum inference performance.
text-generation-inference
Community
Large Language Model Text Generation Inference
Best for: Developers needing a production-grade, self-hosted LLM serving solution.
Triton Server (TRTIS)
Community
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
Best for: Teams deploying large-scale inference services that need high throughput and multi-framework support.
FlexGen
Community
Running large language models on a single GPU for throughput-oriented scenarios.
Best for: Developers who need to run large language models at high throughput on a single GPU, especially in budget-constrained or research environments
LMDeploy
Community
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
Best for: Developers who need to compress and serve LLMs efficiently in production
mistral.rs
Community
Fast, flexible LLM inference
Best for: Rust developers seeking a fast, flexible LLM inference framework for performance-critical or resource-constrained environments.
FasterTransformer
Community
Transformer-related optimization, including BERT and GPT
Best for: Developers seeking maximum inference performance for transformer models on NVIDIA hardware
ray-llm
Community
RayLLM - LLMs on Ray (Archived). Read README for more info.
Best for: Developers already using Ray who need legacy code or patterns for running LLMs at scale.
IntelliServer
Community
AI models as scalable microservices, enabling evaluation of LLMs and offering end-to-end functions such as chatbot, semantic search, image generation and beyond.
Best for: JavaScript developers who need a simple microservice wrapper for deploying and evaluating AI models
TGI
Community
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
Best for: Developers and teams who need to self-host or fine-tune open-source LLMs at scale