Enterprise DNA
O Open Source Frameworks medium

vLLM

by Community

A high-throughput and memory-efficient inference and serving engine for LLMs

V

OSS

vLLM

Added 1 June 2026

#amd #blackwell #cuda #deepseek #deepseek-v3 #gpt #gpt-oss #inference

Overview

vLLM is a Python framework for serving large language models with optimized throughput and memory efficiency. It uses techniques like paged attention and continuous batching to reduce latency and increase request throughput compared to standard inference servers. Designed for production deployments that need to handle multiple concurrent requests.

Best for

Best for
Teams building production LLM APIs and services that need to maximize throughput and minimize latency under concurrent load.

Use cases

  • Running inference servers that handle high request volume with low latency
  • Reducing GPU memory footprint when serving large models
  • Batching and scheduling inference requests efficiently

Notes

vLLM is a Python framework for serving large language models with optimized throughput and memory efficiency. It uses techniques like paged attention and continuous batching to reduce latency and increase request throughput compared to standard inference servers. Designed for production deployments that need to handle multiple concurrent requests.

81,619 stars on GitHub. Last updated 2026-06-01. Licensed Apache-2.0.

Use cases

  • Running inference servers that handle high request volume with low latency
  • Reducing GPU memory footprint when serving large models
  • Batching and scheduling inference requests efficiently

Pros

  • Significantly higher throughput than standard LLM serving approaches
  • Lower memory consumption enables serving larger models on same hardware
  • Active community with 81k+ GitHub stars and ongoing development

Cons

  • Requires Python and GPU infrastructure, not suitable for CPU-only deployments
  • Steeper learning curve than simple inference libraries for basic use cases
  • Performance gains depend on workload characteristics and batch patterns

Indexed from awesome-llm and enriched against its public facts.

Pros

  • Significantly higher throughput than standard LLM serving approaches
  • Lower memory consumption enables serving larger models on same hardware
  • Active community with 81k+ GitHub stars and ongoing development

Cons

  • Requires Python and GPU infrastructure, not suitable for CPU-only deployments
  • Steeper learning curve than simple inference libraries for basic use cases
  • Performance gains depend on workload characteristics and batch patterns

Pairs with

Other entries in the index that connect to this one. Click through to see the chain.

Used by17entries
O OSS Framework medium

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Community

BigScience

O OSS Obs medium

distilabel

Community

Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.

★ 3,233 updated 9d ago
O OSS Framework medium

GPUStack

Community

A GPU cluster manager that configures and orchestrates inference engines like vLLM and SGLang for high-performance AI model deployment.

★ 5,082 updated 2d ago
O OSS Framework medium

LangChain

Community

The agent engineering platform.

★ 138,234 updated 2d ago
O OSS Framework medium

lighteval

Community

Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends

★ 2,430 updated 5d ago
O OSS Obs medium

LLMKube

Community

Kubernetes operator for local LLM inference with llama.cpp, vLLM, TGI, and mlx-server — multi-GPU NVIDIA + Apple Silicon Metal, autoscaling, air-gapped, production-ready

★ 118 updated 2d ago
O OSS Framework medium

lm-evaluation-harness

Community

A framework for few-shot evaluation of language models.

★ 12,772 updated 23d ago
O OSS Obs medium

OpenModelZ

Community

Autoscale LLM (vLLM, SGLang, LMDeploy) inferences on Kubernetes (and others)

★ 283 updated 2y ago
O OSS Framework medium

OpenLLM

Community

Run any open-source LLMs, such as DeepSeek and Llama, as OpenAI compatible API endpoint in the cloud.

★ 12,346 updated 2d ago
O OSS Framework medium

OpenRLHF

Community

An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & VLM & TIS & vLLM & Ray & Async RL)

★ 9,583 updated 6d ago
O OSS Framework medium

Outlines

Community

Structured Outputs

★ 13,914 updated 16d ago
O OSS Framework medium

Qwen2-Math-1.5B|7B|72B

Community

GITHUB HUGGING FACE MODELSCOPE DISCORD 🚨 This model mainly supports English. We will release bilingual (English and Chinese) math models soon. Introduction Over the past year, w

O OSS Framework medium

Tune Studio

Community

Playground for devs to finetune & deploy LLMs

P Apps Productivity low

DeepSeek

Various

Org profile for DeepSeek on Hugging Face, the AI community building the future.

P Apps Productivity low

Forefront

Various

Forefront is a platform to fine-tune and inference open-source-language-models.

P Apps Productivity low

Mistral

Various

The most powerful AI platform for enterprises. Customize, fine-tune, and deploy AI assistants, autonomous agents, and multimodal AI with open models.

P Apps Productivity low

Vicuna-13B

Various

We introduce Vicuna-13B, an open-source chatbot trained by fine-tuning LLaMA on user-shared conversations collected from ShareGPT. Preliminary evaluation using GPT-4 as a judge s

Pairs with64entries
M MCP Dev low

Jwrede/llmprobe

Various

Synthetic monitoring and CI smoke tests for LLM inference endpoints.

★ 1 updated 18d ago
O OSS Framework medium

Awesome-LLM-Compression

Community

Awesome LLM compression research papers and tools.

★ 1,840 updated 3mo ago
O OSS Framework medium

Awesome-LLM-Inference

Community

📖A curated list of Awesome LLM/VLM Inference Papers with codes: WINT8/4, FlashAttention, PagedAttention, MLA, Parallelism, etc. 🎉🎉

★ 16 updated 1y ago
O OSS Framework medium

awesome-llm-webapps

Community

A collection of open source, actively maintained web apps for LLM applications

★ 714 updated 11mo ago
O OSS Framework medium

Axolotl

Community

Go ahead and axolotl questions

★ 11,997 updated 2d ago
O OSS Framework medium

Baichuan-7|13B

Community

AGI Large Language Models

O OSS Orchestration medium

Bifrost

Community

Fastest enterprise AI gateway (50x faster than LiteLLM) with adaptive load balancer, cluster mode, guardrails, 1000+ models support & <100 µs overhead at 5k RPS.

★ 5,406 updated 2d ago
O OSS Framework medium

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Community

BigScience

O OSS Framework medium

CodeQwen1.5-7B

Community

GITHUB HUGGING FACE MODELSCOPE DEMO DISCORD Introduction The advent of advanced programming tools, which harnesses the power of large language models (LLMs), has significantly en

O OSS Framework medium

Codestral-7|22B

Community

The most powerful AI platform for enterprises. Customize, fine-tune, and deploy AI assistants, autonomous agents, and multimodal AI with open models.

O OSS Framework medium

DeepSeek-Math-7B

Community

DeepSeek Math series

O OSS Framework medium

DeepSeek-R1

Community

First-generation reasoning models from DeepSeek.

★ 92,010 updated 11mo ago
O OSS Framework medium

DeepSeek-v2-236B-MoE

Community

We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of whic

O OSS Framework medium

DeepSeek-V2.5

Community

We’re on a journey to advance and democratize artificial intelligence through open source and open science.

O OSS Framework medium

DeepSeek-VL-1.3|7B

Community

DeepSeek-VL model series

O OSS Obs medium

Falcon 40B

Community

We’re on a journey to advance and democratize artificial intelligence through open source and open science.

O OSS Obs medium

Fiddler AI

Community

Fiddler Auditor is a tool to evaluate language models.

★ 191 updated 2y ago
O OSS Obs medium

Flyflow

Community

Open source, high performance fine tuning as a service for GPT4 quality models with 5x lower latency and 3x lower cost

O OSS Obs medium

Gemma

Community

Checking your browser - reCAPTCHA

O OSS Framework medium

Gemma2-9|27B

Community

Gemma 2, our next generation of open models, is now available globally for researchers and developers.

O OSS Framework medium

GLM-130B: An Open Bilingual Pre-trained Model

Community

GLM-130B

O OSS Framework medium

GLM-2|6|10|13|70B

Community

Org profile for THUDM on Hugging Face, the AI community building the future.

O OSS Framework medium

Grok-1-314B-MoE

Community

Grok-1-314B-MoE — indexed from awesome-llm

O OSS Framework medium

Haystack

Community

Create agentic, context engineered AI systems using Haystack’s modular and customizable building blocks, built for real-world, production-ready applications.

O OSS Framework medium

Improving language models by retrieving from trillions of tokens

Community

Publications — Google DeepMind

O OSS Framework medium

Infinity

Community

Infinity is a high-throughput, low-latency serving engine for text-embeddings, reranking models, clip, clap and colpali

★ 2,817 updated 2mo ago
O OSS Framework medium

InternLM2-1.8|7|20B

Community

We’re on a journey to advance and democratize artificial intelligence through open source and open science.

O OSS Obs medium

KubeAI

Community

AI Inference Operator for Kubernetes. The easiest way to serve ML models in production. Supports VLMs, LLMs, embeddings, and speech-to-text.

★ 1,201 updated 2d ago
O OSS Framework medium

Llama 1-7|13|33|65B

Community

[OPT-1.3 6.7 13 30 66B](https://arxiv.org/abs/2205.01068)

O OSS Framework medium

Llama 2: Open Foundation and Fine-Tuned Chat Models

Community

2023-07

O OSS Framework medium

Llama 3.2-1|3|11|90B

Community

[Llama 3.1-8 70 405B](https://llama.meta.com/)

O OSS Framework medium

Llama 3-8|70B

Community

[Llama 2-7 13 70B](https://llama.meta.com/llama2/)

O OSS Framework medium

LLaMA: Open and Efficient Foundation Language Models

Community

2023-02

O OSS Framework medium

maxtext

Community

A simple, performant and scalable Jax LLM!

★ 2,303 updated 2d ago
O OSS Framework medium

Meta Lingua

Community

Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.

★ 4,760 updated 10mo ago
O OSS Framework medium

MiniCPM-2B

Community

The MiniCPM family of LLMs and VLLMs.

O OSS Framework medium

Mistral 7B

Community

Mistral 7B

O OSS Framework medium

Mixtral-8x7B

Community

The most powerful AI platform for enterprises. Customize, fine-tune, and deploy AI assistants, autonomous agents, and multimodal AI with open models.

O OSS Framework medium

Moonlight-A3B

Community

Moonshot's Compute-efficient MoE LLM, first Scaling Up of Muon Optimizer

O OSS Framework medium

MPT-7B

Community

Introducing MPT-7B, the first entry in our MosaicML Foundation Series. MPT-7B is a transformer trained from scratch on 1T tokens of text and code. It is open source, available fo

O OSS Framework medium

Nemotron-4-340B

Community

We’re on a journey to advance and democratize artificial intelligence through open source and open science.

O OSS Framework medium

OLMo-7B

Community

Artifacts for the first set of OLMo models.

O OSS Framework medium

OLMoE: Open Mixture-of-Experts Language Models

Community

We introduce OLMoE, a fully open, state-of-the-art language model leveraging sparse Mixture-of-Experts (MoE). OLMoE-1B-7B has 7 billion (B) parameters but uses only 1B per input

O OSS Framework medium

OLMO-eval

Community

Evaluation suite for LLMs

★ 379 updated 10mo ago
O OSS Framework medium

OpenELM-1.1|3B

Community

We’re on a journey to advance and democratize artificial intelligence through open source and open science.

O OSS Framework medium

Phi1-1.3B

Community

We’re on a journey to advance and democratize artificial intelligence through open source and open science.

O OSS Framework medium

Qwen-1.8B|7B|14B|72B

Community

Qwen - a Qwen Collection

O OSS Framework medium

Qwen2-0.5B|1.5B|7B|57B-A14B-MoE|72B

Community

GITHUB HUGGING FACE MODELSCOPE DEMO DISCORD Introduction After months of efforts, we are pleased to announce the evolution from Qwen1.5 to Qwen2. This time, we bring to you: Pret

O OSS Framework medium

Qwen2.5-1M-7|14B

Community

Tech Report HuggingFace ModelScope Qwen Chat HuggingFace Demo ModelScope Demo DISCORD Introduction Two months after upgrading Qwen2.5-Turbo to support context length up to one mi

O OSS Framework medium

Qwen2.5 Technical Report

Community

In this report, we introduce Qwen2.5, a comprehensive series of large language models (LLMs) designed to meet diverse needs. Compared to previous iterations, Qwen 2.5 has been si

O OSS Framework medium

Qwen2.5-Max

Community

QWEN CHAT API DEMO DISCORD It is widely recognized that continuously scaling both data size and model size can lead to significant improvements in model intelligence. However, th

O OSS Obs medium

ray-llm

Community

RayLLM - LLMs on Ray (Archived). Read README for more info.

★ 1,267 updated 1y ago
O OSS Framework medium

Semantic Kernel

Microsoft

Microsoft's enterprise-flavoured framework for AI agents. .NET-first, with Python and Java siblings.

O OSS Framework medium

SkyPilot

Community

Run, manage, and scale AI workloads on any AI infrastructure. Use one system to access & manage all AI compute (Kubernetes, Slurm, 20+ clouds, on-prem).

★ 10,051 updated 2d ago
O OSS Framework medium

StableLM-3B

Community

We’re on a journey to advance and democratize artificial intelligence through open source and open science.

O OSS Framework medium

StableLM-v2-12B

Community

We’re on a journey to advance and democratize artificial intelligence through open source and open science.

O OSS Framework medium

StarCoder-1|3|7B

Community

All models, datasets, and demos related to StarCoder!

O OSS Framework medium

The Llama 3 Herd of Models

Community

Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models

O OSS Framework medium

torchtitan

Community

A PyTorch native platform for training generative AI models

★ 5,394 updated 2d ago
O OSS Framework medium

Transformer Engine

Community

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on Hopper, Ada and Blackwell GPUs, to provide b

★ 3,374 updated 2d ago
O OSS Framework medium

veRL

Community

verl/HybridFlow: A Flexible and Efficient RL Post-Training Framework

★ 21,691 updated 2d ago
O OSS Framework medium

Yi-34B

Community

We’re on a journey to advance and democratize artificial intelligence through open source and open science.

O OSS Framework medium

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism

Community

Megatron-LM

P Apps Productivity low

Qwen

Various

Qwickly forging AGI, enhancing intelligence.

Alternatives13entries
O OSS Framework medium

FasterTransformer

Community

Transformer related optimization, including BERT, GPT

★ 6,418 updated 2y ago
O OSS Obs medium

FlexGen

Community

Running large language models on a single GPU for throughput-oriented scenarios.

★ 9,365 updated 1y ago
O OSS Framework medium

IntelliServer

Community

AI models as scalable microservices, enabling evaluation of LLMs and offering end-to-end functions such as chatbot, semantic search, image generation and beyond.

★ 29 updated 1y ago
O OSS Framework medium

llama.cpp

Community

LLM inference in C/C++

★ 114,160 updated 2d ago
O OSS Framework medium

LMDeploy

Community

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

★ 7,876 updated 2d ago
O OSS Framework medium

mistral.rs

Community

Fast, flexible LLM inference

★ 7,205 updated 2d ago
O OSS Framework medium

ollama

Community

Get up and running with Kimi-K2.5, GLM-5, MiniMax, DeepSeek, gpt-oss, Qwen, Gemma and other models.

★ 172,846 updated 2d ago
O OSS Obs medium

ray-llm

Community

RayLLM - LLMs on Ray (Archived). Read README for more info.

★ 1,267 updated 1y ago
O OSS Framework medium

SGLang

Community

SGLang is a high-performance serving framework for large language models and multimodal models.

★ 28,885 updated 2d ago
O OSS Framework medium

TensorRT-LLM

Community

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NV

★ 13,781 updated 2d ago
O OSS Obs medium

text-generation-inference

Community

Large Language Model Text Generation Inference

★ 10,857 updated 2mo ago
O OSS Framework medium

TGI

Community

We’re on a journey to advance and democratize artificial intelligence through open source and open science.

O OSS Obs medium

Triton Server (TRTIS)

Community

The Triton Inference Server provides an optimized cloud and edge inferencing solution.

★ 10,720 updated 2d ago