O Open Source Frameworks medium

LMDeploy

by Community

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

Visit Community View repo Submit your build →

OSS

LMDeploy

Added 1 June 2026

#codellama #cuda-kernels #deepspeed #fastertransformer #internlm #llama #llama2 #llama3

Overview

LMDeploy is a toolkit for compressing, deploying, and serving large language models. It provides quantization, efficient inference, and a serving backend to reduce model size and latency.

Best for

Best for
Developers who need to compress and serve LLMs efficiently in production

Use cases

Quantize LLMs to lower precision for faster inference
Deploy and serve LLMs with a high-performance inference engine
Integrate LLMs into production pipelines with minimal overhead

Notes

LMDeploy is a toolkit for compressing, deploying, and serving large language models. It provides quantization, efficient inference, and a serving backend to reduce model size and latency.

7,876 stars on GitHub. Last updated 2026-06-01. Licensed Apache-2.0.

Use cases

Quantize LLMs to lower precision for faster inference
Deploy and serve LLMs with a high-performance inference engine
Integrate LLMs into production pipelines with minimal overhead

Pros

Strong quantization support reduces memory and speeds up inference
High-performance serving backend with low latency
Active community with frequent updates and 7.8k GitHub stars

Cons

Limited to models compatible with its engine and quantization methods
Documentation and examples may lag behind rapid development
Requires Python and some familiarity with model deployment tooling

Indexed from awesome-llm and enriched against its public facts.

Pros

Strong quantization support reduces memory and speeds up inference
High-performance serving backend with low latency
Active community with frequent updates and 7.8k GitHub stars

Cons

Limited to models compatible with its engine and quantization methods
Documentation and examples may lag behind rapid development
Requires Python and some familiarity with model deployment tooling

Pairs with

Other entries in the index that connect to this one. Click through to see the chain.

Built with1entry

O OSS Obs medium

PyTorch

Community

Tensors and Dynamic neural networks in Python with strong GPU acceleration

★ 100,318 updated 1mo ago

Alternative to2entries

O OSS Framework medium

vLLM

Community

A high-throughput and memory-efficient inference and serving engine for LLMs

★ 81,619 updated 1mo ago

O OSS Framework medium

TensorRT-LLM

Community

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NV

★ 13,781 updated 1mo ago

Alternatives2entries

O OSS Framework medium

SGLang

Community

SGLang is a high-performance serving framework for large language models and multimodal models.

★ 28,885 updated 1mo ago

O OSS Framework medium

vLLM

Community

A high-throughput and memory-efficient inference and serving engine for LLMs

★ 81,619 updated 1mo ago

Free 27-page guide

Get the free Developer’s Field Guide

A 27-page field guide to the AI coding workflow with Claude. Claude Code, MCP servers, the prompt patterns that work, and what to delegate. Free.

Enter your work email. We send it straight over, plus a few short notes worth knowing. Unsubscribe any time.

← Back to Open Source Submit your own entry →