LMDeploy
by Community
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
OSS
LMDeploy
Added 1 June 2026
Overview
LMDeploy is a toolkit for compressing, deploying, and serving large language models. It provides quantization, efficient inference, and a serving backend to reduce model size and latency.
Best for
Best for
Developers who need to compress and serve LLMs efficiently in production
Use cases
- Quantize LLMs to lower precision for faster inference
- Deploy and serve LLMs with a high-performance inference engine
- Integrate LLMs into production pipelines with minimal overhead
Notes
LMDeploy is a toolkit for compressing, deploying, and serving large language models. It provides quantization, efficient inference, and a serving backend to reduce model size and latency.
7,876 stars on GitHub. Last updated 2026-06-01. Licensed Apache-2.0.
Use cases
- Quantize LLMs to lower precision for faster inference
- Deploy and serve LLMs with a high-performance inference engine
- Integrate LLMs into production pipelines with minimal overhead
Pros
- Strong quantization support reduces memory and speeds up inference
- High-performance serving backend with low latency
- Active community with frequent updates and 7.8k GitHub stars
Cons
- Limited to models compatible with its engine and quantization methods
- Documentation and examples may lag behind rapid development
- Requires Python and some familiarity with model deployment tooling
Indexed from awesome-llm and enriched against its public facts.
Pros
- Strong quantization support reduces memory and speeds up inference
- High-performance serving backend with low latency
- Active community with frequent updates and 7.8k GitHub stars
Cons
- Limited to models compatible with its engine and quantization methods
- Documentation and examples may lag behind rapid development
- Requires Python and some familiarity with model deployment tooling
Pairs with
Other entries in the index that connect to this one. Click through to see the chain.
vLLM
Community
A high-throughput and memory-efficient inference and serving engine for LLMs
TensorRT-LLM
Community
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NV