bitnet.cpp
by Various
Official inference framework for 1-bit LLMs
Added 1 June 2026
Overview
Inference framework for 1-bit large language models, which quantize each weight to roughly one bit (the BitNet b1.58 models it targets use ternary weights: -1, 0, or 1) to cut model size and memory requirements. Runs on standard hardware such as commodity CPUs and needs less memory bandwidth and compute than full-precision inference. Designed for deployment scenarios where model footprint and latency matter.
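To make the quantization idea concrete, here is a minimal sketch of absmean ternary quantization as described for BitNet b1.58. It assumes the ternary variant and a flat list of weights. The function names `absmean_ternary_quantize` and `dequantize` are illustrative and are not part of bitnet.cpp, which uses its own packed kernels.

```python
def absmean_ternary_quantize(weights, eps=1e-8):
    """Map floats to values in {-1, 0, 1} plus one shared scale.

    Illustrative only: scale by the mean absolute weight, round to the
    nearest integer, then clip to the range [-1, 1].
    """
    scale = sum(abs(w) for w in weights) / len(weights)
    quantized = [max(-1, min(1, round(w / (scale + eps)))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from ternary values and the scale."""
    return [q * scale for q in quantized]


# Hypothetical example weights, not taken from any real model.
q, scale = absmean_ternary_quantize([0.9, -0.05, -1.2, 0.3])
# q is [1, 0, -1, 0]; small weights collapse to 0, large ones to +/-1.
approx = dequantize(q, scale)
```

The gap between `approx` and the original weights is the source of the accuracy loss listed under Cons, which is why such models are normally trained with quantization in the loop rather than converted afterwards.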
Best for
Developers building inference systems for edge devices, mobile applications, or cost-sensitive deployments where model size and speed outweigh maximum accuracy.
Use cases
- Running LLMs on edge devices and resource-constrained environments
- Reducing inference latency for real-time applications
- Lowering memory and storage costs for model deployment
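As a rough illustration of the memory and storage savings, the sketch below estimates weight storage at different bit widths. The 2-billion parameter count and the helper name `estimate_weight_bytes` are assumptions for the example. The estimate ignores activations, the KV cache, any layers kept at higher precision, and packing overhead, so real file sizes will differ.

```python
def estimate_weight_bytes(num_params, bits_per_weight):
    """Back-of-the-envelope weight storage in bytes (weights only)."""
    return num_params * bits_per_weight / 8


# Hypothetical model size chosen for illustration.
NUM_PARAMS = 2_000_000_000

fp16_bytes = estimate_weight_bytes(NUM_PARAMS, 16)       # 16-bit floats
ternary_bytes = estimate_weight_bytes(NUM_PARAMS, 1.58)  # ~log2(3) bits per weight

print(f"fp16:    {fp16_bytes / 1e9:.2f} GB")
print(f"ternary: {ternary_bytes / 1e9:.2f} GB")
print(f"ratio:   {fp16_bytes / ternary_bytes:.1f}x smaller")
```

In practice ternary weights are usually packed at 2 bits each for simpler decoding, so the achievable ratio against 16-bit weights is closer to 8x than the theoretical figure.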
Notes
39,132 stars on GitHub. Last updated 2026-03-10. Licensed MIT.
Pros
- Extreme model compression enables deployment on devices that cannot run standard LLMs
- Significantly faster inference due to reduced memory bandwidth and compute requirements
- Open source with substantial community adoption (39k+ stars)
Cons
- 1-bit quantization introduces accuracy loss compared to full-precision models
- Setup and run workflow relies on Python scripts around a C++ core, with no language-agnostic API
- Requires models trained for or converted to a 1-bit format; arbitrary LLMs are not supported
Indexed from awesome-generative-ai and enriched against its public facts.