Enterprise DNA

bitnet.cpp

by Microsoft

Official inference framework for 1-bit LLMs

Added 1 June 2026

Overview

Inference framework for 1-bit large language models such as BitNet b1.58, whose weights are stored at very low precision (ternary values, about 1.58 bits each), cutting model size and memory requirements. Runs on standard hardware such as commodity CPUs, at lower memory and compute cost than full-precision models. Designed for deployment scenarios where model footprint and latency matter.
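
As a rough illustration of why fewer bits per weight shrink the footprint, the back-of-envelope calculation below compares weight storage at several precisions. The 2-billion-parameter count and the 1.58 bits-per-weight figure (ternary weights, as in BitNet b1.58) are assumptions chosen for illustration, and the estimate ignores activations, KV cache, and file-format overhead.

```python
def weight_footprint_gib(num_params: int, bits_per_weight: float) -> float:
    """Approximate weight storage in GiB (weights only, no runtime buffers)."""
    return num_params * bits_per_weight / 8 / 2**30


# Hypothetical model size, not a figure taken from this entry.
params = 2_000_000_000

# Bits per weight for a few common storage formats (1.58 is an assumption
# matching ternary weights; real files add packing and metadata overhead).
formats = [("fp16", 16), ("int8", 8), ("int4", 4), ("ternary", 1.58)]

for label, bits in formats:
    print(f"{label:>8}: {weight_footprint_gib(params, bits):.2f} GiB")
```

Actual savings depend on how a given runtime packs the weights on disk and in memory, so treat these numbers as an upper bound on the compression rather than a measurement.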

Best for

Developers building inference systems for edge devices, mobile applications, or cost-sensitive deployments where model size and speed outweigh maximum accuracy.

Use cases

  • Running LLMs on edge devices and resource-constrained environments
  • Reducing inference latency for real-time applications
  • Lowering memory and storage costs for model deployment

Notes

39,132 stars on GitHub. Last updated 2026-03-10. Licensed MIT.

Pros

  • Extreme model compression enables deployment on devices that cannot run standard LLMs
  • Significantly faster inference due to reduced memory bandwidth and compute requirements
  • Open source with substantial community adoption (39k+ stars)

Cons

  • 1-bit quantization introduces accuracy loss compared to full-precision models
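  • Tooling is Python-driven: the core is C++ (built on llama.cpp), but environment setup and the bundled inference scripts are Python, so it is not a language-agnostic drop-in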
  • Requires models trained for or converted to a 1-bit format; it cannot run arbitrary LLMs
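
To make the accuracy trade-off concrete, here is a minimal sketch of absmean ternary rounding, the weight scheme described for BitNet b1.58. It is an assumption-laden illustration, not bitnet.cpp's kernels or conversion code: the function names are hypothetical, the weights are random, and real 1-bit models are trained with this constraint rather than rounded after the fact.

```python
import random


def absmean_ternary(weights):
    """Map weights to {-1, 0, +1} using one scale: the mean absolute value."""
    scale = sum(abs(w) for w in weights) / len(weights) or 1.0  # avoid divide-by-zero
    quantized = [max(-1, min(1, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Reconstruct approximate weights from ternary codes and the scale."""
    return [q * scale for q in quantized]


random.seed(0)
# Synthetic weights for illustration only.
weights = [random.gauss(0.0, 0.02) for _ in range(1024)]

codes, scale = absmean_ternary(weights)
approx = dequantize(codes, scale)

# Mean squared error between original and reconstructed weights:
# this gap is the information lost by keeping only three levels.
mse = sum((w - a) ** 2 for w, a in zip(weights, approx)) / len(weights)
print(f"levels={sorted(set(codes))} scale={scale:.4f} mse={mse:.2e}")
```

The nonzero reconstruction error is why a model has to be trained with, or carefully converted to, this format before it can run under a 1-bit runtime.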

Indexed from awesome-generative-ai and enriched against its public facts.
