Enterprise DNA

Open Source Alternatives

Open source alternatives to llama.cpp

10 open-source alternatives in the index, ranked by GitHub stars and freshness.

P Apps Productivity low

bitnet.cpp

Various

Official inference framework for 1-bit LLMs

★ 39,132 updated 2mo ago
open-source

Best for: Developers building inference systems for edge devices, mobile applications, or cost-sensitive deployments where model size and speed outweigh maximum accuracy.

O OSS Framework medium

MNN-LLM

Community

MNN: A blazing-fast, lightweight inference engine battle-tested by Alibaba, powering high-performance on-device LLMs and Edge AI.

★ 15,353 updated 2d ago
open-source

Best for: Developers building production on-device LLM and edge AI applications where latency and resource efficiency are critical.

O OSS Framework medium

TensorRT-LLM

Community

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NV

★ 13,781 updated 2d ago
open-source

Best for: Teams deploying LLMs at scale on NVIDIA infrastructure who need maximum inference performance.

O OSS Framework medium

mistral.rs

Community

Fast, flexible LLM inference

★ 7,205 updated 2d ago
open-source

Best for: Rust developers seeking a fast, flexible LLM inference framework for performance-critical or resource-constrained environments.

O OSS Obs medium

Shimmy

Community

⚡ Python-free Rust inference server — OpenAI-API compatible. GGUF + SafeTensors, hot model swap, auto-discovery, single binary. FREE now, FREE forever.

★ 5,306 updated 2d ago
open-source

Best for: Developers seeking a free, no-fuss Rust-based inference server with OpenAI API compatibility.

O OSS Framework medium

exllama

Community

A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.

★ 2,922 updated 2y ago
open-source

Best for: Developers running quantized Llama models on resource-constrained hardware.

O OSS Obs medium

Rapid-MLX

Community

The fastest local AI engine for Apple Silicon. 4.2x faster than Ollama, 0.08s cached TTFT, 100% tool calling. 17 tool parsers, prompt cache, reasoning separation, cloud routing. Dr

★ 2,641 updated 2d ago
open-source

Best for: Developers on Apple Silicon who need a fast, local OpenAI-compatible inference engine for tool-calling and reasoning tasks.

O OSS Framework medium

femtoGPT

Community

Pure Rust implementation of a minimal Generative Pretrained Transformer

★ 934 updated 7mo ago
open-source

Best for: Developers and researchers who want a minimal, understandable GPT implementation in Rust for learning or small-scale experimentation.

P Apps Productivity one click

ChatGPT

OpenAI

General-purpose AI assistant for writing, coding, analysis, and conversation. The most widely deployed consumer AI product.

freemium

Best for: Anyone who wants a versatile AI assistant for daily work and learning.

P Apps Productivity low

OpenAI API

Various

Announcement of the OpenAI API for text-to-text general-purpose AI models based on GPT-3. OpenAI blog, June 11, 2020.

freemium

Best for: Developers who need to integrate general-purpose text generation into their applications quickly.