Enterprise DNA
P Apps and SaaS Productivity low

llama.cpp

by Various

LLM inference in C/C++

L

Apps

llama.cpp

Added 1 June 2026

#ggml

Overview

llama.cpp runs large language models locally using C/C++ inference optimized for CPU and GPU execution. It enables developers to deploy quantized models with minimal dependencies and memory overhead, making LLM inference practical on consumer hardware.

Best for

Best for
Developers building privacy-first applications or deploying models on resource-constrained devices

Use cases

  • Running open-source models offline without API calls
  • Embedding LLM capabilities into applications with low latency
  • Quantizing and optimizing models for edge deployment

Notes

llama.cpp runs large language models locally using C/C++ inference optimized for CPU and GPU execution. It enables developers to deploy quantized models with minimal dependencies and memory overhead, making LLM inference practical on consumer hardware.

114,160 stars on GitHub. Last updated 2026-06-01. Licensed MIT.

Use cases

  • Running open-source models offline without API calls
  • Embedding LLM capabilities into applications with low latency
  • Quantizing and optimizing models for edge deployment

Pros

  • Extremely efficient inference on CPU and GPU with minimal resource requirements
  • Supports quantized model formats, reducing model size by 4-8x without major quality loss
  • Active community with broad hardware compatibility and regular model support updates

Cons

  • Steeper setup curve than API-based solutions, requires compilation and model management
  • Performance varies significantly based on hardware, CPU inference is substantially slower than GPU
  • Limited to inference only, no built-in fine-tuning or training capabilities

Indexed from awesome-generative-ai and enriched against its public facts.

Pros

  • Extremely efficient inference on CPU and GPU with minimal resource requirements
  • Supports quantized model formats, reducing model size by 4-8x without major quality loss
  • Active community with broad hardware compatibility and regular model support updates

Cons

  • Steeper setup curve than API-based solutions, requires compilation and model management
  • Performance varies significantly based on hardware, CPU inference is substantially slower than GPU
  • Limited to inference only, no built-in fine-tuning or training capabilities

Pairs with

Other entries in the index that connect to this one. Click through to see the chain.

Used by29entries
A Agents Coding low

Continue

Continue.dev

Open-source AI code assistant for VS Code and JetBrains. Customisable, BYO model, built for enterprise.

M MCP Dev low

ShipItAndPray/mcp-turboquant

Various

MCP server for LLM quantization. Compress any model to GGUF/GPTQ/AWQ in one tool call. First MCP server for model compression.

★ 3 updated 2mo ago
O OSS Orchestration medium

Anything LLM

Community

The all-in-one AI productivity accelerator. On device and privacy first with no annoying setup or configuration.

★ 60,905 updated 2d ago
O OSS Obs medium

Continue

Community

⏩ Source-controlled AI checks, enforceable in CI. Powered by the open-source Continue CLI

★ 33,482 updated 2d ago
O OSS Framework medium

deploy-llms-with-ansible

Community

Easily deploy LLMs with Ansible. Uses Docker with llama.cpp or ollama. Secured with whitelisted IPs.

★ 3 updated 1y ago
O OSS Framework medium

LangChain

Community

The agent engineering platform.

★ 138,234 updated 2d ago
O OSS Framework medium

lighteval

Community

Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends

★ 2,430 updated 5d ago
O OSS Orchestration medium

LLama Cpp Agent

Community

The llama-cpp-agent framework is a tool designed for easy interaction with Large Language Models (LLMs). Allowing users to chat with LLM models, execute structured function calls a

★ 639 updated 2mo ago
O OSS Obs medium

LLMKube

Community

Kubernetes operator for local LLM inference with llama.cpp, vLLM, TGI, and mlx-server — multi-GPU NVIDIA + Apple Silicon Metal, autoscaling, air-gapped, production-ready

★ 118 updated 2d ago
O OSS Framework medium

lm-evaluation-harness

Community

A framework for few-shot evaluation of language models.

★ 12,772 updated 23d ago
O OSS Obs medium

Off Grid

Community

The Swiss Army Knife of Offline AI. Chat, Speak, and Generate Images - Privacy First, Zero Internet. Download an LLM and use it on your mobile device. No data ever leaves your phon

★ 2,335 updated 5d ago
O OSS Framework medium

ollama

Community

Get up and running with Kimi-K2.5, GLM-5, MiniMax, DeepSeek, gpt-oss, Qwen, Gemma and other models.

★ 172,846 updated 2d ago
O OSS Orchestration medium

OpenDAN

Community

OpenDAN is an open source Personal AI OS , which consolidates various AI modules in one place for your personal use.

★ 2,032 updated 2mo ago
O OSS Framework medium

Outlines

Community

Structured Outputs

★ 13,914 updated 16d ago
O OSS Orchestration medium

Phidata

Community

Build, run, and manage agent platforms.

★ 40,451 updated 2d ago
O OSS Framework medium

prima.cpp

Community

A distributed implementation of llama.cpp that lets you run 70B-level LLMs on your everyday devices.

O OSS Orchestration medium

Private GPT

Community

Interact with your documents using the power of GPT, 100% privately, no data leaks

★ 57,218 updated 3mo ago
O OSS Framework medium

Qwen2-Math-1.5B|7B|72B

Community

GITHUB HUGGING FACE MODELSCOPE DISCORD 🚨 This model mainly supports English. We will release bilingual (English and Chinese) math models soon. Introduction Over the past year, w

O OSS Framework medium

Serge

Community

A web interface for chatting with Alpaca through llama.cpp. Fully dockerized, with an easy to use API.

★ 5,725 updated 6mo ago
O OSS Framework medium

Wllama

Community

WebAssembly binding for llama.cpp - Enabling on-browser LLM inference

★ 1,095 updated 2d ago
P Apps Productivity low

gpt4all

Various

GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.

★ 77,348 updated 1y ago
P Apps Productivity low

Jan

Various

Jan is an open-source alternative to ChatGPT. Run open-source AI models locally or connect to cloud models like GPT, Claude and others.

P Apps Productivity low

LibreChat

Various

LibreChat brings together all your AI conversations in one unified, customizable interface.

P Apps Productivity low

LLM

Various

LLM: A CLI utility and Python library for interacting with Large Language Models

P Apps Productivity low

Local Deep Research

Various

~95% on SimpleQA (e.g. Qwen3.6-27B on a 3090). Supports all local and cloud LLMs (llama.cpp, Ollama, Google, ...). 10+ search engines - arXiv, PubMed, your private documents. Every

★ 8,273 updated 2d ago
P Apps Productivity low

LM Studio

Various

Run local AI models like gpt-oss, Llama, Gemma, Qwen, and DeepSeek privately on your computer.

P Apps Productivity low

privateGPT

Various

Interact with your documents using the power of GPT, 100% privately, no data leaks

★ 57,218 updated 3mo ago
P Apps Productivity low

PyGPT

Various

PyGPT is an open‑source desktop AI assistant for Windows, macOS and Linux. Chat, agents, web search, run Python, TTS/STT, plugins, long‑term memory.

P Apps Productivity low

Vicuna-13B

Various

We introduce Vicuna-13B, an open-source chatbot trained by fine-tuning LLaMA on user-shared conversations collected from ShareGPT. Preliminary evaluation using GPT-4 as a judge s

Pairs with51entries
M MCP Dev low

Jwrede/llmprobe

Various

Synthetic monitoring and CI smoke tests for LLM inference endpoints.

★ 1 updated 18d ago
O OSS Framework medium

awesome-japanese-llm

Community

日本語LLMまとめ - Overview of Japanese LLMs

★ 1,407 updated 4d ago
O OSS Framework medium

Awesome-LLM-Compression

Community

Awesome LLM compression research papers and tools.

★ 1,840 updated 3mo ago
O OSS Framework medium

Awesome-LLM-Inference

Community

📖A curated list of Awesome LLM/VLM Inference Papers with codes: WINT8/4, FlashAttention, PagedAttention, MLA, Parallelism, etc. 🎉🎉

★ 16 updated 1y ago
O OSS Framework medium

awesome-llm-webapps

Community

A collection of open source, actively maintained web apps for LLM applications

★ 714 updated 11mo ago
O OSS Framework medium

Baichuan-7|13B

Community

AGI Large Language Models

O OSS Obs medium

Chroma

Community

Search infrastructure for AI

★ 28,173 updated 2d ago
O OSS Framework medium

CodeQwen1.5-7B

Community

GITHUB HUGGING FACE MODELSCOPE DEMO DISCORD Introduction The advent of advanced programming tools, which harnesses the power of large language models (LLMs), has significantly en

O OSS Framework medium

Codestral-7|22B

Community

The most powerful AI platform for enterprises. Customize, fine-tune, and deploy AI assistants, autonomous agents, and multimodal AI with open models.

O OSS Framework medium

DeepSeek-R1

Community

First-generation reasoning models from DeepSeek.

★ 92,010 updated 11mo ago
O OSS Framework medium

femtoGPT

Community

Pure Rust implementation of a minimal Generative Pretrained Transformer

★ 934 updated 7mo ago
O OSS Orchestration medium

Flock

Community

A multi agent desktop application built with Rust and Tauri.

★ 1,073 updated 2d ago
O OSS Obs medium

Gemma

Community

Checking your browser - reCAPTCHA

O OSS Framework medium

Gemma2-9|27B

Community

Gemma 2, our next generation of open models, is now available globally for researchers and developers.

O OSS Framework medium

Grok-1-314B-MoE

Community

Grok-1-314B-MoE — indexed from awesome-llm

O OSS Framework medium

Haystack

Community

Create agentic, context engineered AI systems using Haystack’s modular and customizable building blocks, built for real-world, production-ready applications.

O OSS Framework medium

InternLM2-1.8|7|20B

Community

We’re on a journey to advance and democratize artificial intelligence through open source and open science.

O OSS Orchestration medium

Lagent

Community

A lightweight framework for building LLM-based agents

★ 2,256 updated 5d ago
O OSS Framework medium

Langchain-Chatchat

Community

Langchain-Chatchat(原Langchain-ChatGLM)基于 Langchain 与 ChatGLM, Qwen 与 Llama 等语言模型的 RAG 与 Agent 应用 | Langchain-Chatchat (formerly langchain-ChatGLM), local knowledge based LLM (like

★ 38,121 updated 6mo ago
O OSS Obs medium

LiteLLM 🚅

Community

Python SDK, Proxy Server (AI Gateway) to call 100+ LLM APIs in OpenAI (or native) format, with cost tracking, guardrails, loadbalancing and logging. [Bedrock, Azure, OpenAI, Vertex

★ 48,950 updated 2d ago
O OSS Framework medium

Llama 1-7|13|33|65B

Community

[OPT-1.3 6.7 13 30 66B](https://arxiv.org/abs/2205.01068)

O OSS Framework medium

Llama 2: Open Foundation and Fine-Tuned Chat Models

Community

2023-07

O OSS Framework medium

Llama 3.2-1|3|11|90B

Community

[Llama 3.1-8 70 405B](https://llama.meta.com/)

O OSS Framework medium

Llama 3-8|70B

Community

[Llama 2-7 13 70B](https://llama.meta.com/llama2/)

O OSS Orchestration medium

LLaMA Cult and More

Community

Large Language Models for All, 🦙 Cult and More, Stay in touch !

★ 449 updated 3y ago
O OSS Framework medium

LLaMA: Open and Efficient Foundation Language Models

Community

2023-02

O OSS Framework medium

MiniCPM-2B

Community

The MiniCPM family of LLMs and VLLMs.

O OSS Framework medium

Mistral 7B

Community

Mistral 7B

O OSS Framework medium

Moonlight-A3B

Community

Moonshot's Compute-efficient MoE LLM, first Scaling Up of Muon Optimizer

O OSS Framework medium

OLMO-eval

Community

Evaluation suite for LLMs

★ 379 updated 10mo ago
O OSS Framework medium

OpenELM-1.1|3B

Community

We’re on a journey to advance and democratize artificial intelligence through open source and open science.

O OSS Framework medium

Phi1-1.3B

Community

We’re on a journey to advance and democratize artificial intelligence through open source and open science.

O OSS Framework medium

Qwen-1.8B|7B|14B|72B

Community

Qwen - a Qwen Collection

O OSS Framework medium

Qwen2-0.5B|1.5B|7B|57B-A14B-MoE|72B

Community

GITHUB HUGGING FACE MODELSCOPE DEMO DISCORD Introduction After months of efforts, we are pleased to announce the evolution from Qwen1.5 to Qwen2. This time, we bring to you: Pret

O OSS Framework medium

RWKV: Reinventing RNNs for the Transformer Era

Community

Transformers have revolutionized almost all natural language processing (NLP) tasks but suffer from memory and computational complexity that scales quadratically with sequence le

O OSS Framework medium

Semantic Kernel

Microsoft

Microsoft's enterprise-flavoured framework for AI agents. .NET-first, with Python and Java siblings.

O OSS Framework medium

Shell-Pilot

Community

A simple, lightweight shell script to interact with OpenAI or Ollama or Mistral AI or LocalAI or ZhipuAI from the terminal, and enhancing intelligent system management without any

★ 118 updated 1y ago
O OSS Framework medium

StableLM-3B

Community

We’re on a journey to advance and democratize artificial intelligence through open source and open science.

O OSS Framework medium

StableLM-v2-12B

Community

We’re on a journey to advance and democratize artificial intelligence through open source and open science.

O OSS Framework medium

The Llama 3 Herd of Models

Community

Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models

O OSS Framework medium

unslothai

Community

Unsloth Studio is a web UI for training and running open models like Gemma 4, Qwen3.6, DeepSeek, gpt-oss locally.

★ 65,515 updated 2d ago
O OSS SDK one click

Vercel AI SDK

Vercel

The de facto TypeScript SDK for AI apps. Streaming, tools, multi-model, and now an agent loop.

O OSS Framework medium

Yi-34B

Community

We’re on a journey to advance and democratize artificial intelligence through open source and open science.

P Apps Productivity low

Auto-GPT

Various

AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.

★ 184,701 updated 2d ago
P Apps Productivity low

Jan

Various

Jan is an open-source alternative to ChatGPT. Run open-source AI models locally or connect to cloud models like GPT, Claude and others.

P Apps Productivity low

LibreChat

Various

LibreChat brings together all your AI conversations in one unified, customizable interface.

P Apps Productivity low

LLM

Various

LLM: A CLI utility and Python library for interacting with Large Language Models

P Apps Productivity low

LLaMA

Various

Llama LLM, a foundational, 65-billion-parameter large language model by Meta. Meta, February 23rd, 2023. #opensource

P Apps Productivity low

Qwen

Various

Qwickly forging AGI, enhancing intelligence.

P Apps Productivity low

RunThisLLM

Various

Find out exactly what hardware you need to run any local LLM, image, video, or audio AI model. 275+ models with full build specs and performance estimates.

P Apps Productivity low

TurboPilot

Various

Turbopilot is an open source large-language-model based code completion engine that runs locally on CPU

★ 3,789 updated 2y ago
Alternatives10entries
O OSS Framework medium

exllama

Community

A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.

★ 2,922 updated 2y ago
O OSS Framework medium

femtoGPT

Community

Pure Rust implementation of a minimal Generative Pretrained Transformer

★ 934 updated 7mo ago
O OSS Framework medium

mistral.rs

Community

Fast, flexible LLM inference

★ 7,205 updated 2d ago
O OSS Framework medium

MNN-LLM

Community

MNN: A blazing-fast, lightweight inference engine battle-tested by Alibaba, powering high-performance on-device LLMs and Edge AI.

★ 15,353 updated 2d ago
O OSS Obs medium

Rapid-MLX

Community

The fastest local AI engine for Apple Silicon. 4.2x faster than Ollama, 0.08s cached TTFT, 100% tool calling. 17 tool parsers, prompt cache, reasoning separation, cloud routing. Dr

★ 2,641 updated 2d ago
O OSS Obs medium

Shimmy

Community

⚡ Python-free Rust inference server — OpenAI-API compatible. GGUF + SafeTensors, hot model swap, auto-discovery, single binary. FREE now, FREE forever.

★ 5,306 updated 2d ago
O OSS Framework medium

TensorRT-LLM

Community

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NV

★ 13,781 updated 2d ago
P Apps Productivity low

bitnet.cpp

Various

Official inference framework for 1-bit LLMs

★ 39,132 updated 2mo ago
P Apps Productivity one click

ChatGPT

OpenAI

General-purpose AI assistant for writing, coding, analysis, and conversation. The most widely deployed consumer AI product.

P Apps Productivity low

OpenAI API

Various

Announcement of the OpenAI API for text-to-text general-purpose AI models based on GPT-3. OpenAI blog, June 11, 2020.