O Open Source Frameworks medium

llama.cpp

by Community

LLM inference in C/C++

Visit Community View repo Submit your build →

OSS

llama.cpp

Added 1 June 2026

#ggml

Overview

llama.cpp is a C++ inference framework that runs large language models locally on consumer hardware. It provides optimized tensor operations and quantization support to reduce model size and memory footprint, enabling fast inference without cloud dependencies.

Best for

Best for
Developers building privacy-first or offline-capable applications with constrained hardware

Use cases

Running open-source LLMs on laptops or edge devices
Building offline AI applications with minimal latency
Quantizing and deploying models with reduced VRAM requirements

Notes

114,160 stars on GitHub. Last updated 2026-06-01. Licensed MIT.

Use cases

Running open-source LLMs on laptops or edge devices
Building offline AI applications with minimal latency
Quantizing and deploying models with reduced VRAM requirements

Pros

Minimal dependencies and fast startup, runs on CPU and GPU
Extensive quantization options (4-bit, 8-bit) dramatically reduce model size
Active community with broad hardware support including Apple Silicon

Cons

Requires manual model conversion and quantization workflows
Performance varies significantly by hardware, CPU inference is slower than GPU alternatives
Limited built-in abstractions for complex multi-model pipelines

Indexed from awesome-llm and enriched against its public facts.

Pros

Minimal dependencies and fast startup, runs on CPU and GPU
Extensive quantization options (4-bit, 8-bit) dramatically reduce model size
Active community with broad hardware support including Apple Silicon

Cons

Requires manual model conversion and quantization workflows
Performance varies significantly by hardware, CPU inference is slower than GPU alternatives
Limited built-in abstractions for complex multi-model pipelines

Pairs with

Other entries in the index that connect to this one. Click through to see the chain.

Pairs with3entries

llama.cpp

Overview

Best for

Use cases

Notes

Use cases

Pros

Cons

Pairs with

ollama

Open WebUI

gpt4all

gpt-migrate

memfree

openinterpreter

bgauryy/octocode-mcp

dcostenco/prism-mcp

ShipItAndPray/mcp-turboquant

Anything LLM

Codestral-7|22B

deploy-llms-with-ansible

FastChat

fauxpilot

LiteChain

LLama Cpp Agent

LLMKube

Local GPT

Off Grid

OpenLLM

Pipecat

Private GPT

QA-Pilot

Serge

Build a Reasoning Model (From Scratch)

gpt4all

Jan

Jenni

LangChain

LM Studio

Local Deep Research

privateGPT

PyGPT

RunThisLLM

Unsloth

MikkoParkkola/nab

srclight/srclight

ollama

prima.cpp

TreeScale

Wllama

gpt4all

AilingBot

AutoGen

Awesome GPT

awesome-japanese-llm

Awesome-LLM-Inference

Baichuan-7|13B

Build a Large Language Model (From Scratch)

ChatAbstractions

DeepSeek-VL-1.3|7B

Future AGI

Gemma

Gemma2-9|27B

Google "We Have No Moat, And Neither Does OpenAI"

Guidance

InternLM2-1.8|7|20B

Lancedb

Langchain-Chatchat

LiteLLM 🚅

Llama 3.2-1|3|11|90B

Llama 3-8|70B

LLaMA Cult and More

LlamaIndex

llm-ui

MiniCPM-2B

Mixtral-8x7B

Moonlight-A3B

MPT-7B

OLMo-7B

OneComp