Best for: Developers building inference systems for edge devices, mobile applications, or cost-sensitive deployments where model size and speed outweigh maximum accuracy.

O OSS Framework medium

MNN-LLM

Community

MNN: A blazing-fast, lightweight inference engine battle-tested by Alibaba, powering high-performance on-device LLMs and Edge AI.

★ 15,353 updated 1mo ago

open-source

Best for: Developers building production on-device LLM and edge AI applications where latency and resource efficiency are critical.

O OSS Framework medium

mistral.rs

Community

Fast, flexible LLM inference

★ 7,205 updated 1mo ago

open-source

Best for: Rust developers seeking a fast, flexible LLM inference framework for performance-critical or resource-constrained environments.

O OSS Obs medium

CTranslate2

Community

Fast inference engine for Transformer models

★ 4,507 updated 1mo ago

open-source

Best for: Developers deploying Transformer models in production who need maximum inference speed on CPU or limited hardware.

O OSS Framework medium

exllama

Community

A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.

★ 2,922 updated 2y ago

open-source

Best for: Developers running quantized Llama models on resource-constrained hardware.

P Apps Productivity low

OpenAI API

Various

Announcement of the OpenAI API for text-to-text general-purpose AI models based on GPT-3. OpenAI blog, June 11, 2020.

freemium

Best for: Developers who need to integrate general-purpose text generation into their applications quickly.