Open Source Frameworks, medium

Awesome-LLM-Inference

by Community

📖A curated list of Awesome LLM/VLM Inference Papers with codes: WINT8/4, FlashAttention, PagedAttention, MLA, Parallelism, etc. 🎉🎉

Added 1 June 2026

Overview

A community-curated GitHub repository that lists papers and code for inference optimization of large language models (LLMs) and vision-language models (VLMs). It covers techniques such as weight-only INT8/INT4 quantization (WINT8/4), FlashAttention, PagedAttention, Multi-head Latent Attention (MLA), and parallelism. The repo links to the original papers and implementations for each technique.

Best for
Researchers and engineers seeking a concise overview of recent LLM inference optimization techniques and their implementations.

Use cases

Finding reference implementations of inference optimization techniques like FlashAttention or PagedAttention.
Exploring quantization methods (e.g., WINT8/4) to reduce model size and speed up inference.
Learning about parallelism strategies for deploying LLMs at scale.

Notes

16 stars on GitHub. Last updated 2025-03-30. Licensed GPL-3.0.

Pros

Curated collection saves time by aggregating relevant papers and code.
Covers a broad range of modern inference optimization methods.
Provides direct links to resources for quick exploration.

Cons

Merely a list, not an executable tool or library.
Limited community validation with only 16 stars.
May lack detailed tutorials or integration guides.

Indexed from awesome-llm and enriched against its public facts.

Pairs with

Other entries in the index that connect to this one (3 entries).

OSS Framework, medium

vLLM

Community

A high-throughput and memory-efficient inference and serving engine for LLMs

★ 81,619 updated 1mo ago

OSS Framework, medium

llama.cpp

Community

LLM inference in C/C++

★ 114,160 updated 1mo ago

OSS Framework, medium

TensorRT-LLM

Community

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NV

★ 13,781 updated 1mo ago
