Enterprise DNA

Awesome-LLM-Inference

by Community

📖 A curated list of Awesome LLM/VLM Inference Papers with codes: WINT8/4, FlashAttention, PagedAttention, MLA, Parallelism, etc. 🎉🎉

Added 1 June 2026

Overview

Awesome-LLM-Inference is a community-curated GitHub repository that lists papers and accompanying code for large language model (LLM) and vision-language model (VLM) inference optimization. It covers techniques such as WINT8/4 quantization, FlashAttention, PagedAttention, MLA, and parallelism, and links each technique to its original papers and implementations.

Best for
Researchers and engineers seeking a concise overview of recent LLM inference optimization techniques and their implementations.

Use cases

  • Finding reference implementations of inference optimization techniques like FlashAttention or PagedAttention.
  • Exploring quantization methods (e.g., WINT8/4) to reduce model size and speed up inference.
  • Learning about parallelism strategies for deploying LLMs at scale.
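
As a rough illustration of the quantization use case above, the sketch below shows symmetric weight-only INT8 quantization in plain Python. It is not taken from the repository or from any paper it lists. Reading "WINT8/4" as weight-only INT8/INT4 is an assumption, and the function names quantize_int8 and dequantize_int8 are hypothetical. Real implementations typically quantize per channel or per group and run on GPU kernels.

```python
def quantize_int8(weights):
    """Symmetric per-tensor INT8 quantization of a list of float weights.

    Illustrative only: returns integer codes in [-127, 127] and the float
    scale needed to map them back to approximate float values.
    """
    max_abs = max(abs(w) for w in weights)
    # Fall back to a scale of 1.0 when every weight is zero.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize_int8(codes, scale):
    """Map INT8 codes back to approximate float weights."""
    return [c * scale for c in codes]


weights = [0.5, -1.27, 0.0, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)

# Each restored weight differs from the original by at most half a step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

Storing one byte per weight instead of four (FP32) or two (FP16) is where the size reduction comes from. Whether inference also gets faster depends on the hardware and kernels in use.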

Notes

16 stars on GitHub. Last updated 2025-03-30. Licensed GPL-3.0.

Pros

  • Curated collection saves time by aggregating relevant papers and code.
  • Covers a broad range of modern inference optimization methods.
  • Provides direct links to resources for quick exploration.

Cons

  • Merely a list, not an executable tool or library.
  • Limited community validation with only 16 stars.
  • May lack detailed tutorials or integration guides.

Indexed from awesome-llm and enriched against its public facts.
