FasterTransformer

by Community

Transformer-related optimization, including BERT and GPT

Added 1 June 2026

#bert #gpt #pytorch #transformer

Overview

FasterTransformer is an open-source framework that accelerates transformer model inference. It implements optimized kernels and memory management for models like BERT and GPT. Written in C++, it provides high-performance execution on NVIDIA GPUs.

Best for

Developers seeking maximum inference performance for transformer models on NVIDIA hardware

Use cases

  • Deploying large BERT models for low-latency inference
  • Running GPT-based text generation with higher throughput
  • Optimizing transformer inference on NVIDIA GPUs

Notes

6,418 stars on GitHub. Last updated 2024-03-27. Licensed Apache-2.0.

Pros

  • Delivers very high inference speed for the transformer models it supports
  • Strong community adoption (6,418 stars on GitHub), though the last recorded update was 2024-03-27
  • Tuned specifically for NVIDIA GPU architectures

Cons

  • Limited to NVIDIA GPUs; no support for CPUs or other hardware
  • C++ codebase requires compilation and integration effort
  • Does not offer a high-level API; manual configuration needed

Indexed from awesome-llm and enriched against its public facts.
