Transformer Engine

by Community

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on Hopper, Ada and Blackwell GPUs.

Added 1 June 2026

#cuda #deep-learning #fp4 #fp8 #gpu #jax #machine-learning #python

Overview

Transformer Engine is a Python library that accelerates Transformer models on NVIDIA GPUs by leveraging low-precision floating point formats (FP8 and FP4). It targets Hopper, Ada, and Blackwell architectures to improve performance and reduce memory usage during both training and inference.
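As a rough illustration of what running a layer in FP8 might look like with the PyTorch API, here is a minimal sketch. The calls `te.Linear`, `recipe.DelayedScaling`, and `te.fp8_autocast` follow the project's published PyTorch quickstart and may differ between releases; the helper names `te_available` and `run_fp8_linear` and the layer sizes are hypothetical. The function returns `None` instead of failing when the library or a suitable GPU is missing.

```python
import importlib.util


def te_available() -> bool:
    """True if both torch and transformer_engine can be imported here."""
    return all(
        importlib.util.find_spec(name) is not None
        for name in ("torch", "transformer_engine")
    )


def run_fp8_linear(in_features=768, out_features=3072, batch=2048):
    """Run one FP8 forward and backward pass through a single linear layer.

    Returns the output shape as a tuple, or None when Transformer Engine
    or an FP8-capable GPU is not available on this machine.
    """
    if not te_available():
        return None

    import torch
    import transformer_engine.pytorch as te
    from transformer_engine.common import recipe

    # FP8 needs compute capability 8.9 (Ada) or newer.
    if not torch.cuda.is_available():
        return None
    if torch.cuda.get_device_capability() < (8, 9):
        return None

    # Drop-in replacement for torch.nn.Linear (sizes are illustrative).
    layer = te.Linear(in_features, out_features, bias=True)
    inp = torch.randn(batch, in_features, device="cuda")

    # HYBRID: E4M3 in the forward pass, E5M2 for gradients.
    fp8_recipe = recipe.DelayedScaling(fp8_format=recipe.Format.HYBRID)

    # Only the code inside the context manager runs in FP8.
    with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
        out = layer(inp)

    out.sum().backward()
    return tuple(out.shape)


print(run_fp8_linear())
```

On a machine without Transformer Engine or without an Ada, Hopper, or Blackwell GPU, this prints `None`.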

Best for

Developers training or deploying large transformer models on modern NVIDIA GPUs who need to maximize performance and minimize memory usage

Use cases

Training large language models with reduced memory footprint
Running inference on transformer models with higher throughput
Fine-tuning transformers on GPU clusters with limited VRAM

Notes

3,374 stars on GitHub. Last updated 2026-06-01. Licensed Apache-2.0.

Pros

Reduces memory consumption compared to FP32 or FP16 by storing values in 8 or 4 bits instead of 32 or 16
Optimized for the latest NVIDIA GPU families (Hopper, Ada, Blackwell)
Supports both training and inference for transformer architectures
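To put the memory claim in numbers, the following back-of-the-envelope sketch compares storage for model weights alone at each precision. It is an idealized estimate: it assumes every weight is stored in the named format and ignores scaling factors, higher-precision master weights, optimizer state, and activations, so real savings are typically smaller. The helper name `weight_memory_gib` and the 7-billion-parameter model size are hypothetical.

```python
# Bytes needed to store one value in each format.
BYTES_PER_VALUE = {"fp32": 4.0, "fp16": 2.0, "fp8": 1.0, "fp4": 0.5}


def weight_memory_gib(num_params: int, fmt: str) -> float:
    """Idealized weight storage in GiB for num_params values in format fmt."""
    return num_params * BYTES_PER_VALUE[fmt] / 2**30


# Illustrative model size; not taken from any specific model.
num_params = 7_000_000_000
for fmt in BYTES_PER_VALUE:
    print(f"{fmt}: {weight_memory_gib(num_params, fmt):.2f} GiB")
```

Under these assumptions, each step down the list halves the footprint: FP8 needs a quarter of the FP32 storage and FP4 an eighth.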

Cons

Requires compatible NVIDIA GPUs (Hopper, Ada, or Blackwell) to use FP8/FP4
Limited to specific precision formats; not a general-purpose optimization library
May need code modifications to integrate into existing PyTorch workflows
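The hardware requirement can be checked up front from the GPU's CUDA compute capability. The sketch below assumes the usual mapping (Ada is 8.9, Hopper is 9.0, Blackwell is 10.0 and above) and treats FP4 as Blackwell-only, which is an assumption made here rather than something the listing states. The function name `supported_low_precision` is hypothetical.

```python
def supported_low_precision(capability):
    """Return the low-precision formats a GPU can likely use.

    capability is a (major, minor) compute capability tuple, for example
    the value returned by torch.cuda.get_device_capability().
    """
    major, minor = capability
    formats = []
    if (major, minor) >= (8, 9):  # Ada (8.9), Hopper (9.0), Blackwell (10.0+)
        formats.append("fp8")
    if major >= 10:  # Assumption: FP4 is available on Blackwell only.
        formats.append("fp4")
    return formats


print(supported_low_precision((8, 0)))   # Ampere, e.g. A100
print(supported_low_precision((9, 0)))   # Hopper, e.g. H100
print(supported_low_precision((10, 0)))  # Blackwell, e.g. B200
```

A check like this lets a training script fall back to FP16 or BF16 on older GPUs instead of failing when low-precision kernels are requested.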

Indexed from awesome-llm and enriched against its public facts.

Pairs with

Other entries in the index that connect to this one.

Uses (2 entries)

O OSS Obs medium

PyTorch

Community

Tensors and Dynamic neural networks in Python with strong GPU acceleration

★ 100,318 updated 1mo ago

O OSS Obs medium

TensorFlow

Community

An Open Source Machine Learning Framework for Everyone

★ 195,356 updated 1mo ago

Pairs with (5 entries)

O OSS Framework medium

vLLM

Community

A high-throughput and memory-efficient inference and serving engine for LLMs

★ 81,619 updated 1mo ago

O OSS Framework medium

Litgpt

Community

20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.

★ 13,395 updated 1mo ago

O OSS Framework medium

TensorRT-LLM

Community

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NV

★ 13,781 updated 1mo ago

O OSS Framework medium

SGLang

Community

SGLang is a high-performance serving framework for large language models and multimodal models.

★ 28,885 updated 1mo ago

O OSS Framework medium

DeepSpeed

Community

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

★ 42,436 updated 1mo ago

Alternative to (3 entries)

O OSS Framework medium

DeepSpeed

Community

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

★ 42,436 updated 1mo ago

O OSS Framework medium

Megatron-LM

Community

Ongoing research training transformer models at scale

★ 16,545 updated 1mo ago

O OSS Framework medium

Colossal-AI

Community

Making large AI models cheaper, faster and more accessible

★ 41,382 updated 1mo ago
