Megatron-LM

by Community

Ongoing research training transformer models at scale

Added 1 June 2026

#large-language-models #model-para #transformers

Overview

Megatron-LM is a Python framework for training large transformer models at scale, developed and maintained by NVIDIA. It provides distributed training optimizations and memory-efficient techniques to handle models that exceed single-GPU capacity.
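
To make "exceeds single-GPU capacity" concrete, here is a rough back-of-the-envelope sketch. It assumes a commonly quoted rule of thumb of about 16 bytes of model state per parameter for mixed-precision training with Adam (weights, gradients, and optimizer state), and it ignores activations entirely. The function names and the 80 GiB device size are illustrative assumptions, not values from Megatron-LM.

```python
import math

# Assumption: ~16 bytes of model state per parameter (mixed precision + Adam).
# Activations, buffers, and fragmentation are NOT counted here.
BYTES_PER_PARAM = 16


def training_state_gib(num_params, bytes_per_param=BYTES_PER_PARAM):
    """Approximate model-state memory in GiB for a given parameter count."""
    return num_params * bytes_per_param / 2**30


def min_gpus(num_params, gpu_mem_gib, bytes_per_param=BYTES_PER_PARAM):
    """Lower bound on devices needed just to hold the model state."""
    return math.ceil(training_state_gib(num_params, bytes_per_param) / gpu_mem_gib)


if __name__ == "__main__":
    params = 10_000_000_000  # hypothetical 10B-parameter model
    print(f"{training_state_gib(params):.1f} GiB of model state")
    print(f"at least {min_gpus(params, gpu_mem_gib=80)} devices of 80 GiB each")
```

Under these assumptions, the estimate is only a lower bound: real runs need extra headroom for activations, which is part of why model state is sharded across devices.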

Best for
ML engineers training large transformer models who need production-grade distributed training infrastructure

Use cases

  • Training billion-parameter language models across multiple GPUs
  • Reducing memory footprint and training time for large transformers
  • Implementing pipeline parallelism and tensor parallelism strategies
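
The idea behind tensor parallelism in the last use case can be sketched in a few lines. This is a conceptual, single-process simulation in plain Python and does not use Megatron-LM's API: the weight matrix of a linear layer is split by columns across hypothetical devices, each device computes its slice of the output, and the slices are concatenated (the role an all-gather would play on real hardware). All function names here are illustrative.

```python
def matmul(x, w):
    """Multiply a (rows x inner) matrix by an (inner x cols) matrix."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]


def split_columns(w, num_shards):
    """Split a weight matrix into equal column blocks, one per simulated device."""
    cols = len(w[0])
    if cols % num_shards != 0:
        raise ValueError("column count must be divisible by num_shards")
    step = cols // num_shards
    return [[row[i * step:(i + 1) * step] for row in w] for i in range(num_shards)]


def column_parallel_linear(x, w, num_shards):
    """Compute x @ w by giving each simulated device one column block of w."""
    shards = split_columns(w, num_shards)
    # Each "device" holds only its shard and computes a slice of the output.
    partials = [matmul(x, shard) for shard in shards]
    # Stand-in for an all-gather: concatenate the output slices column-wise.
    return [sum((p[r] for p in partials), []) for r in range(len(x))]


if __name__ == "__main__":
    x = [[1, 2], [3, 4]]
    w = [[1, 0, 2, 1], [0, 1, 1, 3]]
    print(column_parallel_linear(x, w, num_shards=2))
    print(matmul(x, w))  # same result, computed without sharding
```

Each simulated device stores only a fraction of the weights, which is the memory saving that motivates the split. Pipeline parallelism is complementary: it places whole layers on different devices instead of slicing individual layers.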

Notes

16,545 stars on GitHub. Last updated 2026-06-01.

Pros

  • Production-grade distributed training infrastructure from NVIDIA
  • Significant memory and compute optimizations for large models
  • Active research codebase with ongoing improvements

Cons

  • Steep learning curve for distributed training concepts
  • Requires multi-GPU or multi-node setup to be practical
  • Open-source research codebase with less formal support than commercial alternatives

Indexed from awesome-llm and enriched against its public facts.

Pairs with

Other entries in the index that connect to this one.

Powers: 5 entries
Pairs with: 10 entries
O OSS Framework medium

GLaM: Efficient Scaling of Language Models with Mixture-of-Experts

Community

2021-12

O OSS Framework medium

Large Language Model Training in 2023

Community

Learn about large language model training with insights on large language model examples, model architecture, and model training guide.

O OSS Framework medium

ModelEditingPapers

Community

Must-read Papers on Knowledge Editing for Large Language Models.

★ 1,230 updated 10mo ago
O OSS Framework medium

Scaling Instruction-Finetuned Language Models

Community

Flan-T5/PaLM

O OSS Framework medium

Scaling Laws for Neural Language Models

Community

Scaling Law

O OSS Framework medium

Training Compute-Optimal Large Language Models

Community

Chinchilla

O OSS Framework medium

Transformer Engine

Community

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on Hopper, Ada and Blackwell GPUs, to provide b

★ 3,374 updated 2d ago
O OSS Framework medium

Unifying Language Learning Paradigms

Community

Existing pre-trained models are generally geared towards a particular class of problems. To date, there seems to be still no consensus on what the right architecture and pre-trai

O OSS Framework medium

ZeRO: Memory Optimizations Toward Training Trillion Parameter Models

Community

Microsoft

O OSS Framework medium

The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale

Community

The performance of a large language model (LLM) depends heavily on the quality and size of its pretraining dataset. However, the pretraining datasets for state-of-the-art open LL

Alternatives: 7 entries