Megatron-DeepSpeed
by Community
Ongoing research training transformer language models at scale, including: BERT & GPT-2
OSS
Added 1 June 2026
Overview
Open-source framework for training large transformer language models such as BERT and GPT-2 at scale. It combines Megatron-LM-style model parallelism with DeepSpeed's ZeRO optimizations to distribute training across many GPUs, and is used mainly for research on scaling transformer language models.
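To make the interplay between model parallelism and ZeRO more concrete, here is a minimal back-of-the-envelope sketch. It is not part of Megatron-DeepSpeed: the names `ParallelLayout` and `model_state_bytes_per_gpu` are hypothetical, and the memory figures assume mixed-precision Adam with the usual accounting of 2 bytes per parameter for fp16 weights, 2 for fp16 gradients, and 12 for fp32 optimizer states. Activations, buffers, and fragmentation are ignored, so treat the result as a rough lower bound rather than a measurement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelLayout:
    """Hypothetical helper: how a pool of GPUs is split across parallelism axes."""

    world_size: int         # total number of GPUs in the job
    tensor_parallel: int    # GPUs that jointly hold each layer's weights
    pipeline_parallel: int  # number of pipeline stages

    @property
    def data_parallel(self) -> int:
        # Whatever is left after model parallelism becomes data-parallel replicas.
        model_parallel = self.tensor_parallel * self.pipeline_parallel
        if self.world_size % model_parallel != 0:
            raise ValueError("world_size must be divisible by tensor * pipeline size")
        return self.world_size // model_parallel


def model_state_bytes_per_gpu(params: int, data_parallel: int, zero_stage: int) -> float:
    """Estimate model-state bytes per GPU for `params` parameters held by one
    model-parallel shard (assumption: mixed-precision Adam, 2 + 2 + 12 bytes
    per parameter)."""
    if zero_stage not in (0, 1, 2, 3):
        raise ValueError("zero_stage must be 0, 1, 2, or 3")
    weights = 2.0 * params
    grads = 2.0 * params
    optimizer = 12.0 * params
    if zero_stage >= 1:  # stage 1 partitions optimizer states across replicas
        optimizer /= data_parallel
    if zero_stage >= 2:  # stage 2 also partitions gradients
        grads /= data_parallel
    if zero_stage >= 3:  # stage 3 also partitions the weights themselves
        weights /= data_parallel
    return weights + grads + optimizer


if __name__ == "__main__":
    layout = ParallelLayout(world_size=64, tensor_parallel=4, pipeline_parallel=2)
    total_params = 8_000_000_000
    # Assumption: parameters divide evenly across tensor and pipeline shards.
    shard_params = total_params // (layout.tensor_parallel * layout.pipeline_parallel)
    for stage in range(4):
        gib = model_state_bytes_per_gpu(shard_params, layout.data_parallel, stage) / 2**30
        print(f"ZeRO stage {stage}: ~{gib:.1f} GiB of model states per GPU")
```

With 64 GPUs, 4-way tensor parallelism, and 2 pipeline stages, the sketch yields 8 data-parallel replicas. Model parallelism shrinks the per-GPU parameter count, and ZeRO then divides the remaining model states across those replicas.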
Best for
Researchers and engineers training large-scale transformer models in distributed environments
Use cases
- Training large transformer language models from scratch
- Distributed training across multiple GPU nodes
- Research into scaling behaviors and model parallelism
Notes
2,252 stars on GitHub. Last updated 2025-08-14.
Pros
- Efficient model parallelism and ZeRO integration for large-scale training
- Proven in research environments for models like BERT and GPT-2
- Active community with ongoing development
Cons
- Complex setup and configuration compared to simpler frameworks
- Requires substantial hardware resources and expertise
- Documentation can be sparse or research-oriented
Indexed from awesome-llm and enriched against its public facts.
Pairs with
Other entries in the index that connect to this one.
PyTorch
Community
Tensors and Dynamic neural networks in Python with strong GPU acceleration
DeepSpeed
Community
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Megatron-LM
Community
Ongoing research training transformer models at scale
Colossal-AI
Community
Making large AI models cheaper, faster and more accessible
NeMo Framework
Community
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)