DeepSpeed
by Community
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
OSS
Added 1 June 2026
Overview
DeepSpeed is a Python library for optimizing distributed training and inference of large language models and other deep neural networks. It reduces memory footprint, speeds up training, and supports multi-GPU and multi-node setups through techniques such as activation (gradient) checkpointing, mixed-precision training, and ZeRO partitioning of optimizer states, gradients, and parameters.
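As a rough illustration of how these techniques are usually switched on, the sketch below builds a DeepSpeed-style configuration dictionary in plain Python. The helper name `build_ds_config` and the chosen values are assumptions made for this example; the keys follow DeepSpeed's JSON configuration format as commonly documented, and should be checked against the version you install.

```python
import json


def build_ds_config(micro_batch_per_gpu, grad_accum_steps, world_size, zero_stage=2):
    """Return a DeepSpeed-style config dict (hypothetical helper, illustrative values)."""
    if zero_stage not in (0, 1, 2, 3):
        raise ValueError("ZeRO stage must be 0, 1, 2, or 3")
    return {
        # Global batch = micro batch per GPU * accumulation steps * number of GPUs.
        "train_batch_size": micro_batch_per_gpu * grad_accum_steps * world_size,
        "train_micro_batch_size_per_gpu": micro_batch_per_gpu,
        "gradient_accumulation_steps": grad_accum_steps,
        # Mixed precision: bf16 here; "fp16": {"enabled": True} is the alternative.
        "bf16": {"enabled": True},
        # ZeRO: stage 1 partitions optimizer states, stage 2 adds gradients,
        # stage 3 adds the parameters themselves.
        "zero_optimization": {"stage": zero_stage},
    }


ds_config = build_ds_config(micro_batch_per_gpu=4, grad_accum_steps=8, world_size=2)
print(json.dumps(ds_config, indent=2))
```

A dictionary like this can be written to a JSON file for the `deepspeed` launcher or passed directly as the `config` argument of `deepspeed.initialize`.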
Best for
Teams training large models who need to maximize GPU efficiency and scale across multiple devices.
Use cases
- Training large models on limited GPU memory
- Scaling training across multiple GPUs or nodes
- Reducing inference latency for deployed models
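To make the limited-memory use case more concrete, here is a back-of-the-envelope estimate that follows the accounting in the ZeRO paper for mixed-precision Adam: 2 bytes per parameter for fp16 weights, 2 for fp16 gradients, and 12 for fp32 optimizer states. The function name `zero_model_state_gb` is hypothetical, and the estimate ignores activations, temporary buffers, and fragmentation, so it is a lower bound rather than a prediction.

```python
def zero_model_state_gb(num_params, num_gpus, stage):
    """Approximate per-GPU model-state memory in GB (1e9 bytes) under ZeRO.

    Assumes mixed-precision Adam; activations and buffers are not counted.
    """
    if stage not in (0, 1, 2, 3):
        raise ValueError("ZeRO stage must be 0, 1, 2, or 3")
    weights = 2.0 * num_params      # fp16 parameters
    grads = 2.0 * num_params        # fp16 gradients
    optim = 12.0 * num_params       # fp32 master weights, momentum, variance
    if stage >= 1:
        optim /= num_gpus           # stage 1: partition optimizer states
    if stage >= 2:
        grads /= num_gpus           # stage 2: also partition gradients
    if stage >= 3:
        weights /= num_gpus         # stage 3: also partition parameters
    return (weights + grads + optim) / 1e9


for stage in range(4):
    gb = zero_model_state_gb(num_params=7.5e9, num_gpus=64, stage=stage)
    print(f"ZeRO stage {stage}: {gb:.1f} GB per GPU")
```

For a 7.5B-parameter model on 64 GPUs this gives roughly 120, 31.4, 16.6, and 1.9 GB per GPU for stages 0 through 3, which matches the example figures in the ZeRO paper.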
Notes
42,436 stars on GitHub. Last updated 2026-06-01. Licensed Apache-2.0.
Pros
- Significant memory savings enable training larger models on existing hardware
- Production-ready with strong community adoption and Microsoft backing
- Works with existing PyTorch code with minimal integration effort
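The "minimal integration effort" point can be sketched as follows. In a typical setup the model is wrapped once with `deepspeed.initialize`, and the returned engine takes over the backward pass and optimizer step. The helper `train_steps` is a hypothetical name, and the sketch assumes the wrapped model returns its loss directly from the forward call; your model's interface may differ.

```python
def train_steps(engine, batches):
    """Run one optimizer step per batch with a DeepSpeed-style engine.

    `engine` is assumed to expose __call__, backward(loss), and step(),
    as the engine returned by deepspeed.initialize does.
    """
    losses = []
    for batch in batches:
        loss = engine(batch)       # forward pass (assumed to return the loss)
        engine.backward(loss)      # used in place of loss.backward()
        engine.step()              # used in place of optimizer.step() / zero_grad()
        losses.append(float(loss))
    return losses


# With DeepSpeed installed, the engine would typically be created like this
# (left commented out so the sketch runs without a GPU or the library):
#
#   import deepspeed
#   engine, optimizer, _, _ = deepspeed.initialize(
#       model=model,
#       model_parameters=model.parameters(),
#       config=ds_config,   # a dict or a path to a JSON config file
#   )
#   train_steps(engine, dataloader)
```

Compared with a plain PyTorch loop, the visible changes are the one-time wrapping call and the two engine methods; distributed setup, gradient accumulation, and precision handling are driven by the configuration.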
Cons
- Steep learning curve for advanced features like ZeRO stages and custom configurations
- Debugging distributed training issues remains complex despite optimizations
- Performance gains vary significantly based on hardware, model architecture, and tuning
Indexed from awesome-llm and enriched against its public facts.
Pairs with
Other entries in the index that connect to this one.
Accelerate
Community
🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP a
Axolotl
Community
Go ahead and axolotl questions
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
Community
BigScience
FlagAI
Community
FlagAI (Fast LArge-scale General AI models) is a fast, easy-to-use and extensible toolkit for large-scale model.
GPT-NeoX
Community
An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries
Megatron-DeepSpeed
Community
Ongoing research training transformer language models at scale, including: BERT & GPT-2
MPT-7B
Community
Introducing MPT-7B, the first entry in our MosaicML Foundation Series. MPT-7B is a transformer trained from scratch on 1T tokens of text and code. It is open source, available fo
OLMo: Accelerating the Science of Language Models
Community
Language models (LMs) have become ubiquitous in both NLP research and in commercial product offerings. As their commercial importance has surged, the most powerful models have be
OLMoE: Open Mixture-of-Experts Language Models
Community
We introduce OLMoE, a fully open, state-of-the-art language model leveraging sparse Mixture-of-Experts (MoE). OLMoE-1B-7B has 7 billion (B) parameters but uses only 1B per input
Tune Studio
Community
Playground for devs to finetune & deploy LLMs
Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model
Community
Megatron-Turing NLG
Unsloth
Various
Unsloth is an open-source, no-code web UI for training, running and exporting open models in one unified local interface.
DeepSeek-Math-7B
Community
DeepSeek Math series
DeepSeek-v2-236B-MoE
Community
We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of whic
Falcon 40B
Community
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
GLM-130B: An Open Bilingual Pre-trained Model
Community
GLM-130B
veRL
Community
verl/HybridFlow: A Flexible and Efficient RL Post-Training Framework
ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
Community
Microsoft
Bloom
Various
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
Awesome-LLM-Compression
Community
Awesome LLM compression research papers and tools.
Awesome-LLM-Inference
Community
📖A curated list of Awesome LLM/VLM Inference Papers with codes: WINT8/4, FlashAttention, PagedAttention, MLA, Parallelism, etc. 🎉🎉
Datatrove
Community
Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.
Finetuned Language Models are Zero-Shot Learners
Community
This paper explores a simple method for improving the zero-shot learning abilities of language models. We show that instruction tuning—finetuning language models on a collection
GLaM: Efficient Scaling of Language Models with Mixture-of-Experts
Community
2021-12
GLM-2|6|10|13|70B
Community
Org profile for THUDM on Hugging Face, the AI community building the future.
Large Language Model Training in 2023
Community
Learn about large language model training with insights on large language model examples, model architecture, and model training guide.
ModelEditingPapers
Community
Must-read Papers on Knowledge Editing for Large Language Models.
Scaling Instruction-Finetuned Language Models
Community
Flan-T5/PaLM
Scaling Laws for Neural Language Models
Community
Scaling Law
Training Compute-Optimal Large Language Models
Community
Chinchilla
Transformer Engine
Community
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on Hopper, Ada and Blackwell GPUs, to provide b
Unifying Language Learning Paradigms
Community
Existing pre-trained models are generally geared towards a particular class of problems. To date, there seems to be still no consensus on what the right architecture and pre-trai
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale
Community
The performance of a large language model (LLM) depends heavily on the quality and size of its pretraining dataset. However, the pretraining datasets for state-of-the-art open LL
BMTrain
Community
Efficient Training (including pre-training and fine-tuning) for Big Models
Colossal-AI
Community
Making large AI models cheaper, faster and more accessible
FasterTransformer
Community
Transformer related optimization, including BERT, GPT
Megatron-LM
Community
Ongoing research training transformer models at scale
maxtext
Community
A simple, performant and scalable Jax LLM!
nanotron
Community
Minimalistic large language model 3D-parallelism training
torchtitan
Community
A PyTorch native platform for training generative AI models
unslothai
Community
Unsloth Studio is a web UI for training and running open models like Gemma 4, Qwen3.6, DeepSeek, gpt-oss locally.
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
Community
Megatron-LM