GPT-NeoX
by Community
An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries
OSS
Added 1 June 2026
Overview
GPT-NeoX is a framework for training large-scale autoregressive transformer models. It implements model parallelism across GPUs on top of the Megatron and DeepSpeed libraries. Built by EleutherAI, it is designed for researchers who train GPT-like models at scale.
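To make "model parallelism" concrete, here is a minimal pure-Python sketch of column-wise tensor parallelism, the general idea behind splitting one layer across devices. It is an illustration under stated assumptions, not GPT-NeoX code: the function names (`matmul`, `split_columns`, `column_parallel_linear`) are hypothetical, and the "devices" are simulated with list slices rather than GPUs.

```python
# Illustrative only: a pure-Python sketch of column-wise tensor (model) parallelism.
# The "devices" are simulated with list slices; no GPU or GPT-NeoX API is involved.

def matmul(x, w):
    """Multiply a (rows x inner) matrix by an (inner x cols) matrix."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]

def split_columns(w, num_shards):
    """Split a weight matrix into num_shards equal column blocks."""
    cols = len(w[0])
    assert cols % num_shards == 0, "columns must divide evenly across shards"
    width = cols // num_shards
    return [[row[i * width:(i + 1) * width] for row in w] for i in range(num_shards)]

def column_parallel_linear(x, w, num_shards):
    """Compute x @ w by letting each shard multiply by its own column block."""
    partials = [matmul(x, shard) for shard in split_columns(w, num_shards)]
    # Concatenate per-shard outputs along the column axis
    # (on real hardware this step would be a collective such as all-gather).
    return [sum((p[r] for p in partials), []) for r in range(len(x))]

x = [[1, 2], [3, 4]]              # toy activations: 2 tokens, hidden size 2
w = [[1, 0, 2, 1], [0, 1, 1, 3]]  # toy weights: hidden size 2 -> 4 outputs
full = matmul(x, w)
sharded = column_parallel_linear(x, w, num_shards=2)
print(sharded == full)
```

Each simulated shard holds only half of the weight columns, yet the concatenated result matches the unsharded product. That property is what lets a layer too large for one GPU be spread across several.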
Best for
Researchers and engineers training custom large language models
Use cases
- Training large language models from scratch
- Experimenting with model parallelism techniques
- Fine-tuning autoregressive transformers on custom datasets
Notes
7,432 stars on GitHub. Last updated 2026-05-19. Licensed Apache-2.0.
Pros
- Enables training of very large models (tens of billions of parameters)
- Leverages proven Megatron and DeepSpeed optimizations
- Open source with strong community support (over 7,000 stars)
Cons
- Requires substantial GPU compute infrastructure
- Primarily suited to autoregressive models
- Less polished than commercial offerings; may require deep engineering expertise
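The compute requirement listed above can be sized with rough arithmetic. The sketch below assumes mixed-precision training with the Adam optimizer, which is commonly estimated at about 16 bytes of model state per parameter. That figure, the hypothetical 20-billion-parameter model, and the 80 GiB GPU size are illustrative assumptions rather than GPT-NeoX measurements, and activation memory is ignored.

```python
import math

# Rough estimate only. Assumes mixed-precision Adam training:
# 2 bytes fp16 weights + 2 bytes fp16 gradients
# + 12 bytes fp32 master weights, momentum, and variance.
BYTES_PER_PARAM = 2 + 2 + 12

def training_state_gib(num_params):
    """Approximate model-state memory in GiB (activations not included)."""
    return num_params * BYTES_PER_PARAM / 2**30

def min_gpus(num_params, gpu_memory_gib):
    """Lower bound on GPU count if the state were split perfectly evenly."""
    return math.ceil(training_state_gib(num_params) / gpu_memory_gib)

params = 20_000_000_000  # hypothetical 20B-parameter model
print(f"{training_state_gib(params):.0f} GiB of model state")
print(f"at least {min_gpus(params, 80)} GPUs with 80 GiB each")
```

The result is a floor, not a recommendation: activations, communication buffers, and throughput targets push real cluster sizes well above it.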
Indexed from awesome-llm and enriched against its public facts.
Pairs with
Other entries in the index that connect to this one.
DeepSpeed
Community
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Megatron-LM
Community
Ongoing research training transformer models at scale