GPT-NeoX

by Community

An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries

Added 1 June 2026

#deepspeed-library #gpt-3 #language-model #transformers

Overview

GPT-NeoX is EleutherAI's framework for training large-scale autoregressive transformer models. It implements model parallelism across GPUs on top of the Megatron and DeepSpeed libraries, and it is aimed at researchers who train GPT-like models at scale.
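To make "model parallelism" concrete, here is a minimal sketch of one common form, column-wise sharding of a linear layer's weight matrix. It is an illustration under stated assumptions, not GPT-NeoX, Megatron, or DeepSpeed code: plain Python lists stand in for GPU tensors, and the helper names `split_columns` and `column_parallel_linear` are hypothetical.

```python
# Illustrative sketch only: plain Python lists stand in for GPU tensors.
# None of these names come from GPT-NeoX, Megatron, or DeepSpeed.

def matmul(x, w):
    """Multiply a (rows x inner) matrix by an (inner x cols) matrix."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]

def split_columns(w, num_shards):
    """Split the weight matrix into equal column blocks, one per simulated device."""
    cols = len(w[0])
    if cols % num_shards != 0:
        raise ValueError("column count must be divisible by num_shards")
    width = cols // num_shards
    return [[row[i * width:(i + 1) * width] for row in w] for i in range(num_shards)]

def column_parallel_linear(x, w, num_shards):
    """Compute x @ w by giving each simulated device one column block of w."""
    partials = [matmul(x, shard) for shard in split_columns(w, num_shards)]
    # Concatenating per-device outputs along columns plays the role of an all-gather.
    return [sum((p[r] for p in partials), []) for r in range(len(x))]

x = [[1, 2], [3, 4]]
w = [[1, 0, 2, 1], [0, 1, 1, 3]]
full = matmul(x, w)
sharded = column_parallel_linear(x, w, num_shards=2)
print(full == sharded)  # True
```

Each simulated device holds only half of the weight columns, yet the concatenated result matches the unsharded product. On real hardware the concatenation would be a collective communication step between GPUs.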

Best for

Researchers and engineers training custom large language models

Use cases

  • Training large language models from scratch
  • Experimenting with model parallelism techniques
  • Fine-tuning autoregressive transformers on custom datasets

Notes

7,432 stars on GitHub. Last updated 2026-05-19. Licensed Apache-2.0.

Pros

  • Enables training of very large models (tens of billions of parameters)
  • Leverages proven Megatron and DeepSpeed optimizations
  • Open source with strong community support (over 7,000 stars)

Cons

  • Requires substantial GPU compute infrastructure
  • Suited primarily to autoregressive models
  • Less polished than commercial offerings; may require deep engineering expertise
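
As a rough illustration of why the compute requirement is substantial, the sketch below estimates the memory needed just to hold training state. It assumes mixed-precision training with the Adam optimizer at about 16 bytes per parameter; the 20-billion-parameter model and 80 GB GPU are hypothetical examples, not figures from the GPT-NeoX project, and activations, communication buffers, and fragmentation are ignored.

```python
import math

# Back-of-envelope sketch; the byte counts are an assumption (mixed-precision Adam),
# not a measurement of GPT-NeoX.
BYTES_PER_PARAM = {
    "fp16 weights": 2,
    "fp16 gradients": 2,
    "fp32 master weights": 4,
    "fp32 Adam momentum": 4,
    "fp32 Adam variance": 4,
}

def training_state_gb(num_params):
    """Return GB (10**9 bytes) for weights, gradients, and optimizer state only."""
    return num_params * sum(BYTES_PER_PARAM.values()) / 1e9

def min_gpus(num_params, gpu_memory_gb):
    """Lower bound on GPU count if the state were sharded perfectly evenly."""
    return math.ceil(training_state_gb(num_params) / gpu_memory_gb)

print(training_state_gb(20e9))  # 320.0
print(min_gpus(20e9, 80))       # 4
```

The result is a floor, not a sizing recommendation: real runs need additional memory for activations and typically use many more GPUs to reach acceptable training time.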

Indexed from awesome-llm and enriched against its public facts.
