Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling

by Community

How do large language models (LLMs) develop and evolve over the course of training? How do these patterns change as models scale? To answer these questions, we introduce Pythia…

Added 2 June 2026

Overview

Pythia is a suite of 16 large language models, ranging from 70M to 12B parameters, all trained on the same public data in the same order. Each model comes with 154 checkpoints and tools to reconstruct its training dataloader, so training dynamics and scaling effects can be analyzed under controlled conditions.
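
To make the checkpoint count concrete, here is a minimal sketch that enumerates checkpoint names. The schedule it encodes (step 0, ten log-spaced early steps, then every 1,000 steps up to 143,000) and the `step<N>` naming are assumptions taken from the public Pythia release, not from this entry; `pythia_revisions` is a hypothetical helper written for illustration.

```python
def pythia_revisions(final_step: int = 143_000, interval: int = 1_000) -> list[str]:
    """Return checkpoint names under the ASSUMED Pythia schedule.

    Assumption: checkpoints exist at step 0, at steps 1, 2, 4, ..., 512,
    and then every `interval` steps up to and including `final_step`.
    """
    log_spaced = [2 ** i for i in range(10)]                   # 1, 2, 4, ..., 512
    linear = list(range(interval, final_step + 1, interval))   # 1000, 2000, ..., 143000
    steps = [0] + log_spaced + linear
    return [f"step{s}" for s in steps]


revisions = pythia_revisions()
print(len(revisions))                 # 1 + 10 + 143 checkpoints
print(revisions[:3], revisions[-1])
```

If the assumed naming holds, each string could be passed as the `revision` argument to Hugging Face `from_pretrained` to load the model at that point in training; check the model repository for the exact branch names before relying on this.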

Best for

Researchers studying LLM training dynamics and scaling laws

Use cases

Studying model development over training steps
Comparing behavior across model scales
Reproducing and extending training analyses

Notes

Pros

Publicly released checkpoints for many model sizes
Exact training data order for controlled comparisons
Tools to reconstruct dataloaders for further study

Cons

Limited to models up to 12B parameters
Requires significant storage to download all checkpoints
Focused on research rather than deployment
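
A rough back-of-the-envelope sketch of the storage cost follows. It assumes weights are stored at 2 bytes per parameter (fp16), uses nominal parameter counts, and ignores optimizer state and file overhead, so the figures are order-of-magnitude estimates rather than measured download sizes; `checkpoint_storage_bytes` is a hypothetical helper.

```python
def checkpoint_storage_bytes(num_params: int,
                             num_checkpoints: int = 154,
                             bytes_per_param: int = 2) -> int:
    """Estimate bytes needed to hold every checkpoint of one model.

    Assumptions: weights only, fp16 (2 bytes per parameter), no overhead.
    """
    return num_params * bytes_per_param * num_checkpoints


# Nominal sizes for the smallest and largest models in the suite.
for name, n_params in [("70M", 70_000_000), ("12B", 12_000_000_000)]:
    gigabytes = checkpoint_storage_bytes(n_params) / 1e9
    print(f"{name}: roughly {gigabytes:,.0f} GB for all checkpoints")
```

Under these assumptions the largest model alone runs to several terabytes, which is why downloading only the checkpoints needed for a given analysis is usually the practical choice.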

Indexed from awesome-llm and enriched against its public facts.

Pairs with

Other entries in the index that connect to this one.

Built with (1 entry)

PyTorch

Community

Tensors and Dynamic neural networks in Python with strong GPU acceleration

★ 100,318 updated 23d ago

Pairs with (1 entry)

lm-evaluation-harness

Community

A framework for few-shot evaluation of language models.

★ 12,772 updated 1mo ago
