Training language models to follow instructions with human feedback

by Community

InstructGPT

Added 1 June 2026

Overview

InstructGPT is a method for fine-tuning language models using human feedback. Human-written demonstrations are used for supervised fine-tuning, and human comparisons of model outputs are used to train a reward model. Reinforcement learning then optimizes the language model to produce outputs that humans prefer. Compared with the base model, this approach improves instruction-following and reduces harmful or untruthful responses.
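The two objectives described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the rewards and log-probabilities are plain floats standing in for model outputs, and the default `beta` is an arbitrary placeholder. The first function is the standard pairwise comparison loss used to train a reward model; the second shows a reward reduced by a penalty for drifting away from the supervised (reference) model, which is one common way to keep the reinforcement learning step stable.

```python
import math


def pairwise_reward_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Loss for one human comparison: -log(sigmoid(r_chosen - r_rejected)).

    The loss is small when the reward model scores the human-preferred
    output higher than the rejected one, and large when it does the opposite.
    """
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch to avoid overflow.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def kl_penalized_reward(
    reward: float,
    logprob_policy: float,
    logprob_reference: float,
    beta: float = 0.1,  # assumed placeholder value, not taken from the source
) -> float:
    """Reward-model score minus a penalty for diverging from the reference model.

    logprob_policy and logprob_reference are the log-probabilities that the
    model being trained and the frozen supervised model assign to the same output.
    """
    return reward - beta * (logprob_policy - logprob_reference)


if __name__ == "__main__":
    # Reward model agrees with the human label: low loss.
    print(pairwise_reward_loss(2.0, 0.0))
    # Reward model disagrees with the human label: higher loss.
    print(pairwise_reward_loss(0.0, 2.0))
    # Policy assigns more probability than the reference: reward is reduced.
    print(kl_penalized_reward(1.0, -1.0, -2.0, beta=0.5))
```

In practice the rewards come from a neural reward model and the loss is averaged over batches of comparisons; the scalar version here only shows the shape of the computation.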

Best for

Researchers and engineers aligning large language models to human preferences for safety and instruction-following

Use cases

Fine-tuning an existing large language model to better follow user instructions
Reducing toxic or biased outputs from a generative language model
Aligning a model's behavior with human preferences for safe deployment

Notes

Pros

Demonstrates significant improvement in following instructions over base GPT-3
Reduces the frequency of harmful and untruthful outputs
Provides a reproducible framework for aligning language models

Cons

Requires substantial human annotation effort for demonstrations and comparisons
The RLHF process can be computationally expensive and unstable
May still produce errors or biased responses despite alignment

Indexed from awesome-llm and enriched against its public facts.

Pairs with

Other entries in the index that connect to this one (2 entries).

OpenRLHF

Community

An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & VLM & TIS & vLLM & Ray & Async RL)

★ 9,583 updated 27d ago

veRL

Community

verl/HybridFlow: A Flexible and Efficient RL Post-Training Framework

★ 21,691 updated 23d ago
