OpenRLHF
by Community
An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & VLM & TIS & vLLM & Ray & Async RL)
OSS
Added 1 June 2026
Overview
OpenRLHF is an open-source framework for agentic reinforcement learning with language and vision-language models. It is built on Ray for distributed scaling and supports multiple RL algorithms including PPO, DAPO, and REINFORCE++. The framework integrates with vLLM for efficient inference and enables asynchronous RL training.
Best for
Developers building large-scale RL training systems for language and vision-language models
Use cases
- Training LLMs with reinforcement learning from human feedback (RLHF) at scale
- Implementing agentic RL workflows that require distributed compute and async execution
- Experimenting with policy gradient methods like PPO or REINFORCE++ on multimodal models
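As a rough illustration of the policy gradient methods mentioned above, the following is a minimal sketch of the standard PPO clipped surrogate objective in plain Python. It is not OpenRLHF's API: the function name `ppo_clipped_objective`, its parameters, and the default `clip_eps=0.2` are assumptions chosen for illustration only.

```python
import math


def ppo_clipped_objective(new_logprobs, old_logprobs, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective over a batch of sampled actions.

    Hypothetical helper for illustration; not part of OpenRLHF.
    new_logprobs / old_logprobs: log-probabilities of the sampled actions
    under the current and the behaviour policy. advantages: advantage
    estimates for those actions.
    """
    if not (len(new_logprobs) == len(old_logprobs) == len(advantages)):
        raise ValueError("all inputs must have the same length")
    if not advantages:
        raise ValueError("inputs must not be empty")

    total = 0.0
    for new_lp, old_lp, adv in zip(new_logprobs, old_logprobs, advantages):
        # Probability ratio between the current and the behaviour policy.
        ratio = math.exp(new_lp - old_lp)
        # Restrict the ratio to [1 - clip_eps, 1 + clip_eps].
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

The objective is maximised during training (or its negation minimised as a loss). Clipping removes the incentive to move the policy far from the one that generated the samples, which is the property that distinguishes PPO from a plain policy gradient update.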
Notes
9,583 stars on GitHub. Last updated 2026-05-28. Licensed Apache-2.0.
Pros
- Uses Ray to scale training across a cluster
- Supports a broad range of modern RL algorithms out of the box
- Integrates with vLLM for fast LLM inference during training
Cons
- Requires familiarity with Ray and distributed system concepts
- Community-maintained, so support and documentation may be limited
- Steep learning curve for developers new to RL frameworks
Indexed from awesome-llm and enriched with publicly available facts about the project.
Pairs with
Other entries in the index that connect to this one.
Awesome LLM Human Preference Datasets
Community
A curated list of Human Preference Datasets for LLM fine-tuning, RLHF, and eval.
FastDatasets
Community
A powerful tool for quickly creating high-quality fine-tuning and training datasets for Large Language Models (LLMs)
Improving alignment of dialogue agents via targeted human judgements
Community
DeepMind
Training language models to follow instructions with human feedback
Community
InstructGPT
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Community
General reasoning represents a long-standing and formidable challenge in artificial intelligence. Recent breakthroughs, exemplified by large language models (LLMs) and chain-of-t
ROLL
Community
An Efficient and User-Friendly Scaling Library for Reinforcement Learning with Large Language Models
TRL
Community
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
veRL
Community
verl/HybridFlow: A Flexible and Efficient RL Post-Training Framework
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Community
Stanford
Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision
Community
Recent AI-assistant agents, such as ChatGPT, predominantly rely on supervised fine-tuning (SFT) with human annotations and reinforcement learning from human feedback (RLHF) to al