Enterprise DNA
OpenRLHF

by Community

An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & VLM & TIS & vLLM & Ray & Async RL)

Added 1 June 2026

#large-language-models #proximal-policy-optimization #raylib #reinforcement-learning #reinforcement-learning-from-human-feedback #transformers #visual-language-models #vllm

Overview

OpenRLHF is an open-source framework for agentic reinforcement learning with language and vision-language models. It is built on Ray for distributed scaling and supports multiple RL algorithms including PPO, DAPO, and REINFORCE++. The framework integrates with vLLM for efficient inference and enables asynchronous RL training.

Best for
Developers building large-scale RL training systems for language and vision-language models

Use cases

  • Training LLMs with reinforcement learning from human feedback (RLHF) at scale
  • Implementing agentic RL workflows that require distributed compute and async execution
  • Experimenting with policy gradient methods like PPO or REINFORCE++ on multimodal models
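For readers unfamiliar with the policy gradient methods named above, the clipped surrogate objective at the core of PPO can be sketched in a few lines of plain Python. This is an illustrative sketch only and is not taken from OpenRLHF: the function name `ppo_clipped_loss`, its parameters, and the default `clip_eps=0.2` are assumptions chosen for the example, and a real training loop would compute these quantities on tensors per token.

```python
import math


def ppo_clipped_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate loss (to be minimized) over a batch.

    Hypothetical helper for illustration; not part of any library API.
    logp_new / logp_old: log-probabilities of the sampled actions under
    the current policy and the policy that generated the samples.
    advantages: advantage estimates for the same samples.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        # Probability ratio between the current and the sampling policy.
        ratio = math.exp(lp_new - lp_old)
        # Keep the ratio inside [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    # Negate because the surrogate objective is maximized.
    return -total / len(advantages)


if __name__ == "__main__":
    print(ppo_clipped_loss([-0.9, -1.2], [-1.0, -1.0], [1.0, -0.5]))
```

The clipping limits how much a single update can gain from moving the policy away from the one that generated the samples, which is the main reason PPO tends to be more stable than an unclipped policy gradient step.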

Notes

9,583 stars on GitHub. Last updated 2026-05-28. Licensed Apache-2.0.

Pros

  • Uses Ray to distribute training and inference workloads across a cluster
  • Supports a broad range of modern RL algorithms out of the box
  • Integrates with vLLM for fast LLM inference during training

Cons

  • Requires familiarity with Ray and distributed system concepts
  • Community-maintained, so there is no dedicated vendor support and documentation may be limited
  • Steep learning curve for developers new to RL frameworks

Indexed from awesome-llm and enriched against its public facts.
