veRL
by Community
verl/HybridFlow: A Flexible and Efficient RL Post-Training Framework
OSS
Added 1 June 2026
Overview
veRL is a Python framework for reinforcement learning post-training of large language models. It provides a flexible architecture for running RL workflows at scale, supporting distributed training across multiple GPUs and optimized inference pipelines. The framework handles reward modeling, policy optimization, and generation sampling in a modular design.
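The three stages named above (generation sampling, reward scoring, policy optimization) can be illustrated with a toy loop. This is a minimal sketch in plain Python and does not use veRL's API. The canned responses, the rule-based reward, the REINFORCE update, and every function name are assumptions made for illustration only.

```python
import math
import random

# Hypothetical toy setup: the "policy" is a softmax over a few canned responses
# to the prompt "2 + 2 = ?". A real LLM policy would generate token sequences.
RESPONSES = ["I don't know.", "The answer is 4.", "The answer is 5."]


def softmax(logits):
    """Convert logits to probabilities in a numerically stable way."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def generate(logits, rng, n_samples):
    """Generation sampling: draw response indices from the current policy."""
    probs = softmax(logits)
    return rng.choices(range(len(logits)), weights=probs, k=n_samples)


def reward(index):
    """Reward scoring: a stand-in rule-based reward (1.0 for the correct answer)."""
    return 1.0 if RESPONSES[index] == "The answer is 4." else 0.0


def policy_update(logits, samples, rewards, lr):
    """Policy optimization: one REINFORCE step with a mean-reward baseline."""
    probs = softmax(logits)
    baseline = sum(rewards) / len(rewards)
    grads = [0.0] * len(logits)
    for idx, r in zip(samples, rewards):
        advantage = r - baseline
        for j in range(len(logits)):
            # Gradient of log pi(idx) w.r.t. logit j is 1[j == idx] - pi(j).
            indicator = 1.0 if j == idx else 0.0
            grads[j] += advantage * (indicator - probs[j])
    return [l + lr * g / len(samples) for l, g in zip(logits, grads)]


def train(steps=200, n_samples=16, lr=0.5, seed=0):
    """Run the sample -> score -> update loop and return final probabilities."""
    rng = random.Random(seed)
    logits = [0.0] * len(RESPONSES)
    for _ in range(steps):
        samples = generate(logits, rng, n_samples)
        rewards = [reward(i) for i in samples]
        logits = policy_update(logits, samples, rewards, lr)
    return softmax(logits)


final_probs = train()
```

Each stage is a separate function, so any one of them can be replaced without touching the others. That separation is the kind of modularity the overview describes. In a real framework the stages would run as distributed workers rather than as local function calls.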
Best for
ML engineers building custom RL post-training pipelines for LLMs at scale
Use cases
- Fine-tuning LLMs with RL algorithms such as PPO or GRPO, for example in RLHF pipelines
- Running distributed RL experiments across GPU clusters
- Building custom reward models and policy optimization loops
Notes
21,691 stars on GitHub. Last updated 2026-06-01. Licensed Apache-2.0.
Pros
- Modular architecture allows swapping components like reward models and optimizers
- Optimized for distributed training with efficient GPU utilization
- Active community project; 21k+ GitHub stars suggest broad adoption
Cons
- Requires significant infrastructure and GPU resources to run effectively
- Steeper learning curve compared to higher-level fine-tuning APIs
- Documentation and examples may be limited relative to mainstream frameworks
Indexed from awesome-llm and enriched against its public facts.
Pairs with
Other entries in the index that connect to this one.
PyTorch
Community
Tensors and Dynamic neural networks in Python with strong GPU acceleration
DeepSpeed
Community
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
vLLM
Community
A high-throughput and memory-efficient inference and serving engine for LLMs
FastChat
Community
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
Awesome LLM Human Preference Datasets
Community
A curated list of Human Preference Datasets for LLM fine-tuning, RLHF, and eval.
Improving alignment of dialogue agents via targeted human judgements
Community
DeepMind
Training language models to follow instructions with human feedback
Community
InstructGPT
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Community
General reasoning represents a long-standing and formidable challenge in artificial intelligence. Recent breakthroughs, exemplified by large language models (LLMs) and chain-of-t
OpenRLHF
Community
An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & VLM & TIS & vLLM & Ray & Async RL)
ROLL
Community
An Efficient and User-Friendly Scaling Library for Reinforcement Learning with Large Language Models
TRL
Community
We’re on a journey to advance and democratize artificial intelligence through open source and open science.