OpenVLA
by Community
OpenVLA: An open-source vision-language-action model for robotic manipulation.
OSS
Added 1 June 2026
Overview
OpenVLA is an open-source vision-language-action (VLA) model that lets robots perform manipulation tasks from camera images and natural-language commands. It combines a vision encoder, a language model, and an action-decoding step that converts the model's output into control signals. The model is designed to be fine-tuned for specific robots and environments.
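To make the path from language-model output to control signals more concrete, here is a minimal sketch of one common approach: each continuous action dimension is discretized into a fixed number of bins, so an action can be represented as a short sequence of tokens and later mapped back to continuous values. The OpenVLA paper describes a scheme along these lines with 256 bins per dimension. The uniform binning over [-1, 1] and the names `N_BINS`, `discretize`, and `undiscretize` are assumptions made for illustration, not OpenVLA's actual API.

```python
from typing import List, Sequence

N_BINS = 256  # assumed number of bins per action dimension (illustrative)


def discretize(action: Sequence[float], low: float = -1.0, high: float = 1.0) -> List[int]:
    """Map each continuous action value to a bin index in [0, N_BINS - 1]."""
    width = (high - low) / N_BINS
    tokens = []
    for value in action:
        clipped = min(max(value, low), high)   # keep the value inside [low, high]
        index = int((clipped - low) / width)   # uniform binning (an assumption)
        tokens.append(min(index, N_BINS - 1))  # value == high falls in the last bin
    return tokens


def undiscretize(tokens: Sequence[int], low: float = -1.0, high: float = 1.0) -> List[float]:
    """Map each bin index back to the center of its bin."""
    width = (high - low) / N_BINS
    return [low + (t + 0.5) * width for t in tokens]


if __name__ == "__main__":
    # Hypothetical 7-dimensional action: position delta, rotation delta, gripper.
    action = [0.10, -0.20, 0.00, 0.05, 0.00, -0.30, 1.00]
    tokens = discretize(action)
    print(tokens)
    print(undiscretize(tokens))
```

The round trip is lossy: each recovered value can differ from the original by up to half a bin width, which is the price of treating actions as discrete tokens.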
Best for
Robotics researchers and developers building custom vision-language-action policies for manipulation
Use cases
- Controlling robotic arms with natural language instructions
- Fine-tuning the model for custom manipulation tasks or datasets
- Research into generalist robot policies and imitation learning
Notes
6,322 stars on GitHub. Last updated 2025-03-23. Licensed MIT.
Pros
- Open-source and community-driven, reducing vendor lock-in
- Supports fine-tuning for task-specific adaptation
- Large and growing ecosystem (6.3k+ GitHub stars)
Cons
- As a 7B-parameter model, requires significant GPU memory and compute for both inference and training
- Model performance depends heavily on training data quality and task similarity
- Not yet production-tested for safety-critical or high-reliability deployments
Indexed from awesome-llmops and enriched with publicly available facts about the project.