PRIMO R1 Collection: official release of PRIMO R1, a 7B video MLLM for robotic process reasoning, featuring RL-optimized models, SFT/RL datasets, and a cross-domain benchmark.
This model is part of the PRIMO series and is trained for video-based reasoning in robotic manipulation settings. It is an ablation model that the paper compares against PRIMO R1.
Current video MLLMs often function as passive "Observers" that recognize ongoing events rather than evaluating the current state relative to the final task goal. PRIMO R1 uses reinforcement learning to turn these models into active "Critics" that reason about the manipulation process.
If you find our work helpful for your research, please consider citing our work.
@misc{liu2026passiveobserveractivecritic,
title={From Passive Observer to Active Critic: Reinforcement Learning Elicits Process Reasoning for Robotic Manipulation},
author={Yibin Liu and Yaxing Lyu and Daqi Gao and Zhixuan Liang and Weiliang Tang and Shilong Mu and Xiaokang Yang and Yao Mu},
year={2026},
eprint={2603.15600},
archivePrefix={arXiv},
primaryClass={cs.RO},
url={https://arxiv.org/abs/2603.15600},
}