Video-Text-to-Text
Safetensors
qwen2_5_vl
robotic-manipulation
reinforcement-learning
chain-of-thought

PRIMO COT SFT 7B

This model is part of the PRIMO series and is trained for video-based reasoning in robotic manipulation settings. It is an ablation model, compared against our PRIMO R1 in the paper.

Model Description

Current video MLLMs often function as passive "Observers" that recognize ongoing events rather than evaluating the current state relative to the final task goal. PRIMO R1 transforms these models into active "Critics" by:

  • Reinforcement Learning: Leveraging outcome-based RL to incentivize explicit Chain-of-Thought (CoT) generation for progress estimation.
  • Temporal Anchoring: Constructing a structured temporal input that explicitly anchors the video sequence between initial and current state images.
  • Process Reasoning: Focusing on evaluating the current state against the intended task goal to detect failures and track progress.
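
The temporal anchoring described above can be sketched as a structured chat query in which the video clip is explicitly bracketed by the initial-state and current-state images. The snippet below follows the general Qwen2.5-VL message convention; the helper name, prompt wording, and file paths are illustrative assumptions, not the exact PRIMO pipeline.

```python
# Sketch of a temporally anchored progress query (assumed format, not the
# official PRIMO code). The video is bracketed between the initial-state
# and current-state images, and the model is asked for CoT progress reasoning.

def build_progress_query(initial_frame, current_frame, video_frames, task_goal):
    """Assemble a chat message anchoring the clip between the initial and
    current state images, then ask for a step-by-step progress estimate."""
    return [{
        "role": "user",
        "content": [
            {"type": "image", "image": initial_frame},   # initial-state anchor
            {"type": "video", "video": video_frames},    # intermediate observations
            {"type": "image", "image": current_frame},   # current-state anchor
            {"type": "text", "text": (
                f"Task goal: {task_goal}\n"
                "Reason step by step about how far the manipulation has "
                "progressed toward the goal, then report a progress estimate."
            )},
        ],
    }]

messages = build_progress_query(
    "init.png", "now.png",
    ["frame_00.png", "frame_01.png", "frame_02.png"],
    "place the red block in the bowl",
)
```

Such a message list can then be passed through the model's chat template and processor in the usual Qwen2.5-VL inference flow.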

Citations

If you find our work helpful for your research, please consider citing:

@misc{liu2026passiveobserveractivecritic,
      title={From Passive Observer to Active Critic: Reinforcement Learning Elicits Process Reasoning for Robotic Manipulation}, 
      author={Yibin Liu and Yaxing Lyu and Daqi Gao and Zhixuan Liang and Weiliang Tang and Shilong Mu and Xiaokang Yang and Yao Mu},
      year={2026},
      eprint={2603.15600},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2603.15600}, 
}