Reasoning via Video: The First Evaluation of Video Models' Reasoning Abilities through Maze-Solving Tasks
Paper: [arXiv 2511.15065](https://arxiv.org/abs/2511.15065)
These LoRA checkpoints are fine-tuned on VR-Bench to evaluate and improve video-based reasoning across structured maze environments.
| Model | Download | Description |
|---|---|---|
| Wan_R1_General_5B | 🤗 HuggingFace | New! LoRA fine-tuned on the full set of VR-Bench tasks. |
| Wan_R1_3d_maze_5B | 🤗 HuggingFace | LoRA fine-tuned on Maze3D tasks (easy, medium, and hard) from the base model Wan2.2-TI2V-5B. |
| Wan_R1_irregular_maze_5B | 🤗 HuggingFace | LoRA fine-tuned on PathFinder tasks (easy, medium, and hard) from the base model Wan2.2-TI2V-5B. |
| Wan_R1_regular_maze_5B | 🤗 HuggingFace | LoRA fine-tuned on Maze tasks (easy, medium, and hard) from the base model Wan2.2-TI2V-5B. |
| Wan_R1_sokoban_5B | 🤗 HuggingFace | LoRA fine-tuned on Sokoban tasks (easy, medium, and hard) from the base model Wan2.2-TI2V-5B. |
| Wan_R1_trapfield_5B | 🤗 HuggingFace | LoRA fine-tuned on TrapField tasks (easy, medium, and hard) from the base model Wan2.2-TI2V-5B. |
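The table maps each VR-Bench task family to a dedicated LoRA checkpoint, while Wan_R1_General_5B covers all of them. The following is a minimal Python sketch of that selection logic. The function name `pick_checkpoint`, the lowercase task keys, and the `prefer_general` flag are hypothetical illustrations; only the checkpoint names and task families come from the table. The sketch returns a checkpoint name and does not download or load any weights.

```python
# Hypothetical helper: names other than the checkpoint IDs are illustrative assumptions.
TASK_TO_CHECKPOINT = {
    "maze3d": "Wan_R1_3d_maze_5B",
    "pathfinder": "Wan_R1_irregular_maze_5B",
    "maze": "Wan_R1_regular_maze_5B",
    "sokoban": "Wan_R1_sokoban_5B",
    "trapfield": "Wan_R1_trapfield_5B",
}
GENERAL_CHECKPOINT = "Wan_R1_General_5B"
DIFFICULTIES = {"easy", "medium", "hard"}


def pick_checkpoint(task: str, difficulty: str, prefer_general: bool = False) -> str:
    """Return the checkpoint name for a VR-Bench task family and difficulty level."""
    task_key = task.strip().lower()
    level = difficulty.strip().lower()
    if task_key not in TASK_TO_CHECKPOINT:
        raise ValueError(f"unknown task {task!r}; expected one of {sorted(TASK_TO_CHECKPOINT)}")
    if level not in DIFFICULTIES:
        raise ValueError(f"unknown difficulty {difficulty!r}; expected one of {sorted(DIFFICULTIES)}")
    # Every task-specific checkpoint covers all three difficulty levels,
    # so difficulty is only validated, not used to choose a checkpoint.
    if prefer_general:
        return GENERAL_CHECKPOINT
    return TASK_TO_CHECKPOINT[task_key]


if __name__ == "__main__":
    print(pick_checkpoint("Sokoban", "hard"))
    print(pick_checkpoint("Maze3D", "easy", prefer_general=True))
```

Because every task-specific checkpoint covers easy, medium, and hard, difficulty acts only as an input check here. Switching to the general checkpoint is an explicit opt-in rather than a fallback.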
If you use this model or the VR-Bench dataset in your work, please cite:
```bibtex
@misc{yang2025reasoningvideoevaluationvideo,
  title={Reasoning via Video: The First Evaluation of Video Models' Reasoning Abilities through Maze-Solving Tasks},
  author={Cheng Yang and Haiyuan Wan and Yiran Peng and Xin Cheng and Zhaoyang Yu and Jiayi Zhang and Junchi Yu and Xinlei Yu and Xiawu Zheng and Dongzhan Zhou and Chenglin Wu},
  year={2025},
  eprint={2511.15065},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2511.15065},
}
```
Base model: Wan-AI/Wan2.2-TI2V-5B