Dr. Bench: A Multidimensional Evaluation for Deep Research Agents, from Answers to Reports Paper • 2510.02190 • Published Jan 29 • 19
Critique-Coder: Enhancing Coder Models by Critique Reinforcement Learning Paper • 2509.22824 • Published Sep 26, 2025 • 21
VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use Paper • 2509.01055 • Published Sep 1, 2025 • 79
VideoScore2: Think before You Score in Generative Video Evaluation Paper • 2509.22799 • Published Sep 26, 2025 • 26