Robust Tool Use via Fission-GRPO: Learning to Recover from Execution Errors Paper • 2601.15625 • Published 16 days ago
Inference-Time Scaling of Verification: Self-Evolving Deep Research Agents via Test-Time Rubric-Guided Verification Paper • 2601.15808 • Published 16 days ago
Ray121381/eveo_anchor_advantage_independent-qwen2.5-7b-sciworld-self-sum-self-gen-maxlen-2048 Updated Dec 23, 2025