Small RL Controller, Large Language Model: RL-Guided Adaptive Sampling for Test-Time Scaling Paper • 2606.03102 • Published 10 days ago • 14
You Only Need Minimal RLVR Training: Extrapolating LLMs via Rank-1 Trajectories Paper • 2605.21468 • Published 23 days ago • 50