On-Policy Self-Distillation for Reasoning Compression Paper • 2603.05433 • Published about 12 hours ago • 1
Overconfident Errors Need Stronger Correction: Asymmetric Confidence Penalties for Reinforcement Learning Paper • 2602.21420 • Published 9 days ago • 5