DelTA: Discriminative Token Credit Assignment for Reinforcement Learning from Verifiable Rewards Paper • 2605.21467 • Published 11 days ago • 204
Anti-Self-Distillation for Reasoning RL via Pointwise Mutual Information Paper • 2605.11609 • Published 19 days ago • 195
HAGE: Harnessing Agentic Memory via RL-Driven Weighted Graph Evolution Paper • 2605.09942 • Published 20 days ago • 15
Heterogeneous Scientific Foundation Model Collaboration Paper • 2604.27351 • Published about 1 month ago • 217
MegaTrain: Full Precision Training of 100B+ Parameter Large Language Models on a Single GPU Paper • 2604.05091 • Published Apr 6 • 47