The Flip Side of RLHF: On-Policy Feedback for Reward Model Self-Supervised Improvement • Paper • 2605.30888 • Published 10 days ago • 10
Towards Verifiable Multimodal Deep Research: A Multi-Agent Harness for Interleaved Report Generation • Paper • 2605.29861 • Published 11 days ago • 16
SwanVoice: Expressive Long-Form Zero-Shot Speech Synthesis for Both Monologue and Dialogue • Paper • 2605.30993 • Published 10 days ago • 56
Gamma-World: Generative Multi-Agent World Modeling Beyond Two Players • Paper • 2605.28816 • Published 12 days ago • 420
Anti-Self-Distillation for Reasoning RL via Pointwise Mutual Information • Paper • 2605.11609 • Published 27 days ago • 195
AutoResearchClaw: Self-Reinforcing Autonomous Research with Human-AI Collaboration • Paper • 2605.20025 • Published 20 days ago • 186
DiffusionOPD: A Unified Perspective of On-Policy Distillation in Diffusion Models • Paper • 2605.15055 • Published 25 days ago • 19