GenEvolve: Self-Evolving Image Generation Agents via Tool-Orchestrated Visual Experience Distillation Paper • 2605.21605 • Published 23 days ago • 13
Multi-Objective and Mixed-Reward Reinforcement Learning via Reward-Decorrelated Policy Optimization Paper • 2605.13641 • Published about 1 month ago • 50
Lightning OPD: Efficient Post-Training for Large Reasoning Models with Offline On-Policy Distillation Paper • 2604.13010 • Published Apr 14 • 17 • 7
Lightning OPD: Efficient Post-Training for Large Reasoning Models with Offline On-Policy Distillation Paper • 2604.13010 • Published Apr 14 • 17
Lightning OPD: Efficient Post-Training for Large Reasoning Models with Offline On-Policy Distillation Paper • 2604.13010 • Published Apr 14 • 17 • 7
A Survey of On-Policy Distillation for Large Language Models Paper • 2604.00626 • Published Apr 1 • 13
LongCat-Flash-Prover: Advancing Native Formal Reasoning via Agentic Tool-Integrated Reinforcement Learning Paper • 2603.21065 • Published Mar 22 • 78
Distribution-Aligned Sequence Distillation for Superior Long-CoT Reasoning Paper • 2601.09088 • Published Jan 14 • 64
Alibaba-Apsara/Superior-Reasoning-SFT-gpt-oss-120b-Logprob Viewer • Updated Jan 15 • 435k • 7.89k • 62
Alibaba-Apsara/Superior-Reasoning-SFT-gpt-oss-120b-Logprob Viewer • Updated Jan 15 • 435k • 7.89k • 62
Where Did This Sentence Come From? Tracing Provenance in LLM Reasoning Distillation Paper • 2512.20908 • Published Dec 24, 2025 • 29