ProRL: Effective Reinforcement Learning for Proactive Recommendation via Rectified Policy Gradient Estimation Paper • 2605.28293 • Published 3 days ago • 78
CartesianMoE: Boosting Knowledge Sharing among Experts via Cartesian Product Routing in Mixture-of-Experts Paper • 2410.16077 • Published Oct 21, 2024 • 1
GoLongRL: Capability-Oriented Long Context Reinforcement Learning with Multitask Alignment Paper • 2605.19577 • Published 11 days ago • 58
Entropy Ratio Clipping as a Soft Global Constraint for Stable Reinforcement Learning Paper • 2512.05591 • Published Dec 5, 2025 • 17
SYNTHETIC-1 Collection A collection of tasks & verifiers for reasoning datasets • 9 items • Updated Oct 7, 2025 • 67