Controllable Preference Optimization: Toward Controllable Multi-Objective Alignment Paper • 2402.19085 • Published Feb 29, 2024
OptDist: Learning Optimal Distribution for Customer Lifetime Value Prediction Paper • 2408.08585 • Published Aug 16, 2024
Staying in the Sweet Spot: Responsive Reasoning Evolution via Capability-Adaptive Hint Scaffolding Paper • 2509.06923 • Published Sep 8, 2025 • 22
Evolving LLMs' Self-Refinement Capability via Iterative Preference Optimization Paper • 2502.05605 • Published Feb 8, 2025
Learning to Focus: Causal Attention Distillation via Gradient-Guided Token Pruning Paper • 2506.07851 • Published Jun 9, 2025
Less Noise, More Voice: Reinforcement Learning for Reasoning via Instruction Purification Paper • 2601.21244 • Published 14 days ago • 12
AgentSkiller: Scaling Generalist Agent Intelligence through Semantically Integrated Cross-Domain Data Synthesis Paper • 2602.09372 • Published 2 days ago • 4
Cog-Rethinker: Hierarchical Metacognitive Reinforcement Learning for LLM Reasoning Paper • 2510.15979 • Published Oct 13, 2025
CurES: From Gradient Analysis to Efficient Curriculum Learning for Reasoning LLMs Paper • 2510.01037 • Published Oct 1, 2025 • 2
What, How, Where, and How Well? A Survey on Test-Time Scaling in Large Language Models Paper • 2503.24235 • Published Mar 31, 2025 • 54