TROJail: Trajectory-Level Optimization for Multi-Turn Large Language Model Jailbreaks with Process Rewards Paper • 2512.07761 • Published 12 days ago • 1
Quantile Advantage Estimation for Entropy-Safe Reasoning Paper • 2509.22611 • Published Sep 26, 2025 • 120