declare-lab/rq-bench
Natural Language Processing
GRAIL: Gradient-Reweighted Advantages for Reinforcement Learning with Verifiable Rewards
Error-Free Linear Attention is a Free Lunch: Exact Solution from Continuous-Time Dynamics