Papers + RL/Reasoning
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
Paper • 2503.14476 • Published • 144
VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks
Paper • 2504.05118 • Published • 26
SQL-R1: Training Natural Language to SQL Reasoning Model By Reinforcement Learning
Paper • 2504.08600 • Published • 32
A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce
Paper • 2504.11343 • Published • 19
OTC: Optimal Tool Calls via Reinforcement Learning
Paper • 2504.14870 • Published • 35
DianJin-R1: Evaluating and Enhancing Financial Reasoning in Large Language Models
Paper • 2504.15716 • Published • 12
WebThinker: Empowering Large Reasoning Models with Deep Research Capability
Paper • 2504.21776 • Published • 59
DeepCritic: Deliberate Critique with Large Language Models
Paper • 2505.00662 • Published • 53
MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining
Paper • 2505.07608 • Published • 82
Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures
Paper • 2505.09343 • Published • 74
CPGD: Toward Stable Rule-based Reinforcement Learning for Language Models
Paper • 2505.12504 • Published • 24
AdaCoT: Pareto-Optimal Adaptive Chain-of-Thought Triggering via Reinforcement Learning
Paper • 2505.11896 • Published • 58
Paper • 2505.14674 • Published • 37
One-RL-to-See-Them-All/Orsta-Data-47k
Updated • 429 • 17
One RL to See Them All: Visual Triple Unified Reinforcement Learning
Paper • 2505.18129 • Published • 61
RL with KL penalties is better viewed as Bayesian inference
Paper • 2205.11275 • Published • 1
Asymptotics of Language Model Alignment
Paper • 2404.01730 • Published • 1
VerIPO: Cultivating Long Reasoning in Video-LLMs via Verifier-Guided Iterative Policy Optimization
Paper • 2505.19000 • Published • 42
Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space
Paper • 2505.15778 • Published • 18
ZeroGUI: Automating Online GUI Learning at Zero Human Cost
Paper • 2505.23762 • Published • 45
Table-R1: Inference-Time Scaling for Table Reasoning
Paper • 2505.23621 • Published • 93
Reinforcement Pre-Training
Paper • 2506.08007 • Published • 263
Comment on The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity
Paper • 2506.09250 • Published • 27
Paper • 2506.10910 • Published • 66
Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning
Paper • 2507.00432 • Published • 79
AutoTriton: Automatic Triton Programming with Reinforcement Learning in LLMs
Paper • 2507.05687 • Published • 27
Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination
Paper • 2507.10532 • Published • 89
Towards Agentic RAG with Deep Reasoning: A Survey of RAG-Reasoning Systems in LLMs
Paper • 2507.09477 • Published • 86
osmosis-ai/Osmosis-Apply-1.7B
Text Generation • 2B • Updated • 68 • 91
Geometric-Mean Policy Optimization
Paper • 2507.20673 • Published • 31
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models
Paper • 2508.06471 • Published • 195
Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning
Paper • 2508.08221 • Published • 50
Training-Free Group Relative Policy Optimization
Paper • 2510.08191 • Published • 44
The Art of Scaling Reinforcement Learning Compute for LLMs
Paper • 2510.13786 • Published • 31
Stabilizing Reinforcement Learning with LLMs: Formulation and Practices
Paper • 2512.01374 • Published • 95