self-play
Absolute Zero: Reinforced Self-play Reasoning with Zero Data (arXiv:2505.03335)
Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers (arXiv:2408.06195)
Beyond Pass@1: Self-Play with Variational Problem Synthesis Sustains RLVR (arXiv:2508.14029)
Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play (arXiv:2509.25541)
PromptCoT 2.0: Scaling Prompt Synthesis for Large Language Model Reasoning (arXiv:2509.19894)
SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning (arXiv:2504.19162)
SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning (arXiv:2506.24119)
AceSearcher: Bootstrapping Reasoning and Search for LLMs via Reinforced Self-Play (arXiv:2509.24193)
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners (arXiv:2412.17256)
Self-Discover: Large Language Models Self-Compose Reasoning Structures (arXiv:2402.03620)