PLaID++: A Preference Aligned Language Model for Targeted Inorganic Materials Design Paper • 2509.07150 • Published Sep 8, 2025
DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge Article • Feb 7, 2025 • 274
DAPO: An Open-Source LLM Reinforcement Learning System at Scale Paper • 2503.14476 • Published Mar 18, 2025 • 144