f-GRPO and Beyond: Divergence-Based Reinforcement Learning Algorithms for General LLM Alignment • Paper • 2602.05946 • Published 11 days ago