GFT: From Imitation to Reward Fine-Tuning with Unbiased Group Advantages and Dynamic Coefficient Rectification Paper • 2604.14258 • Published 14 days ago • 23