Article: Distribution Matching Prevents Mode Collapse in Training Reasoning Models • germank • Mar 17