Just trained a 2B coding model to rank candidate AI/ML research ideas against the implicit preferences in a code repository's merge history.
The training data comes from a Gaussian Process fit on the accumulated dispositions in VQASynth, where each PR against a deployed project yields a pairwise comparison between the preferred feature branch and the baseline at main.
The GP scores candidate papers to synthesize preference pairs, and DPO with LoRA bakes the ranking pipeline into the model's weights.
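A minimal sketch of the pair-synthesis step, assuming the GP has already produced one scalar score per candidate. The function name `synthesize_pairs`, the `min_margin` cutoff, and the example scores are hypothetical and not taken from the actual pipeline; the `prompt`/`chosen`/`rejected` keys follow the record layout commonly used for DPO datasets.

```python
from itertools import combinations


def synthesize_pairs(prompt, candidates, scores, min_margin=0.5):
    """Turn per-candidate scores into DPO-style preference records.

    `scores` stands in for the GP's output (assumption: higher is better).
    Pairs whose score gap is below `min_margin` are skipped as too close
    to call. `min_margin` is an illustrative knob, not a known setting.
    """
    if len(candidates) != len(scores):
        raise ValueError("candidates and scores must have the same length")

    pairs = []
    for i, j in combinations(range(len(candidates)), 2):
        margin = scores[i] - scores[j]
        if abs(margin) < min_margin:
            continue  # ambiguous pair, leave it out of the training set
        winner, loser = (i, j) if margin > 0 else (j, i)
        pairs.append(
            {
                "prompt": prompt,
                "chosen": candidates[winner],
                "rejected": candidates[loser],
                "margin": abs(margin),
            }
        )
    return pairs


if __name__ == "__main__":
    # Made-up candidates and scores, for illustration only.
    demo = synthesize_pairs(
        prompt="Which idea fits this repo best?",
        candidates=["idea A", "idea B", "idea C"],
        scores=[1.9, 0.2, 1.6],
    )
    for record in demo:
        print(record["chosen"], ">", record["rejected"])
```

Dropping near-ties is one way to keep noisy comparisons out of the DPO data; whether the real pipeline filters by margin is not stated above.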
After 1 epoch, the model reaches 87.4% reward accuracy on the held-out eval split versus 92.3% on training, a 4.9-point gap that is consistent with learning the task without overfitting.
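For reference, a sketch of how reward accuracy is usually computed for DPO, assuming the numbers above use the standard definition: the share of pairs where the implicit reward of the chosen response exceeds that of the rejected one. The record keys, the `beta` value, and the helper names are hypothetical; the log-probabilities would come from the policy and the frozen reference model.

```python
def implicit_reward(policy_logp, ref_logp, beta=0.1):
    """DPO implicit reward: beta * (log pi(y|x) - log pi_ref(y|x))."""
    return beta * (policy_logp - ref_logp)


def reward_accuracy(records, beta=0.1):
    """Fraction of pairs where the chosen response out-scores the rejected one.

    Each record holds summed sequence log-probabilities under the policy and
    the reference model. The key names are illustrative assumptions.
    """
    if not records:
        raise ValueError("records must not be empty")

    correct = 0
    for r in records:
        chosen = implicit_reward(r["policy_chosen_logp"], r["ref_chosen_logp"], beta)
        rejected = implicit_reward(r["policy_rejected_logp"], r["ref_rejected_logp"], beta)
        correct += chosen > rejected  # bool counts as 0 or 1
    return correct / len(records)
```

Running the same function over the training and eval splits gives the two figures being compared.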
Now, I'm scaling the pipeline to thousands of repos for a generalization test.
We're excited to announce that Unsloth has joined the PyTorch Ecosystem! 🔥🦥
Unsloth is an open-source project that makes training & running models more accurate and faster with less compute. Our mission is to make local AI accessible to everyone. Thanks to all of you for making this possible!