Aligned TinyLlama on UltraFeedback (fixed-1k prompt pool)

This model was aligned with PPO (using TRL) against the following reward model:

  • payelb/UltraFeedback_openbmb_deberta_1k_fixed_MARS (tag: mars)

Key settings:

  • Prompt pool: restricted to the same fixed 1k subset used for reward model (RM) training (loaded from CSV)
  • PPO updates: 200
  • Batch size: 4
  • Learning rate: 1e-05
  • LoRA: r=16, alpha=32, dropout=0.05
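
As a rough illustration of what these settings imply, the sketch below collects them in one place and derives two quantities: the LoRA scaling factor (alpha / r) and the number of prompts drawn over the run. It is not the training script. The names `AlignmentSettings` and `iter_prompt_batches` are hypothetical, the pool is filled with placeholder strings rather than the real CSV, and it assumes that each PPO update consumes exactly one batch of prompts and that the pool holds exactly 1,000 rows.

```python
from dataclasses import dataclass
import random


@dataclass(frozen=True)
class AlignmentSettings:
    # Values copied from the "Key settings" list above.
    prompt_pool_size: int = 1000  # "fixed-1k" subset; exact row count is assumed
    ppo_updates: int = 200
    batch_size: int = 4
    learning_rate: float = 1e-05
    lora_r: int = 16
    lora_alpha: int = 32
    lora_dropout: float = 0.05

    @property
    def lora_scaling(self) -> float:
        # Standard LoRA scales the low-rank update by alpha / r.
        return self.lora_alpha / self.lora_r

    @property
    def prompts_consumed(self) -> int:
        # Assumption: each PPO update consumes exactly one batch of prompts.
        return self.ppo_updates * self.batch_size


def iter_prompt_batches(prompts, settings, seed=0):
    """Yield one batch per PPO update; reshuffle when less than a batch remains."""
    rng = random.Random(seed)
    order = []
    for _ in range(settings.ppo_updates):
        if len(order) < settings.batch_size:
            order = list(prompts)
            rng.shuffle(order)
        batch, order = order[:settings.batch_size], order[settings.batch_size:]
        yield batch


settings = AlignmentSettings()
# Placeholder prompts standing in for the rows of the fixed-1k CSV.
pool = [f"prompt {i}" for i in range(settings.prompt_pool_size)]
batches = list(iter_prompt_batches(pool, settings))
print(settings.lora_scaling, settings.prompts_consumed, len(batches))
```

Under those assumptions the run draws 800 prompts in total, so a single pass does not exhaust the 1k pool, and the LoRA update is scaled by a factor of 2.0.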
Model repository: payelb/aligned_tinyllama_ultrafeedback_fixed1k_mars (adapter)