agurung/flawed-fictions-qwen25-7b-lengthpenalty-litereason Reinforcement Learning • 8B • Updated 1 day ago • 74
agurung/flawed-fictions-qwen25-7b-lengthpenalty Reinforcement Learning • 8B • Updated 3 days ago • 157
agurung/Qwen2.5-7B-Instruct-flawedfiction-latent-grpo Text Generation • 8B • Updated 17 days ago • 449
agurung/v4_savebestearly_sft_qwen7B_25percent_lr_1e3_bptt_offset Text Generation • 8B • Updated 18 days ago • 15
agurung/v4_savebestearly_sft_qwen7B_25percent_lr_1e4_bptt_offset Text Generation • 8B • Updated 18 days ago • 23