Multi-Task GRPO: Reliable LLM Reasoning Across Tasks Paper • 2602.05547 • Published 1 day ago
Meta-Okapi/ca_bloom7b1_adaptdpo_tdata100_lora_2msteps_200steps_batch20_gradacc2_200steps Updated 11 days ago
Meta-Okapi/ro_bloom7b1_adaptdpo_tdata100_lora_2msteps_200steps_batch20_gradacc2_200steps Updated 11 days ago
Meta-Okapi/fr_bloom7b1_adaptdpo_tdata100_lora_2msteps_200steps_batch20_gradacc2_200steps Updated 11 days ago