Anish13/qwen3_8b_action_rl_lora_r64_a32_d0.05_lr9e-6_bsz1_ga8_g2_epochs10_seed42_ddp4_vllm-check-570 Text Generation • Updated 8 days ago • 37
Anish13/e10_qwen3_8b_lang_rl_stage2_lora_r64_a32_d0.05_lr9e-6_bsz2_ga4_g8_epochs10_seed42_ddp4_vllm Text Generation • Updated 20 days ago • 55
Anish13/e4_web_arbiter_rl_web-wmrm-ep2-warm-start_lora_r32_a32_lr7e-6_bsz1_ga8_g8_lam0.2_e10 Text Generation • Updated 23 days ago • 26
Anish13/e6_web_arbiter_rl_web-wmrm-ep2-warm-start_lora_r32_a32_lr7e-6_bsz1_ga8_g8_lam0.2_e10 Text Generation • Updated 24 days ago • 34
Anish13/e7_web_arbiter_rl_web-wmrm-ep2-warm-start_lora_r32_a32_lr7e-6_bsz1_ga8_g8_lam0.2_e10 Text Generation • Updated 24 days ago • 32
Anish13/e5-qwen3_8b_lang_rl_stage2_lora_r64_a32_d0.05_lr9e-6_bsz2_ga4_g8_epochs6_seed42_ddp4_vllm Text Generation • Updated 24 days ago • 40
Anish13/e4_web_arb_rl_web-wmrm-ep2-warm-start_lora_r32_a32_lr7e-6_bsz1_ga8_g8_lam0.2_epochs10_seed42 Text Generation • Updated 24 days ago • 34
Anish13/epoch-2-qwen3-8b-lang-rl-stage2-lora-checkpoint-3520 Text Generation • Updated 27 days ago • 108