Article: Distilling 100B+ Models 40x Faster with TRL 📝 — TRL distillation for 100B+ teacher models, 40x faster
Marco-MoE Collection — A suite of multilingual MoE models with highly sparse architectures • 5 items • Updated 7 days ago
Article: Welcome Gemma 4: Frontier multimodal intelligence on device • 13 days ago