FrontiersMind/Nandi-Mini-150M-Tool-Calling Text Generation • 0.2B • Updated 24 days ago • 23.5k • 51
FrontiersMind/Nandi-Mini-150M-Instruct Text Generation • 0.2B • Updated 24 days ago • 31.3k • 50
FrontiersMind/Nandi-Mini-150M Text Generation • 0.2B • Updated about 18 hours ago • 17.1k • 135
nvidia/Llama-3_1-Nemotron-Ultra-253B-v1 Text Generation • Updated Oct 15, 2025 • 1.44k • 347
ZClip: Adaptive Spike Mitigation for LLM Pre-Training Paper • 2504.02507 • Published Apr 3, 2025 • 89
A Refined Analysis of Massive Activations in LLMs Paper • 2503.22329 • Published Mar 28, 2025 • 14
Variance Control via Weight Rescaling in LLM Pre-training Paper • 2503.17500 • Published Mar 21, 2025 • 5
The Ultra-Scale Playbook 🌌 • Running • 3.84k • The ultimate guide to training LLM on large GPU Clusters
Falcon 2: An 11B parameter pretrained language model and VLM, trained on over 5000B tokens and 11 languages Article • Quent-01, nilabhra, rcojocaru, Mughaira, gcampesan, SanathNarayan, griffintaur, clefourrier, SaylorTwift +7 • May 24, 2024 • 28