Article • Efficient LLM Pretraining: Packed Sequences and Masked Attention • Oct 7, 2024 • 69
Article • You could have designed state of the art positional encoding • Nov 25, 2024 • 455
The Smol Training Playbook 📚 • 3.06k • The secrets to building world-class LLMs
The Ultra-Scale Playbook 🌌 • 3.76k • The ultimate guide to training LLM on large GPU Clusters
deepseek-ai/DeepSeek-V3-0324 • Text Generation • 685B • Updated Mar 27, 2025 • 377k • 3.09k
nvidia/Llama-Nemotron-Post-Training-Dataset • Viewer • Updated May 8, 2025 • 3.91M • 4.8k • 646