Beyond Language Modeling: An Exploration of Multimodal Pretraining Paper • 2603.03276 • Published 2 days ago • 65
The Smol Training Playbook 📚 • The secrets to building world-class LLMs • 3.03k
facebook/webssl-dino300m-full2b-224 Image Feature Extraction • 0.3B • Updated Apr 24, 2025 • 3.03k • 11
Scale RAE Collection Collection for "Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders" • 7 items • Updated Feb 2 • 3
RAE Collection Collection for Diffusion Transformers with Representation Autoencoders • 7 items • Updated 11 days ago • 11
OneFlow: Concurrent Mixed-Modal and Interleaved Generation with Edit Flows Paper • 2510.03506 • Published Oct 3, 2025 • 15
Learning to See Before Seeing: Demystifying LLM Visual Priors from Language Pre-training Paper • 2509.26625 • Published Sep 30, 2025 • 43
V-JEPA 2 Collection A frontier video understanding model developed by FAIR, Meta, which extends the pretraining objectives of https://ai.meta.com/blog/v-jepa-yann • 8 items • Updated Jun 13, 2025 • 192
Cosmos-Tokenize1 Collection A suite of image and video tokenizers • 8 items • Updated 2 days ago • 11
facebook/webssl-dino7b-full8b-518 Image Feature Extraction • 6B • Updated Apr 24, 2025 • 15 • 12