MC#: Mixture Compressor for Mixture-of-Experts Large Models Paper • 2510.10962 • Published Oct 13, 2025
Learning to Reason in 4D: Dynamic Spatial Understanding for Vision Language Models Paper • 2512.20557 • Published 15 days ago • 49
DC-AR: Efficient Masked Autoregressive Image Generation with Deep Compression Hybrid Tokenizer Paper • 2507.04947 • Published Jul 7, 2025 • 1
Seed-Prover 1.5: Mastering Undergraduate-Level Theorem Proving via Learning from Experience Paper • 2512.17260 • Published 19 days ago • 48
FoundationMotion: Auto-Labeling and Reasoning about Spatial Movement in Videos Paper • 2512.10927 • Published 27 days ago • 5
CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training Paper • 2504.13161 • Published Apr 17, 2025 • 93
NaVILA: Legged Robot Vision-Language-Action Model for Navigation Paper • 2412.04453 • Published Dec 5, 2024
EgoVLA: Learning Vision-Language-Action Models from Egocentric Human Videos Paper • 2507.12440 • Published Jul 16, 2025
Test-Time Scaling Strategies for Generative Retrieval in Multimodal Conversational Recommendations Paper • 2508.18132 • Published Aug 25, 2025
QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs Paper • 2510.11696 • Published Oct 13, 2025 • 176
OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM Paper • 2510.15870 • Published Oct 17, 2025 • 89
DLER: Doing Length pEnalty Right - Incentivizing More Intelligence per Token via Reinforcement Learning Paper • 2510.15110 • Published Oct 16, 2025 • 15
SpatialRGPT: Grounded Spatial Reasoning in Vision Language Models Paper • 2406.01584 • Published Jun 3, 2024