Taming Hallucinations: Boosting MLLMs' Video Understanding via Counterfactual Video Generation • arXiv:2512.24271 • Published 10 days ago
FlowBlending: Stage-Aware Multi-Model Sampling for Fast and High-Fidelity Video Generation • arXiv:2512.24724 • Published 9 days ago
Pretraining Frame Preservation in Autoregressive Video Memory Compression • arXiv:2512.23851 • Published 10 days ago
PhyGDPO: Physics-Aware Groupwise Direct Preference Optimization for Physically Consistent Text-to-Video Generation • arXiv:2512.24551 • Published 9 days ago
JavisGPT: A Unified Multi-modal LLM for Sounding-Video Comprehension and Generation • arXiv:2512.22905 • Published 12 days ago
Forging Spatial Intelligence: A Roadmap of Multi-Modal Data Pre-Training for Autonomous Systems • arXiv:2512.24385 • Published 9 days ago
Factorized Learning for Temporally Grounded Video-Language Models • arXiv:2512.24097 • Published 10 days ago
SurgWorld: Learning Surgical Robot Policies from Videos via World Modeling • arXiv:2512.23162 • Published 11 days ago
Video-BrowseComp: Benchmarking Agentic Video Research on Open Web • arXiv:2512.23044 • Published 11 days ago
Knot Forcing: Taming Autoregressive Video Diffusion Models for Real-time Infinite Interactive Portrait Animation • arXiv:2512.21734 • Published 15 days ago