Speed by Simplicity: A Single-Stream Architecture for Fast Audio-Video Generative Foundation Model Paper • 2603.21986 • Published Mar 23 • 124
Matrix-Game 3.0: Real-Time and Streaming Interactive World Model with Long-Horizon Memory Paper • 2604.08995 • Published 14 days ago • 48
HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds Paper • 2604.14268 • Published 9 days ago • 109
CoInteract: Physically-Consistent Human-Object Interaction Video Synthesis via Spatially-Structured Co-Generation Paper • 2604.19636 • Published 3 days ago • 79
WildDet3D: Scaling Promptable 3D Detection in the Wild Paper • 2604.08626 • Published 15 days ago • 240
ELT: Elastic Looped Transformers for Visual Generation Paper • 2604.09168 • Published 14 days ago • 19
view article Article The First Healthcare Robotics Dataset and Foundational Physical AI Models for Healthcare Robotics Mar 16 • 29
Spatial-TTT: Streaming Visual-based Spatial Intelligence with Test-Time Training Paper • 2603.12255 • Published Mar 12 • 91
BitDance: Scaling Autoregressive Generative Models with Binary Tokens Paper • 2602.14041 • Published Feb 15 • 53
PhysBrain: Human Egocentric Data as a Bridge from Vision Language Models to Physical Intelligence Paper • 2512.16793 • Published Dec 18, 2025 • 76
LongVie 2: Multimodal Controllable Ultra-Long Video World Model Paper • 2512.13604 • Published Dec 15, 2025 • 76
RealGen: Photorealistic Text-to-Image Generation via Detector-Guided Rewards Paper • 2512.00473 • Published Nov 29, 2025 • 27
InfiniteVL: Synergizing Linear and Sparse Attention for Highly-Efficient, Unlimited-Input Vision-Language Models Paper • 2512.08829 • Published Dec 9, 2025 • 21
OmniPSD: Layered PSD Generation with Diffusion Transformer Paper • 2512.09247 • Published Dec 10, 2025 • 51
Composing Concepts from Images and Videos via Concept-prompt Binding Paper • 2512.09824 • Published Dec 10, 2025 • 28
StereoWorld: Geometry-Aware Monocular-to-Stereo Video Generation Paper • 2512.09363 • Published Dec 10, 2025 • 74
Preserving Source Video Realism: High-Fidelity Face Swapping for Cinematic Quality Paper • 2512.07951 • Published Dec 8, 2025 • 51
Light-X: Generative 4D Video Rendering with Camera and Illumination Control Paper • 2512.05115 • Published Dec 4, 2025 • 11