LocateAnything: Fast and High-Quality Vision-Language Grounding with Parallel Box Decoding Paper • 2605.27365 • Published 6 days ago • 128
SkillEvolBench: Benchmarking the Evolution from Episodic Experience to Procedural Skills Paper • 2605.24117 • Published 10 days ago • 20
view article Article NEO-unify: Building Native Multimodal Unified Models End to End sensenova • Mar 5 • 163
SANA-WM: Efficient Minute-Scale World Modeling with Hybrid Linear Diffusion Transformer Paper • 2605.15178 • Published 18 days ago • 84
Lightning Unified Video Editing via In-Context Sparse Attention Paper • 2605.04569 • Published 26 days ago • 18
ARIS: Autonomous Research via Adversarial Multi-Agent Collaboration Paper • 2605.03042 • Published 28 days ago • 124
UniVidX: A Unified Multimodal Framework for Versatile Video Generation via Diffusion Priors Paper • 2605.00658 • Published May 1 • 84
Elucidating the SNR-t Bias of Diffusion Probabilistic Models Paper • 2604.16044 • Published Apr 17 • 73
LeapAlign: Post-Training Flow Matching Models at Any Generation Step by Building Two-Step Trajectories Paper • 2604.15311 • Published Apr 16 • 13
OmniShow: Unifying Multimodal Conditions for Human-Object Interaction Video Generation Paper • 2604.11804 • Published Apr 13 • 72
CUDA Agent: Large-Scale Agentic RL for High-Performance CUDA Kernel Generation Paper • 2602.24286 • Published Feb 27 • 99
DualPath: Breaking the Storage Bandwidth Bottleneck in Agentic LLM Inference Paper • 2602.21548 • Published Feb 25 • 53
Zooming without Zooming: Region-to-Image Distillation for Fine-Grained Multimodal Perception Paper • 2602.11858 • Published Feb 12 • 64
CodeOCR: On the Effectiveness of Vision Language Models in Code Understanding Paper • 2602.01785 • Published Feb 2 • 97