IAM: Identity-Aware Human Motion and Shape Joint Generation Paper • 2604.25164 • Published 1 day ago • 1
DV-World: Benchmarking Data Visualization Agents in Real-World Scenarios Paper • 2604.25914 • Published 1 day ago • 2
SketchVLM: Vision language models can annotate images to explain thoughts and guide users Paper • 2604.22875 • Published 6 days ago • 22
Zero-to-CAD: Agentic Synthesis of Interpretable CAD Programs at Million-Scale Without Real Data Paper • 2604.24479 • Published 2 days ago
Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation Paper • 2604.24763 • Published 2 days ago • 47
Stochastic KV Routing: Enabling Adaptive Depth-Wise Cache Sharing Paper • 2604.22782 • Published 26 days ago • 1
ProEval: Proactive Failure Discovery and Efficient Performance Estimation for Generative AI Evaluation Paper • 2604.23099 • Published 4 days ago • 2
dWorldEval: Scalable Robotic Policy Evaluation via Discrete Diffusion World Model Paper • 2604.22152 • Published 5 days ago • 2
AgentSearchBench: A Benchmark for AI Agent Search in the Wild Paper • 2604.22436 • Published 5 days ago • 10
Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond Paper • 2604.22748 • Published 5 days ago • 202
WorldMark: A Unified Benchmark Suite for Interactive Video World Models Paper • 2604.21686 • Published 6 days ago • 36
Seeing Fast and Slow: Learning the Flow of Time in Videos Paper • 2604.21931 • Published 6 days ago • 18
LLaDA2.0-Uni: Unifying Multimodal Understanding and Generation with Diffusion Large Language Model Paper • 2604.20796 • Published 7 days ago • 233
MMCORE: MultiModal COnnection with Representation Aligned Latent Embeddings Paper • 2604.19902 • Published 8 days ago • 2