Dataset
MuSS: A Large-Scale Dataset and Cinematic Narrative Benchmark for Multi-Shot Subject-to-Video Generation Paper • 2604.23789 • Published 12 days ago • 5
OmniGUI: Benchmarking GUI Agents in Omni-Modal Smartphone Environments Paper • 2605.18758 • Published Apr 3 • 12
benchmarks
ReVSI: Rebuilding Visual Spatial Intelligence Evaluation for Accurate Assessment of VLM 3D Reasoning Paper • 2604.24300 • Published 24 days ago • 67
Rewarding the Scientific Process: Process-Level Reward Modeling for Agentic Data Analysis Paper • 2604.24198 • Published 24 days ago • 22
KernelBench-X: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels Paper • 2605.04956 • Published 15 days ago • 7
Edit-Compass & EditReward-Compass: A Unified Benchmark for Image Editing and Reward Modeling Paper • 2605.13062 • Published 8 days ago • 32