Forecasting Downstream Performance of LLMs With Proxy Metrics Paper • 2605.18607 • Published 21 days ago • 14
Mean Mode Screaming: Mean–Variance Split Residuals for 1000-Layer Diffusion Transformers Paper • 2605.06169 • Published May 7 • 233
KinDER: A Physical Reasoning Benchmark for Robot Learning and Planning Paper • 2604.25788 • Published May 4 • 2
HERMES++: Toward a Unified Driving World Model for 3D Scene Understanding and Generation Paper • 2604.28196 • Published Apr 30 • 72
Leveraging Verifier-Based Reinforcement Learning in Image Editing Paper • 2604.27505 • Published Apr 30 • 57
ProEval: Proactive Failure Discovery and Efficient Performance Estimation for Generative AI Evaluation Paper • 2604.23099 • Published Apr 25 • 4
LLaDA2.0-Uni: Unifying Multimodal Understanding and Generation with Diffusion Large Language Model Paper • 2604.20796 • Published Apr 22 • 243
OpenWorldLib: A Unified Codebase and Definition of Advanced World Models Paper • 2604.04707 • Published Apr 6 • 203