Benchmarking Scientific Understanding and Reasoning for Video Generation using VideoScience-Bench • Paper • 2512.02942 • Published Dec 2, 2025 • 5
AMA-Bench: Evaluating Long-Horizon Memory for Agentic Applications • Paper • 2602.22769 • Published 25 days ago • 9