NatureBench: Can Coding Agents Match the Published SOTA of Nature-Family Papers?
EnterpriseClawBench: Benchmarking Agents from Real Workplace Sessions