iOSWorld: A Benchmark for Personally Intelligent Phone Agents Paper • 2606.09764 • Published 21 days ago • 3
MyPCBench: A Benchmark for Personally Intelligent Computer-Use Agents Paper • 2606.16748 • Published 14 days ago • 7
VideoWebArena: Evaluating Long Context Multimodal Agents with Video Understanding Web Tasks Paper • 2410.19100 • Published Oct 24, 2024 • 6
TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks Paper • 2412.14161 • Published Dec 18, 2024 • 51