TRL-Bench: Standardizing Cross-Paradigm Representation-Level Evaluation of Tabular Encoders Paper • 2606.09323 • Published 4 days ago • 48
TRL-Bench Collection TRL-Bench: cross-paradigm representation-level evaluation of tabular encoders. CTbench + Rbench + DLTE. • 4 items • Updated May 6 • 4
Data Journalist Agent: Transforming Data into Verifiable Multimodal Stories Paper • 2606.11176 • Published 3 days ago • 37
QUACK: Questioning, Understanding, and Auditing Communicated Knowledge in Multimodal Social Deduction Agents Paper • 2605.27068 • Published 17 days ago • 24
Diversed Model Discovery via Structured Table Discovery Paper • 2605.22766 • Published 22 days ago • 6
UI-Zoomer: Uncertainty-Driven Adaptive Zoom-In for GUI Grounding Paper • 2604.14113 • Published Apr 15 • 10
GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game Agents Paper • 2604.07429 • Published Apr 8 • 122
FORGE: Fine-grained Multimodal Evaluation for Manufacturing Scenarios Paper • 2604.07413 • Published Apr 8 • 96
ThinkTwice: Jointly Optimizing Large Language Models for Reasoning and Self-Refinement Paper • 2604.01591 • Published Apr 2 • 42
CORAL: Towards Autonomous Multi-Agent Evolution for Open-Ended Discovery Paper • 2604.01658 • Published Apr 2 • 55
CUA-Suite: Massive Human-annotated Video Demonstrations for Computer-Use Agents Paper • 2603.24440 • Published Mar 25 • 98
EnterpriseOps-Gym: Environments and Evaluations for Stateful Agentic Planning and Tool Use in Enterprise Settings Paper • 2603.13594 • Published Mar 13 • 149
SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks Paper • 2602.12670 • Published Feb 13 • 62
Code2World: A GUI World Model via Renderable Code Generation Paper • 2602.09856 • Published Feb 10 • 201
ShowUI-π: Flow-based Generative Models as GUI Dexterous Hands Paper • 2512.24965 • Published Dec 31, 2025 • 43
Video Reality Test: Can AI-Generated ASMR Videos fool VLMs and Humans? Paper • 2512.13281 • Published Dec 15, 2025 • 65
Computer-Use Agents as Judges for Generative User Interface Paper • 2511.15567 • Published Nov 19, 2025 • 54
UI Agent Collection A collection of algorithmic agents for user interfaces and interactions, program synthesis, and robotics • 492 items • Updated 2 days ago • 69