JiayuCHEN
KN33SOXXX
AI & ML interests
None yet
Recent Activity
updated a collection about 21 hours ago: GUIAgent
updated a collection 1 day ago: agent
updated a collection 3 days ago: WorldModel
Organizations
benchmark
- WBench: A Comprehensive Multi-turn Benchmark for Interactive Video World Model Evaluation
  Paper • 2605.25874 • Published • 103
- Agents' Last Exam
  Paper • 2606.05405 • Published • 360
- WeaveBench: A Long-Horizon, Real-World Benchmark for Computer-Use Agents with Hybrid Interfaces
  Paper • 2606.09426 • Published • 102
- ResearchClawBench: A Benchmark for End-to-End Autonomous Scientific Research
  Paper • 2606.07591 • Published • 95
harness
- Crafter: A Multi-Agent Harness for Editable Scientific Figure Generation from Diverse Inputs
  Paper • 2605.30611 • Published • 247
- HarnessForge: Joint Harness and Policy Evolution for Adaptive Agent Systems
  Paper • 2606.01779 • Published • 6
- Text-to-Image Models Need Less from Text Encoders Than You Think
  Paper • 2606.03715 • Published • 11
- SIA: Self Improving AI with Harness & Weight Updates
  Paper • 2605.27276 • Published • 15
models 0
None public yet