Learn from Weaknesses: Automated Domain Specialization for Small Computer-Use Agents
Abstract
LearnWeak is an annotation-free framework that enhances small computer-use agents by identifying weaknesses through a stronger reference agent and generating targeted training data for improved domain specialization.
Computer-use agents (CUAs) have recently made substantial progress, but deploying a separate large expert for each software domain remains expensive. Small open computer-use agents are more practical specialization targets, but they remain substantially weaker and exhibit uneven domain-specific failures. A straightforward remedy is to synthesize large-scale training data for the target domain, yet we find that this naive approach yields only marginal improvements. Building on this observation, we introduce LearnWeak, an annotation-free specialization framework for small computer-use agents that uses a stronger reference agent to identify the student's weaknesses in the target domain, synthesize targeted tasks, and construct supervision automatically. LearnWeak further introduces an error-aware specialization objective that disentangles planning and execution errors, enabling more behaviorally precise updates than broad uniform supervision. On OSWorld, LearnWeak achieves average gains of 11.6 and 11.1 percentage points over EvoCUA-8B and OpenCUA-7B, respectively, across eight domains. We also validate that our student-aware dataset generation and training approaches outperform existing autonomous trajectory generation and training baselines. Our work highlights the importance of student awareness in both data synthesis and agent training, pointing toward a more principled and efficient path for specializing small computer-use agents in diverse domains.
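The weakness-identification step described above can be illustrated with a small sketch. The abstract does not specify how weaknesses are detected, so everything here is an assumption for illustration: the per-category grouping, the `find_weak_categories` name, the `min_gap` threshold, and the rule that a weakness is a category where the reference agent succeeds much more often than the student.

```python
from collections import defaultdict


def find_weak_categories(results, min_gap=0.2):
    """Rank task categories by how far the student trails the reference agent.

    results: iterable of (category, student_success, reference_success).
    ASSUMPTION: this gap-based rule is illustrative, not the paper's procedure.
    """
    # Per category: [number of tasks, student successes, reference successes].
    stats = defaultdict(lambda: [0, 0, 0])
    for category, student_ok, reference_ok in results:
        entry = stats[category]
        entry[0] += 1
        entry[1] += int(student_ok)
        entry[2] += int(reference_ok)

    # Gap = reference success rate minus student success rate.
    gaps = {c: (ref - stu) / n for c, (n, stu, ref) in stats.items()}

    # Keep categories whose gap reaches the threshold, largest gap first.
    weak = [c for c, gap in gaps.items() if gap >= min_gap]
    return sorted(weak, key=lambda c: gaps[c], reverse=True)


if __name__ == "__main__":
    rollouts = [
        ("spreadsheet", False, True),
        ("spreadsheet", False, True),
        ("browser", True, True),
        ("browser", False, True),
        ("terminal", True, True),
    ]
    print(find_weak_categories(rollouts))
```

Under this assumption, the categories returned first would be the ones to prioritize when synthesizing targeted tasks.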
Community
We introduce LearnWeak, an automated training framework for domain specialization of small computer-use agents (CUAs) that
- requires no human trajectory annotation,
- constructs synthetic training datasets focused on the student’s weaknesses, and
- trains the student using DPO with adaptive loss selection based on error types.
On OSWorld, LearnWeak achieves average gains of 11.6 and 11.1 percentage points over EvoCUA-8B and OpenCUA-7B, respectively, across eight domains.
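To make the third bullet concrete, here is a minimal sketch of DPO with a loss chosen by error type. The `dpo_loss` function is the standard DPO objective for one preference pair. The rest is assumed for illustration, since this page does not give the exact objective: the `SegmentLogps` container, the split of each response into a "plan" and an "action" segment, and the rule in `SEGMENTS_BY_ERROR` for which segments are trained on.

```python
import math
from dataclasses import dataclass


@dataclass
class SegmentLogps:
    """Summed token log-probabilities for one response segment (hypothetical)."""
    policy_chosen: float
    policy_rejected: float
    ref_chosen: float
    ref_rejected: float


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss for one pair: -log(sigmoid(beta * margin))."""
    z = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    # Numerically stable softplus(-z), which equals -log(sigmoid(z)).
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


# ASSUMPTION: which segments contribute to the loss for each error type.
# A planning error updates plan and action; an execution error only the action.
SEGMENTS_BY_ERROR = {
    "planning": ("plan", "action"),
    "execution": ("action",),
}


def error_aware_loss(error_type, segments, beta=0.1):
    """Apply DPO only over the segments tied to the diagnosed error type."""
    if error_type not in SEGMENTS_BY_ERROR:
        raise ValueError(f"unknown error type: {error_type!r}")
    active = [segments[name] for name in SEGMENTS_BY_ERROR[error_type]]
    return dpo_loss(
        sum(s.policy_chosen for s in active),
        sum(s.policy_rejected for s in active),
        sum(s.ref_chosen for s in active),
        sum(s.ref_rejected for s in active),
        beta=beta,
    )


if __name__ == "__main__":
    pair = {
        "plan": SegmentLogps(-5.0, -1.0, -3.0, -3.0),
        "action": SegmentLogps(-2.0, -4.0, -3.0, -3.0),
    }
    print(error_aware_loss("planning", pair))
    print(error_aware_loss("execution", pair))
```

The intent of masking by segment is that an execution error does not push the student to change a plan that was already correct. Whether LearnWeak implements the disentanglement this way is not stated here.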
