UI-Copilot: Advancing Long-Horizon GUI Automation via Tool-Integrated Policy Optimization
Abstract
UI-Copilot is a collaborative framework that enhances GUI agents by decoupling memory management and integrating on-demand tool assistance for improved performance in complex user interface tasks.
MLLM-based GUI agents have demonstrated strong capabilities in complex user interface interaction tasks. However, long-horizon scenarios remain challenging: these agents are burdened with subtasks beyond their intrinsic capabilities and suffer from memory degradation, progress confusion, and numerical hallucination. To address these challenges, we present UI-Copilot, a collaborative framework in which the GUI agent focuses on task execution while a lightweight copilot provides on-demand assistance for memory retrieval and numerical computation. We introduce memory decoupling to separate persistent observations from transient execution context, and train the policy agent to selectively invoke the copilot as a Retriever or a Calculator depending on task demands. To enable effective tool-invocation learning, we propose Tool-Integrated Policy Optimization (TIPO), which separately optimizes tool selection through single-turn prediction and task execution through on-policy multi-turn rollouts. Experimental results show that UI-Copilot-7B achieves state-of-the-art performance on the challenging MemGUI-Bench, outperforming strong 7B-scale GUI agents such as GUI-Owl-7B and UI-TARS-1.5-7B. Moreover, UI-Copilot-7B delivers a 17.1% absolute improvement on AndroidWorld over its Qwen base model, highlighting UI-Copilot's strong generalization to real-world GUI tasks.
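The abstract's two core ideas, memory decoupling (persistent observations kept separate from transient execution context) and on-demand copilot invocation (Retriever vs. Calculator), can be illustrated with a minimal sketch. All names here (`DecoupledMemory`, `copilot`, the mode strings) are hypothetical illustrations, not the paper's actual implementation:

```python
from dataclasses import dataclass, field

@dataclass
class DecoupledMemory:
    """Illustrative memory decoupling: persistent observations survive across
    steps, while transient execution context is cleared each step."""
    persistent: list = field(default_factory=list)  # durable observations (e.g. noted values)
    transient: list = field(default_factory=list)   # current-step execution context

    def commit(self, observation: str) -> None:
        self.persistent.append(observation)

    def reset_transient(self) -> None:
        self.transient.clear()

def copilot(mode: str, memory: DecoupledMemory, query: str):
    """Toy copilot with the two roles named in the abstract:
    'retriever' searches persistent memory, 'calculator' evaluates arithmetic."""
    if mode == "retriever":
        return [obs for obs in memory.persistent if query in obs]
    if mode == "calculator":
        # Restricted arithmetic evaluation, for illustration only.
        return eval(query, {"__builtins__": {}}, {})
    raise ValueError(f"unknown copilot mode: {mode}")

# Usage: the policy agent commits observations, then offloads recall and math.
mem = DecoupledMemory()
mem.commit("item_price: 12.5")
mem.commit("shipping: 3.0")
print(copilot("retriever", mem, "price"))        # ['item_price: 12.5']
print(copilot("calculator", mem, "12.5 + 3.0"))  # 15.5
```

In the paper's framework the decision of *when* to invoke each mode is learned via TIPO rather than hard-coded; the sketch only shows the division of labor between executor and copilot.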
Community
We propose UI-Copilot, a collaborative framework where the GUI agent selectively invokes a lightweight copilot for memory retrieval and numerical computation, enabling efficient long-horizon GUI navigation.
The following papers were recommended by the Semantic Scholar API
- UI-Voyager: A Self-Evolving GUI Agent Learning via Failed Experience (2026)
- ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents (2026)
- Anticipatory Planning for Multimodal AI Agents (2026)
- SecAgent: Efficient Mobile GUI Agent with Semantic Context (2026)
- K^2-Agent: Co-Evolving Know-What and Know-How for Hierarchical Mobile Device Control (2026)
- Generalization in Online Reinforcement Learning for Mobile Agents (2026)
- SynPlanResearch-R1: Encouraging Tool Exploration for Deep Research with Synthetic Plans (2026)