arxiv:2602.22190

GUI-Libra: Training Native GUI Agents to Reason and Act with Action-aware Supervision and Partially Verifiable RL

Published on Feb 25

· Submitted by

Rui Yang on Feb 26

UIUC ScaleML Lab

Upvote

Authors:

Abstract

GUI-Libra addresses limitations in open-source GUI agents through specialized training methods that improve reasoning-grounding alignment and reinforcement learning under partial verifiability, demonstrating enhanced task completion across web and mobile platforms.

AI-generated summary

Open-source native GUI agents still lag behind closed-source systems on long-horizon navigation tasks. This gap stems from two limitations: a shortage of high-quality, action-aligned reasoning data, and the direct adoption of generic post-training pipelines that overlook the unique challenges of GUI agents. We identify two fundamental issues in these pipelines: (i) standard SFT with CoT reasoning often hurts grounding, and (ii) step-wise RLVR-tyle training faces partial verifiability, where multiple actions can be correct but only a single demonstrated action is used for verification. This makes offline step-wise metrics weak predictors of online task success. In this work, we present GUI-Libra, a tailored training recipe that addresses these challenges. First, to mitigate the scarcity of action-aligned reasoning data, we introduce a data construction and filtering pipeline and release a curated 81K GUI reasoning dataset. Second, to reconcile reasoning with grounding, we propose action-aware SFT that mixes reasoning-then-action and direct-action data and reweights tokens to emphasize action and grounding. Third, to stabilize RL under partial verifiability, we identify the overlooked importance of KL regularization in RLVR and show that a KL trust region is critical for improving offline-to-online predictability; we further introduce success-adaptive scaling to downweight unreliable negative gradients. Across diverse web and mobile benchmarks, GUI-Libra consistently improves both step-wise accuracy and end-to-end task completion. Our results suggest that carefully designed post-training and data curation can unlock significantly stronger task-solving capabilities without costly online data collection. We release our dataset, code, and models to facilitate further research on data-efficient post-training for reasoning-capable GUI agents.

View arXiv page View PDF Project page GitHub 12 Add to collection

Community

Ray2333

Paper submitter 1 day ago

This paper addresses two core bottlenecks in post-training native GUI agents: (1) the lack of high-quality, action-aligned reasoning data, and (2) generic post-training pipelines that overlook the unique constraints of GUI interaction. GUI-Libra mitigates these issues with a GUI-specific recipe: an 81K curated reasoning dataset, action-aware SFT that balances reasoning and direct-action supervision via token reweighting, and conservative RL combining KL regularization with success-adaptive negative gradient scaling. Across both offline and online benchmarks (including AndroidWorld, Online-Mind2Web, and WebArena-Lite-v2), GUI-Libra-4B and GUI-Libra-8B consistently outperform strong native GUI agents, and in several settings even surpass proprietary systems.