Hide-and-Seek in Trajectories: Discovering Failure Signals for VLA Runtime Monitoring
Abstract
The Hide-and-Seek framework detects robot execution failures in vision-language-action models by using contrastive learning to localize failure-indicative actions from trajectory-level supervision, without step-level annotations.
Vision-Language-Action (VLA) models enable robots to follow natural language instructions and generalize across diverse tasks, but they remain vulnerable to execution failures that compromise reliability in real-world deployment. Detecting such failures during execution is therefore critical for the robust deployment of embodied systems. Existing failure detection methods rely either on expensive action resampling or on external models, while alternatives propagate trajectory-level labels uniformly across every timestep, obscuring localized failure signals. In this paper, we propose Hide-and-Seek, a framework that formulates VLA failure detection as a coarsely supervised learning problem. By combining inter-trajectory and intra-trajectory contrastive objectives, Hide-and-Seek localizes failure-indicative actions and induces temporally structured failure signals from trajectory-level supervision alone, without any step-level annotation. We evaluate Hide-and-Seek on LIBERO, VLABench, and a real-world robotic platform across three representative VLA policies: OpenVLA, π_0, and π_{0.5}. Our method achieves state-of-the-art multi-task failure detection performance with a practical accuracy-timeliness trade-off under conformal prediction, and generalizes well to both seen and unseen tasks.
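To make the conformal-prediction step concrete, the following is a minimal sketch of how a per-step failure score could be turned into a runtime alarm. It is not the authors' implementation: the abstract does not say how scores are calibrated, so this assumes standard split conformal prediction on the maximum per-step score of successful calibration rollouts. The names `conformal_threshold` and `first_alarm_step`, and all score values, are hypothetical.

```python
import math
from typing import Optional, Sequence


def conformal_threshold(calib_scores: Sequence[float], alpha: float) -> float:
    """Split-conformal threshold (an assumed calibration scheme).

    calib_scores holds one value per *successful* calibration rollout:
    the maximum per-step failure score observed in that rollout.
    Under exchangeability, a new successful rollout exceeds the returned
    threshold with probability at most alpha.
    """
    n = len(calib_scores)
    if n == 0:
        raise ValueError("need at least one calibration score")
    # Rank of the conformal quantile: ceil((n + 1) * (1 - alpha)).
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration rollouts for this alpha: never raise an alarm.
        return math.inf
    return sorted(calib_scores)[rank - 1]


def first_alarm_step(step_scores: Sequence[float], threshold: float) -> Optional[int]:
    """Return the first step whose score exceeds the threshold, else None."""
    for t, score in enumerate(step_scores):
        if score > threshold:
            return t
    return None


# Hypothetical calibration data: max failure score of 7 successful rollouts.
calib = [0.10, 0.30, 0.15, 0.25, 0.20, 0.35, 0.12]

# Hypothetical per-step scores of a rollout that eventually fails.
failing_rollout = [0.05, 0.10, 0.28, 0.45, 0.90]

strict = conformal_threshold(calib, alpha=0.25)  # fewer false alarms
loose = conformal_threshold(calib, alpha=0.5)    # earlier alarms

print(strict, first_alarm_step(failing_rollout, strict))
print(loose, first_alarm_step(failing_rollout, loose))
```

In this toy setting, raising `alpha` lowers the threshold, so the alarm fires one step earlier at the cost of a higher tolerated false-alarm rate on successful rollouts. That is one simple way to read the accuracy-timeliness trade-off mentioned in the abstract.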
Community
Hide-and-Seek reformulates VLA failure detection as a coarsely supervised problem, using inter- and intra-trajectory contrastive objectives to induce temporally structured failure signals from trajectory-level labels only.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- VLAConf: Calibrated Task-Success Confidence for Vision-Language-Action Models (2026)
- Rewind-IL: Online Failure Detection and State Respawning for Imitation Learning (2026)
- ReconVLA: An Uncertainty-Guided and Failure-Aware Vision-Language-Action Framework for Robotic Control (2026)
- Failure Identification in Imitation Learning Via Statistical and Semantic Filtering (2026)
- Failing Forward: Adaptive Failure-Informed Learning for Vision-Language-Action Models (2026)
- Pre-VLA: Preemptive Runtime Verification for Reliable Vision-Language-Action and World-Model Rollouts (2026)
- Dynamic Execution Commitment of Vision-Language-Action Models (2026)