ReCAP-8B

ReCAP-8B is a vision-language model fine-tuned from Qwen/Qwen3-VL-8B-Thinking. It is designed to enable robust CAPTCHA solving within native GUI agents while preserving general GUI interaction capabilities.

This model is introduced in “CAPTCHA Solving for Native GUI Agents: Automated Reasoning-Action Data Generation and Self-Corrective Training”.


🚀 Overview

ReCAP-8B extends a general-purpose GUI agent with CAPTCHA-solving ability by learning from structured reasoning-action trajectories.

It operates end-to-end:

  • Input: raw screenshots
  • Output: reasoning + executable GUI actions (click, type, drag)

✨ Key Features

  • Unified agent: Handles both CAPTCHA and general GUI tasks
  • Reasoning-action modeling: Learns both decisions and execution
  • Self-correction: Improves robustness by learning from failures
  • Efficient interaction: Generates multiple actions per step

🧠 Capabilities

Supports diverse CAPTCHA types:

  • Text / OCR
  • Icon selection & matching
  • Image grid reasoning
  • Slider / drag tasks
  • Multi-step interaction challenges

Core skills:

  • Visual understanding
  • Spatial reasoning
  • Continuous control
  • Multi-step planning

📊 Performance

  • ~71.9% success rate on a synthetic CAPTCHA benchmark
  • Strong improvements on interaction-heavy tasks (e.g., slider, image grid)
  • Maintains competitive performance on general GUI benchmarks

🔒 Ethical Considerations

This model is released for research purposes only.
It is intended to study and improve the robustness of human-verification systems, not to bypass them.

Model size: 9B params (Safetensors), tensor type: BF16