One-Token Verification for Reasoning Correctness Estimation
Abstract
One-Token Verification (OTV) enables efficient mathematical reasoning by estimating correctness in a single forward pass during generation, reducing computational costs through correctness-guided early termination.
Recent breakthroughs in large language models (LLMs) have led to notable successes in complex reasoning tasks, such as mathematical problem solving. A common strategy for improving performance is parallel thinking, in which multiple reasoning traces are generated and the final prediction is made using aggregation schemes like majority voting or best-of-N decoding. However, two key challenges persist. First, multi-sample decoding incurs substantial inference latency, especially for long-form outputs. Second, effective mechanisms for reliably assessing the correctness of individual reasoning traces are still limited. To address these challenges, we introduce One-Token Verification (OTV), a computational method that estimates reasoning correctness in a single forward pass during generation. OTV is activated by a learnable token and integrated into the LLM via low-rank adaptation to probe internal reasoning signals through the key-value cache, supporting token-level correctness estimation at any stage of generation without disrupting primary reasoning. Experiments on mathematical reasoning benchmarks demonstrate that OTV consistently surpasses existing verifiers. Additionally, OTV reduces token usage by up to 90% through correctness-guided early termination, prioritizing shorter, more reliable solutions.
Get this paper in your agent:
hf papers read 2603.01025 Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash Models citing this paper 0
No model linking this paper
Datasets citing this paper 0
No dataset linking this paper
Spaces citing this paper 0
No Space linking this paper
Collections including this paper 0
No Collection including this paper