
InftyThink+: Effective and Efficient Infinite-Horizon Reasoning via Reinforcement Learning

Yuchen Yan1,2,*,   Liang Jiang2,   Jin Jiang3,   Shuaicheng Li2,  
Zujie Wen2,   Zhiqiang Zhang2,   Jun Zhou2,   Jian Shao1,†,   Yueting Zhuang1,   Yongliang Shen1,†

1Zhejiang University,   2Ant Group,   3Peking University
ICML 2026
*Contribution during internship at Ant Group.   †Corresponding author.

Arxiv | WebPage

News 🔥🔥

  • 2026.05.01: InftyThink+ has been accepted by ICML 2026. See you in Seoul!
  • 2026.02.09: We released our paper.

Overview 🦾🦾

InftyThink+ extends our previous work, InftyThink, into an end-to-end reinforcement learning framework that directly optimizes the complete iterative reasoning trajectory. It keeps InftyThink's paradigm of model-controlled iteration boundaries and explicit summarization, and trains in two stages: a cold-start stage that uses supervised fine-tuning to establish the basic iterative reasoning format, followed by an RL stage that optimizes strategic decisions through trajectory-level learning. The rollout strategy, reward formulation, and policy gradient estimation are each tailored to InftyThink's single-trajectory, multi-inference structure. Separating format acquisition from strategy optimization lets the model learn not only how to produce iterative reasoning, but also when to summarize, what to preserve, and how to make effective use of its own summaries across iterations.
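To make the single-trajectory, multi-inference structure concrete, the following is a minimal sketch of an inference-time loop in which each model call sees only the question and the most recent self-generated summary. It is not the authors' implementation and does not cover RL training. The function names, the prompt layout, and the `SUMMARY:` / `ANSWER:` markers are hypothetical placeholders chosen for illustration; the actual output format of the released model may differ.

```python
from typing import Callable, Optional, Tuple

# Hypothetical parser: assumes the model marks a summary with "SUMMARY:" and a
# final answer with "ANSWER:". These markers are illustrative assumptions only.
def toy_parse(output: str) -> Tuple[Optional[str], Optional[str]]:
    if "ANSWER:" in output:
        return output.split("ANSWER:", 1)[1].strip(), None
    if "SUMMARY:" in output:
        return None, output.split("SUMMARY:", 1)[1].strip()
    return None, None


def iterative_reasoning(
    question: str,
    generate: Callable[[str], str],
    parse: Callable[[str], Tuple[Optional[str], Optional[str]]] = toy_parse,
    max_iterations: int = 8,
) -> Tuple[Optional[str], int]:
    """Run up to `max_iterations` model calls, carrying only a summary forward.

    `generate` is any text-in, text-out callable (for example, a wrapper around
    a language model). Returns (answer, number_of_calls); answer is None if the
    iteration budget runs out before the model emits a final answer.
    """
    summary = ""
    for step in range(1, max_iterations + 1):
        # Assumed prompt layout: the question, plus the latest summary if any.
        if summary:
            prompt = f"{question}\n\nSummary of progress so far:\n{summary}"
        else:
            prompt = question
        output = generate(prompt)
        answer, new_summary = parse(output)
        if answer is not None:
            # The model chose to stop iterating and answer.
            return answer, step
        # The model chose to summarize; earlier reasoning text is discarded.
        summary = new_summary or summary
    return None, max_iterations
```

Because only the latest summary is carried forward, the context seen by each call stays bounded no matter how many iterations run. Deciding when to summarize and what to preserve is left to the model, and those are the decisions the RL stage described above is meant to optimize.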

Citation

If you find our work helpful, please consider citing it.

@misc{yan2026inftythinkplus,
      title={InftyThink+: Effective and Efficient Infinite-Horizon Reasoning via Reinforcement Learning}, 
      author={Yuchen Yan and Liang Jiang and Jin Jiang and Shuaicheng Li and Zujie Wen and Zhiqiang Zhang and Jun Zhou and Jian Shao and Yueting Zhuang and Yongliang Shen},
      year={2026},
      eprint={2602.06960},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2602.06960}, 
}

Contact Us

If you have any questions, please contact us by email: yanyuchen@zju.edu.cn

Model size: 2B params (Safetensors) · Tensor type: BF16