InftyThink+: Effective and Efficient Infinite-Horizon Reasoning via Reinforcement Learning

Yuchen Yan^1,2,*, Liang Jiang², Jin Jiang³, Shuaicheng Li²,
Zujie Wen², Zhiqiang Zhang², Jun Zhou², Jian Shao^1,†, Yueting Zhuang¹, Yongliang Shen^1,†

¹Zhejiang University, ²Ant Group, ³Peking University
ICML 2026
^*Contribution during internship at Ant Group. ^†Corresponding Author

arXiv | WebPage

News 🔥🔥

2026.05.01: InftyThink+ has been accepted by ICML 2026. See you in Seoul!
2026.02.09: We released our paper.

Overview 🦾🦾

Building upon our previous work InftyThink, we introduce InftyThink+, an end-to-end reinforcement learning framework that directly optimizes the complete iterative reasoning trajectory. InftyThink lets the model decide where each iteration ends and explicitly summarize its progress before continuing. InftyThink+ keeps this paradigm and trains it in two stages: a cold-start stage that uses supervised fine-tuning to establish the basic iterative reasoning format, followed by an RL stage that optimizes strategic decisions through trajectory-level learning. Because a single InftyThink trajectory spans multiple inference calls, we tailor the rollout strategy, reward formulation, and policy gradient estimation to this single-trajectory, multi-inference structure. Separating format acquisition from strategy optimization lets the model learn not only how to produce iterative reasoning, but also when to summarize, what to preserve, and how to make effective use of its own summaries in later iterations.
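To make the "single-trajectory, multi-inference" structure concrete, here is a minimal sketch under stated assumptions: each inference call sees only the question and the latest summary, the model itself decides whether to summarize or answer, and one trajectory-level reward is shared by every call in that trajectory. `StepFn`, `Trajectory`, `rollout`, `broadcast_advantages`, and `toy_step` are hypothetical names introduced for illustration; the GRPO-style group normalization is an assumption, not the authors' confirmed estimator, and the real prompt format and reward are described in the paper.

```python
from dataclasses import dataclass, field
from statistics import mean, pstdev
from typing import Callable, List, Optional, Tuple

# Hypothetical model interface (assumption): given the question and the previous
# summary (None on the first iteration), return (reasoning, summary, answer).
# The model decides whether to emit a summary (continue) or a final answer (stop).
StepFn = Callable[[str, Optional[str]], Tuple[str, Optional[str], Optional[str]]]


@dataclass
class Trajectory:
    segments: List[str] = field(default_factory=list)   # one entry per inference call
    summaries: List[str] = field(default_factory=list)  # summaries carried forward
    answer: Optional[str] = None


def rollout(step: StepFn, question: str, max_iters: int = 8) -> Trajectory:
    """Run iterative reasoning; each call sees only the question and the latest summary."""
    traj = Trajectory()
    summary: Optional[str] = None
    for _ in range(max_iters):
        reasoning, new_summary, answer = step(question, summary)
        traj.segments.append(reasoning)
        if answer is not None:      # the model chose to finish
            traj.answer = answer
            break
        summary = new_summary       # only the summary is carried to the next call
        traj.summaries.append(summary)
    return traj


def broadcast_advantages(trajs: List[Trajectory], rewards: List[float]) -> List[List[float]]:
    """Group-normalize one reward per trajectory (GRPO-style, an assumption) and
    assign the same advantage to every inference call within that trajectory."""
    mu = mean(rewards)
    sigma = pstdev(rewards) or 1.0  # avoid division by zero when all rewards are equal
    return [[(r - mu) / sigma] * len(t.segments) for t, r in zip(trajs, rewards)]


def toy_step(question: str, summary: Optional[str]):
    """Hypothetical stand-in for the model: answers on its third iteration."""
    n = 0 if summary is None else int(summary.split("=")[1])
    if n == 2:
        return (f"iteration {n}: finishing", None, "42")
    return (f"iteration {n}: working", f"progress={n + 1}", None)


traj = rollout(toy_step, "What is 6 * 7?")
print(len(traj.segments), traj.answer)  # 3 42
```

Broadcasting one advantage across all segments treats the whole trajectory as a single decision sequence, so a summary that helps a later iteration reach the correct answer is credited along with that iteration; finer-grained credit assignment is possible but is not shown here.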

Citation

If you find our work helpful, please consider citing it:

@misc{yan2026inftythinkplus,
      title={InftyThink+: Effective and Efficient Infinite-Horizon Reasoning via Reinforcement Learning}, 
      author={Yuchen Yan and Liang Jiang and Jin Jiang and Shuaicheng Li and Zujie Wen and Zhiqiang Zhang and Jun Zhou and Jian Shao and Yueting Zhuang and Yongliang Shen},
      year={2026},
      eprint={2602.06960},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2602.06960}, 
}

Contact Us

If you have any questions, please contact us by email: yanyuchen@zju.edu.cn


Safetensors • Model size: 2B params • Tensor type: BF16


Model tree for yanyc/InftyThink-Plus-TE-1.5B


Papers for yanyc/InftyThink-Plus-TE-1.5B

InftyThink+: Effective and Efficient Infinite-Horizon Reasoning via Reinforcement Learning

Paper • 2602.06960 • Published Feb 6, 2026

InftyThink: Breaking the Length Limits of Long-Context Reasoning in Large Language Models

Paper • 2503.06692 • Published Mar 9, 2025