Papers
arxiv:2605.13511

Many-Shot CoT-ICL: Making In-Context Learning Truly Learn

Published on May 13 · Submitted by Cindy on May 14
Abstract

Many-shot in-context learning for reasoning tasks exhibits different scaling behaviors than non-reasoning tasks, with demonstration ordering and selection significantly impacting performance.

AI-generated summary

In-context learning (ICL) adapts large language models (LLMs) to new tasks by conditioning on demonstrations in the prompt without parameter updates. With long-context models, many-shot ICL can use dozens to hundreds of examples and achieve performance comparable to fine-tuning, yet current understanding of its scaling behavior is largely derived from non-reasoning tasks. We study many-shot chain-of-thought in-context learning (CoT-ICL) for reasoning and show that standard many-shot rules do not transfer. Across non-reasoning and reasoning-oriented LLMs and across non-reasoning and reasoning tasks, we find: (i) a setting-dependent scaling effect, where increasing the number of CoT demonstrations is unstable for non-reasoning LLMs and benefits mainly reasoning-oriented LLMs; (ii) similarity-based retrieval helps on non-reasoning tasks but fails on reasoning, since semantic similarity poorly predicts procedural (i.e., CoT) compatibility; and (iii) an order-scaling effect, where performance variance grows with more CoT demonstrations. We interpret these behaviors by viewing many-shot CoT-ICL as in-context test-time learning rather than scaled pattern matching, and suggest two principles: (i) demonstrations should be easy for the target model to understand, and (ii) they should be ordered to support a smooth conceptual progression. Guided by these principles, we propose Curvilinear Demonstration Selection (CDS), a simple ordering method that yields up to a 5.42 percentage-point gain on geometry with 64 demonstrations. Overall, our results reframe the long context window from a retrieval buffer into a structured curriculum for in-context test-time learning.
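For context, the similarity-based retrieval baseline that the abstract reports failing on reasoning tasks is typically implemented by embedding the query and all candidate demonstrations, then selecting the top-k by cosine similarity. This is a minimal, hypothetical sketch of that standard baseline (not the paper's CDS method); the function name and toy 2-d embeddings are illustrative assumptions:

```python
import numpy as np

def retrieve_top_k(query_emb, demo_embs, k):
    """Rank candidate demonstrations by cosine similarity to the query.

    query_emb: shape (d,) embedding of the test question.
    demo_embs: shape (n, d) embeddings of candidate demonstrations.
    Returns indices of the k most similar demonstrations, best first.
    """
    q = query_emb / np.linalg.norm(query_emb)
    d = demo_embs / np.linalg.norm(demo_embs, axis=1, keepdims=True)
    sims = d @ q  # cosine similarity of each demo to the query
    return np.argsort(-sims)[:k]

# Toy example with 2-d embeddings (illustrative only).
demos = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
query = np.array([0.9, 0.1])
print(retrieve_top_k(query, demos, 2))  # -> [0 2]
```

The paper's point is that ranking by this semantic score does not track whether a demonstration's chain of thought is procedurally compatible with the target problem, which is why such retrieval can fail on reasoning tasks.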

Community

Paper submitter

We believe this work provides a step toward reframing ICL from pattern matching to in-context test-time learning. We propose two principles, showing that reasoning performance relies on demonstrations being both understandable to the model and smoothly sequenced to facilitate conceptual progression.

Really impressive work! The reframing of many-shot CoT-ICL as in-context test-time learning is a compelling perspective, and the finding that similarity-based retrieval actually hurts reasoning tasks is quite eye-opening. CDS is also an elegantly simple yet effective solution.

May I ask if you plan to open-source the code and experimental scripts? Would love to reproduce and build on this.


Thank you for your encouraging feedback! We're really glad the perspective on many-shot CoT-ICL as in-context test-time learning was interesting to you.

To answer your question: yes, we will open-source the code and experimental scripts by next week. I'll add the GitHub link right here once it's ready. Really appreciate your interest in building on our work.



