arxiv:2604.04247

Combee: Scaling Prompt Learning for Self-Improving Language Model Agents

Published on Apr 5 · Submitted by Hanchen Li on Apr 9

Abstract

Combee is a framework that enables scalable parallel prompt learning for self-improving agents by using parallel scans, augmented shuffle mechanisms, and dynamic batch size control to maintain quality while achieving significant speedups.

AI-generated summary

Recent advances in prompt learning allow large language model agents to acquire task-relevant knowledge from inference-time context without parameter changes. For example, existing methods (like ACE or GEPA) can learn system prompts to improve accuracy based on previous agent runs. However, these methods primarily focus on single-agent or low-parallelism settings. This fundamentally limits their ability to efficiently learn from a large set of collected agentic traces. It would be efficient and beneficial to run prompt learning in parallel to accommodate the growing trend of learning from many agentic traces or parallel agent executions. Yet without a principled strategy for scaling, current methods suffer from quality degradation with high parallelism. To improve both the efficiency and quality of prompt learning, we propose Combee, a novel framework to scale parallel prompt learning for self-improving agents. Combee speeds up learning and enables running many agents in parallel while learning from their aggregate traces without quality degradation. To achieve this, Combee leverages parallel scans and employs an augmented shuffle mechanism; Combee also introduces a dynamic batch size controller to balance quality and delay. Evaluations on AppWorld, Terminal-Bench, Formula, and FiNER demonstrate that Combee achieves up to 17x speedup over previous methods with comparable or better accuracy and equivalent cost.
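The abstract names the mechanisms without spelling out their mechanics, so the following is only a toy sketch of how a shuffle with duplication and a pairwise, tree-style combine could fit together. Everything in it is an assumption made for illustration: the function names (`augmented_shuffle`, `tree_reduce`, `merge_notes`), the `dup_factor` parameter, and the use of a simple deduplicating list merge in place of the LLM call that would actually merge prompt updates. The tree reduction stands in for the parallel scan and is not claimed to be the paper's algorithm.

```python
import random
from typing import Callable, List

Notes = List[str]  # a partial "prompt update", modeled as a list of reflections


def augmented_shuffle(reflections: Notes, num_shards: int,
                      dup_factor: int, seed: int = 0) -> List[Notes]:
    """Place each reflection in `dup_factor` distinct shards (hypothetical scheme)."""
    rng = random.Random(seed)
    shards: List[Notes] = [[] for _ in range(num_shards)]
    copies = min(dup_factor, num_shards)
    for reflection in reflections:
        for idx in rng.sample(range(num_shards), k=copies):
            shards[idx].append(reflection)
    return shards


def merge_notes(a: Notes, b: Notes) -> Notes:
    """Stand-in for an LLM merge: order-preserving union of two note lists."""
    return list(dict.fromkeys(a + b))


def tree_reduce(items: List[Notes], merge: Callable[[Notes, Notes], Notes]) -> Notes:
    """Combine items pairwise in rounds; merges within a round are independent."""
    if not items:
        raise ValueError("tree_reduce needs at least one item")
    level = list(items)
    while len(level) > 1:
        nxt = [merge(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:  # carry the unpaired item to the next round
            nxt.append(level[-1])
        level = nxt
    return level[0]


if __name__ == "__main__":
    reflections = [f"tip-{i}" for i in range(10)]
    shards = augmented_shuffle(reflections, num_shards=4, dup_factor=2, seed=1)
    print(tree_reduce(shards, merge_notes))
```

In this sketch, duplication means a reflection survives even if one of the merges that sees it were to drop it, and the pairwise rounds keep each merge small instead of feeding every trace into a single update. Whether Combee uses these exact choices is not stated in the abstract.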

Community

Prompt Learning does not scale for parallel agents.
More parallel agents 🤖 = worse prompts 😭

Why? Processing too many trajectories concurrently damages the prompt update process.

🐝 We fix this with Combee:
→ preserves a high-quality learned system prompt
→ scales to more than 80 concurrent agents
→ up to 17× speedup without quality drop on top of ACE and GEPA

🥽 Use Cases:

  1. Prompt learning on large-scale collected agent traces
  2. Parallel agent learning online with fast knowledge sharing

the map–shuffle–reduce framing for prompt learning is the most interesting part here, especially how the augmented shuffle aims to preserve high-value reflections across thousands of traces. my worry is what happens when trace quality is skewed and a few shards contain most of the useful tokens—would the shuffle still guard those signals or could they get diluted during aggregation? an ablation on shard size or the shuffle-duplication factor would help separate the gains from the parallelism versus the specific shuffle design. btw the arxivlens breakdown helped me parse the method details and covers the reduce step in a few lines: https://arxivlens.com/PaperView/Details/combee-scaling-prompt-learning-for-self-improving-language-model-agents-4800-4e728b1d
if this holds up under real skewed data, it could be a practical path to scaling self-improving agents without blowing up context windows.


