MemPro: Agentic Memory Systems as Evolvable Programs
Abstract
MemPro is a system-level evolution framework that treats the entire memory construction-retrieval pipeline as an evolvable program, enabling iterative improvement through failure-mode-guided refinement and outperforming static baselines in long-horizon autonomous agent tasks.
Long-horizon autonomous agents require memory systems to retain historical information, track evolving states, and reuse relevant knowledge beyond finite context windows. Existing agentic memory systems typically follow a memory construction-retrieval (MCR) pipeline, but often adapt mainly the memory bank while keeping the surrounding pipeline fixed after deployment. This fixed-pipeline design struggles to handle heterogeneous task-specific failure modes and can become misaligned with memory banks that evolve in scale and structure over time. To address these limitations, we propose MemPro, a system-level evolution framework that treats the entire MCR pipeline as an evolvable program rather than adapting only the memory bank or prompt text. MemPro maintains a version tree of runnable memory-system implementations, where an Evolving Agent iteratively selects promising versions, diagnoses recurring failures, and creates improved child versions through failure-mode-guided edit-debug refinement. Experiments on LongMemEval, LoCoMo, HotpotQA, and NarrativeQA show that MemPro consistently outperforms strong static and prompt-level evolving baselines within a few iterations, continues to improve with evolution, and achieves a favorable performance-cost trade-off. Code is available at https://github.com/wanghai673/MemPro.
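The version tree described in the abstract can be pictured with a small data structure. The following is only a minimal sketch under stated assumptions: the `Version` and `VersionTree` names, the greedy highest-score selection rule, and the scores are all hypothetical placeholders for illustration, not the schema, selection policy, or results of MemPro.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Version:
    """One runnable memory-system implementation in the tree (hypothetical schema)."""
    vid: int
    parent: Optional[int]  # None for the root, i.e. the initial pipeline
    score: float           # placeholder metric, e.g. accuracy on an evaluation split
    children: list = field(default_factory=list)


class VersionTree:
    def __init__(self, root_score: float):
        self.nodes = {0: Version(vid=0, parent=None, score=root_score)}

    def add_child(self, parent: int, score: float) -> int:
        """Register an improved child version under `parent` and return its id."""
        vid = len(self.nodes)
        self.nodes[vid] = Version(vid=vid, parent=parent, score=score)
        self.nodes[parent].children.append(vid)
        return vid

    def select(self) -> int:
        # Assumed greedy rule: highest score, ties broken toward the newest version.
        return max(self.nodes.values(), key=lambda v: (v.score, v.vid)).vid

    def lineage(self, vid: int) -> list:
        """Return the chain of version ids from the root down to `vid`."""
        path = []
        cur = vid
        while cur is not None:
            path.append(cur)
            cur = self.nodes[cur].parent
        return path[::-1]


# Illustrative scores only; they are not taken from the paper.
tree = VersionTree(root_score=0.52)
a = tree.add_child(0, 0.61)
b = tree.add_child(0, 0.58)
c = tree.add_child(a, 0.66)
```

Keeping every version in the tree, rather than only the current best, lets a later iteration branch from an older version if a newer child regresses.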
Community
MemPro proposes a novel system-level evolution framework that treats the entire memory construction–retrieval (MCR) pipeline as an evolvable program. Unlike prior work that only adapts the memory bank or prompt text, MemPro maintains a version tree of runnable pipeline implementations, where an Evolving Agent iteratively:
- 🔍 Selects the most promising pipeline version based on evaluation logs
- 🛠️ Expands it via failure-mode-guided edit–debug refinement (both prompts + executable code)
- 📊 Evaluates the new version on a held-out training set
This addresses two key limitations of fixed-pipeline memory systems: task heterogeneity (different tasks need different memory strategies) and memory–pipeline misalignment (the pipeline becomes misaligned as the memory bank evolves).
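The select, expand, and evaluate steps above can be sketched as a toy loop. Everything here is an assumption made for illustration: the pipeline is reduced to a dictionary of feature flags, `toy_evaluate` and `toy_edit` are hypothetical stand-ins for running a real benchmark and for the Evolving Agent's edits to prompts and code, and the failure-mode labels are invented.

```python
from collections import Counter

# Toy dataset: the (hypothetical) pipeline feature each example needs, or None.
EXAMPLES = ["temporal_index", "temporal_index", "dedup", None]


def toy_evaluate(pipeline):
    """Score a pipeline and return per-example logs with a failure label."""
    logs = [
        {"correct": need is None or pipeline[need], "failure_mode": need}
        for need in EXAMPLES
    ]
    score = sum(e["correct"] for e in logs) / len(logs)
    return score, logs


def diagnose(logs):
    """Return the most frequent failure label among failed examples, or None."""
    counts = Counter(e["failure_mode"] for e in logs if not e["correct"])
    return counts.most_common(1)[0][0] if counts else None


def toy_edit(pipeline, mode):
    """Stand-in for an edit-debug step: enable the feature that addresses `mode`."""
    return {**pipeline, mode: True}


def evolve(pipeline, evaluate, edit, iterations):
    versions = [(pipeline, *evaluate(pipeline))]  # (config, score, logs)
    for _ in range(iterations):
        # Select: pick the best-scoring version so far (assumed greedy rule).
        parent, _, logs = max(versions, key=lambda v: v[1])
        mode = diagnose(logs)
        if mode is None:  # no recurring failure left to fix
            break
        # Expand, then evaluate the new child version.
        child = edit(parent, mode)
        versions.append((child, *evaluate(child)))
    return max(versions, key=lambda v: v[1])


best, score, _ = evolve(
    {"temporal_index": False, "dedup": False}, toy_evaluate, toy_edit, iterations=5
)
```

In this toy setting the loop first fixes the most common failure label and then the remaining one, stopping early once no failures are logged.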
Experiments on LongMemEval, LoCoMo, HotpotQA, and NarrativeQA show that MemPro consistently outperforms strong static and prompt-level evolving baselines (including GEPA and MetaMem) within just 5 iterations and continues to improve as evolution progresses, achieving state-of-the-art results with a favorable performance–cost trade-off.
Get this paper in your agent:
hf papers read 2606.00619
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash