arxiv:2604.13346

AgentSPEX: An Agent SPecification and EXecution Language

Published on Apr 14 · Submitted by Rui Pan on Apr 22 · #1 Paper of the day

Abstract

AI-generated summary

AgentSPEX is a domain-specific language and framework for creating structured, modular, and interpretable large language model agent workflows with explicit control flow and state management.

Language-model agent systems commonly rely on reactive prompting, in which a single instruction guides the model through an open-ended sequence of reasoning and tool-use steps, leaving control flow and intermediate state implicit and making agent behavior potentially difficult to control. Orchestration frameworks such as LangGraph, DSPy, and CrewAI impose greater structure through explicit workflow definitions, but tightly couple workflow logic with Python, making agents difficult to maintain and modify. In this paper, we introduce AgentSPEX, an Agent SPecification and EXecution Language for specifying LLM-agent workflows with explicit control flow and modular structure, along with a customizable agent harness. AgentSPEX supports typed steps, branching and loops, parallel execution, reusable submodules, and explicit state management, and these workflows execute within an agent harness that provides tool access, a sandboxed virtual environment, and support for checkpointing, verification, and logging. Furthermore, we provide a visual editor with synchronized graph and workflow views for authoring and inspection. We include ready-to-use agents for deep research and scientific research, and we evaluate AgentSPEX on 7 benchmarks. Finally, we show through a user study that AgentSPEX provides a more interpretable and accessible workflow-authoring paradigm than a popular existing agent framework.
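To make "typed steps" and "explicit state management" concrete, here is a minimal Python sketch. It is not AgentSPEX's implementation or schema: `TypedStep`, `WorkflowState`, and `run_typed_step` are hypothetical names, and a plain Python function stands in for the LLM call. The idea it illustrates is that every intermediate value lives under a named key in one state object, and each step's output is checked against a declared type before later steps can read it.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class TypedStep:
    """Hypothetical typed step: reads named inputs, writes one typed output."""
    name: str
    inputs: list[str]
    output: str
    output_type: type
    fn: Callable[..., Any]


@dataclass
class WorkflowState:
    """Explicit state: every intermediate value lives under a named key."""
    values: dict[str, Any] = field(default_factory=dict)


def run_typed_step(step: TypedStep, state: WorkflowState) -> None:
    missing = [k for k in step.inputs if k not in state.values]
    if missing:
        raise KeyError(f"{step.name}: missing inputs {missing}")
    result = step.fn(*(state.values[k] for k in step.inputs))
    # Reject outputs of the wrong type instead of passing them downstream.
    if not isinstance(result, step.output_type):
        raise TypeError(
            f"{step.name}: expected {step.output_type.__name__}, "
            f"got {type(result).__name__}"
        )
    state.values[step.output] = result


# Usage: a "plan" step (a stand-in for an LLM call) must return a list of queries.
state = WorkflowState({"topic": "agent DSLs"})
plan = TypedStep("plan", ["topic"], "queries", list,
                 lambda t: [f"{t} survey", f"{t} benchmarks"])
run_typed_step(plan, state)
print(state.values["queries"])  # ['agent DSLs survey', 'agent DSLs benchmarks']
```

Because the state is a single named mapping rather than text buried in a conversation history, a failed type check points at one step and one key.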

Community

Right now, many agent workflows fall into one of two categories: 1) they are built primarily in Python code, which is flexible but increasingly hard to read, modify, and reproduce; or 2) they rely heavily on natural language (e.g., Markdown-based “skills”), which is lightweight but gives less stable execution paths and weaker controllability.

So we built AgentSPEX: a system that SPecifies agent workflows in declarative YAML and EXecutes them inside an isolated sandbox.

✅ Core capabilities of AgentSPEX

1️⃣ YAML-based workflows

  • Supports steps such as task / for_each / while / if / parallel / call.
  • Compared to code-centric workflows, the full pipeline is easier to understand at a glance.
  • Compared to purely natural-language workflows, the control flow is much more explicit.
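The control-flow constructs listed above (task / for_each / while / if / parallel / call) can be sketched as a small recursive interpreter. Assumptions: steps are Python dicts rather than parsed YAML, the keys (`type`, `cond`, `body`, `items`, `as`, `branches`, `module`, `max_iters`) are invented for illustration and are not AgentSPEX's actual syntax, and each `task` is a plain function standing in for an LLM or tool call.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable

State = dict[str, Any]
Task = Callable[[State], None]


def run(steps: list[dict], state: State, tasks: dict[str, Task],
        modules: dict[str, list[dict]]) -> State:
    """Interpret step dicts. The keys used here are a hypothetical schema."""
    for step in steps:
        kind = step["type"]
        if kind == "task":
            tasks[step["name"]](state)  # stand-in for an LLM or tool call
        elif kind == "if":
            branch = step["then"] if state[step["cond"]] else step.get("else", [])
            run(branch, state, tasks, modules)
        elif kind == "while":
            # max_iters bounds the loop even if the condition never turns false
            for _ in range(step.get("max_iters", 10)):
                if not state[step["cond"]]:
                    break
                run(step["body"], state, tasks, modules)
        elif kind == "for_each":
            for item in state[step["items"]]:
                state[step["as"]] = item
                run(step["body"], state, tasks, modules)
        elif kind == "parallel":
            # Each branch runs on a shallow copy; only keys a branch added or
            # rebound are merged back, in branch order.
            snapshot = dict(state)
            with ThreadPoolExecutor() as pool:
                futures = [pool.submit(run, b, dict(snapshot), tasks, modules)
                           for b in step["branches"]]
                for fut in futures:
                    result = fut.result()
                    state.update({k: v for k, v in result.items()
                                  if k not in snapshot or v is not snapshot[k]})
        elif kind == "call":
            run(modules[step["module"]], state, tasks, modules)  # reusable submodule
        else:
            raise ValueError(f"unknown step type: {kind!r}")
    return state


def search(s: State) -> None:
    s.setdefault("notes", []).append(f"notes on {s['q']}")

def count(s: State) -> None:
    s["n_notes"] = len(s["notes"])

def title(s: State) -> None:
    s["title"] = "Report"

def summarize(s: State) -> None:
    s["summary"] = "; ".join(s["notes"])


workflow = [
    {"type": "for_each", "items": "queries", "as": "q",
     "body": [{"type": "task", "name": "search"}]},
    {"type": "parallel", "branches": [[{"type": "task", "name": "count"}],
                                      [{"type": "task", "name": "title"}]]},
    {"type": "if", "cond": "need_summary",
     "then": [{"type": "call", "module": "summarize"}]},
]
tasks = {"search": search, "count": count, "title": title, "summarize": summarize}
modules = {"summarize": [{"type": "task", "name": "summarize"}]}
final = run(workflow, {"queries": ["a", "b"], "need_summary": True}, tasks, modules)
print(final["summary"])  # notes on a; notes on b
```

Putting the loop bound (`max_iters`) in the interpreter rather than in a prompt is one way explicit control flow makes runaway loops easier to rule out.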

2️⃣ Sandbox execution + tool integration

  • Tools such as browser access, a terminal, code execution, and file-system access all run in an isolated environment.
  • This makes it safer by default and more practical for real-world tasks.
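As a rough illustration of routing every tool call through one controlled entry point, the sketch below registers a hypothetical `python` tool that runs code in a separate interpreter process, in a temporary working directory, with a timeout. `TOOLS`, `tool`, and `call_tool` are invented names. This is only process separation, not a security sandbox: real isolation typically relies on containers or virtual machines, and the browser, terminal, and file-system tools of the actual harness are not modeled here.

```python
import subprocess
import sys
import tempfile
from typing import Callable

# Hypothetical tool registry: name -> handler taking one string argument.
TOOLS: dict[str, Callable[[str], str]] = {}


def tool(name: str):
    """Decorator that registers a handler under a tool name."""
    def register(fn: Callable[[str], str]) -> Callable[[str], str]:
        TOOLS[name] = fn
        return fn
    return register


@tool("python")
def run_python(code: str) -> str:
    """Run code in a separate interpreter (-I: isolated mode) inside a throwaway
    working directory, with a timeout. Process separation only, NOT a sandbox."""
    with tempfile.TemporaryDirectory() as workdir:
        try:
            proc = subprocess.run([sys.executable, "-I", "-c", code], cwd=workdir,
                                  capture_output=True, text=True, timeout=10)
        except subprocess.TimeoutExpired:
            return "error: timed out"
    if proc.returncode != 0:
        return f"error: {proc.stderr.strip()}"
    return proc.stdout


def call_tool(name: str, arg: str) -> str:
    """Single entry point the workflow uses for every tool call."""
    if name not in TOOLS:
        return f"error: unknown tool {name!r}"
    return TOOLS[name](arg)


print(call_tool("python", "print(6 * 7)"))  # 42
print(call_tool("browser", "https://example.com"))  # error: unknown tool 'browser'
```

Returning errors as strings (rather than raising) lets the calling step see a failed tool call as ordinary output it can react to.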

3️⃣ Recoverable, replayable, and verifiable

  • The AgentSPEX harness supports Checkpoint / Resume / Trace Replay.
  • If a task is interrupted, it can resume from where it left off; if you want to reproduce a process, you can directly replay it.
  • We are also translating key execution properties into Lean4 propositions, using formal verification to provide correctness guarantees for agents.
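Checkpoint / Resume / Trace Replay can be sketched with a JSON file that records the initial state plus the output of each completed step. This is a minimal sketch under stated assumptions, not the AgentSPEX harness: `run_with_checkpoints` and `replay` are hypothetical, steps are identified by unique names, and every step output must be JSON-serializable. Resuming skips steps already in the trace; replaying rebuilds the state from recorded outputs without calling any step again.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Callable

State = dict[str, Any]
Step = Callable[[State], Any]


def replay(ckpt: Path) -> State:
    """Rebuild state from the recorded trace without re-running any step."""
    saved = json.loads(ckpt.read_text())
    state = dict(saved["initial"])
    for entry in saved["trace"]:
        state[entry["step"]] = entry["output"]  # reuse the recorded output
    return state


def run_with_checkpoints(steps: list[tuple[str, Step]], initial: State,
                         ckpt: Path) -> State:
    """Run named steps in order, checkpointing after each; on restart, skip
    steps that already appear in the saved trace."""
    if ckpt.exists():
        saved = json.loads(ckpt.read_text())
        initial, trace = saved["initial"], saved["trace"]
        state = replay(ckpt)
    else:
        state, trace = dict(initial), []
    done = {entry["step"] for entry in trace}
    for name, fn in steps:
        if name in done:
            continue  # completed before the interruption
        output = fn(state)
        state[name] = output
        trace.append({"step": name, "output": output})
        # Write a temp file, then rename it over the checkpoint, so an
        # interrupted write does not corrupt the previous checkpoint.
        tmp = ckpt.with_suffix(".tmp")
        tmp.write_text(json.dumps({"initial": initial, "trace": trace}))
        tmp.replace(ckpt)
    return state


calls: list[str] = []

def outline(s: State) -> list[str]:
    calls.append("outline")
    return ["intro", "method"]

def draft(s: State) -> str:
    calls.append("draft")
    if calls.count("draft") == 1:
        raise RuntimeError("simulated crash")  # first attempt is interrupted
    return " / ".join(s["outline"])


with tempfile.TemporaryDirectory() as d:
    ckpt = Path(d) / "run.json"
    steps = [("outline", outline), ("draft", draft)]
    try:
        run_with_checkpoints(steps, {"topic": "DSLs"}, ckpt)
    except RuntimeError:
        pass  # "outline" was already checkpointed
    final = run_with_checkpoints(steps, {"topic": "DSLs"}, ckpt)
    replayed = replay(ckpt)

print(calls)  # ['outline', 'draft', 'draft']
print(replayed == final)  # True
```

Note that `outline` runs only once: the second run resumes at `draft`, which is what makes interrupted LLM workflows cheap to restart.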

Additionally, we built a visual YAML editor in which the workflow diagram and the YAML stay synchronized in both directions, making workflows much more intuitive to modify.
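Two-way synchronization between a diagram and a workflow file amounts to a pair of conversions: workflow to graph, and graph back to workflow. The sketch below handles only a linear chain of steps (no branches, loops, or parallel blocks), and `steps_to_graph` / `graph_to_steps` are hypothetical helpers, not the editor's actual code. It shows an edit made in the graph view (inserting a node) being turned back into an ordered step list.

```python
Step = dict[str, str]
Edge = tuple[str, str]


def steps_to_graph(steps: list[Step]) -> tuple[list[dict], list[Edge]]:
    """Workflow -> graph: one node per step, one edge per consecutive pair."""
    nodes = [{"id": s["name"], "type": s["type"]} for s in steps]
    edges = [(a["name"], b["name"]) for a, b in zip(steps, steps[1:])]
    return nodes, edges


def graph_to_steps(nodes: list[dict], edges: list[Edge]) -> list[Step]:
    """Graph -> workflow: walk the chain from the node with no incoming edge."""
    if not nodes:
        return []
    by_id = {n["id"]: n for n in nodes}
    succ = dict(edges)  # assumes a linear chain: at most one successor per node
    targets = set(succ.values())
    order = [next(n["id"] for n in nodes if n["id"] not in targets)]
    while order[-1] in succ:
        order.append(succ[order[-1]])
    return [{"name": i, "type": by_id[i]["type"]} for i in order]


steps = [{"name": "plan", "type": "task"},
         {"name": "search", "type": "for_each"},
         {"name": "write", "type": "task"}]
nodes, edges = steps_to_graph(steps)

# Simulated graph-view edit: insert a "review" node between "search" and "write".
nodes.append({"id": "review", "type": "task"})
edges = [("plan", "search"), ("search", "review"), ("review", "write")]
edited = graph_to_steps(nodes, edges)
print([s["name"] for s in edited])  # ['plan', 'search', 'review', 'write']
```

A round trip (`graph_to_steps(*steps_to_graph(steps))`) returns the original step list, which is the property a synchronized editor needs in order not to drift between views.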

The AgentSPEX website already includes several ready-to-use agents:

📚 Deep Research: ~15 minutes, <$1 for in-depth research on any topic
🧪 AI Scientists: ~25 minutes, <$3 to generate a research proposal
📝 AI Advisor: ~80% coverage of human reviewer feedback


