arxiv:2604.13346

AgentSPEX: An Agent SPecification and EXecution Language

Published on Apr 14

Submitted by Rui Pan on Apr 22

#1 Paper of the day

UIUC ScaleML Lab

Abstract

AI-generated summary: AgentSPEX is a domain-specific language and framework for creating structured, modular, and interpretable large language model agent workflows with explicit control flow and state management.

Language-model agent systems commonly rely on reactive prompting, in which a single instruction guides the model through an open-ended sequence of reasoning and tool-use steps, leaving control flow and intermediate state implicit and making agent behavior potentially difficult to control. Orchestration frameworks such as LangGraph, DSPy, and CrewAI impose greater structure through explicit workflow definitions, but tightly couple workflow logic with Python, making agents difficult to maintain and modify. In this paper, we introduce AgentSPEX, an Agent SPecification and EXecution Language for specifying LLM-agent workflows with explicit control flow and modular structure, along with a customizable agent harness. AgentSPEX supports typed steps, branching and loops, parallel execution, reusable submodules, and explicit state management, and these workflows execute within an agent harness that provides tool access, a sandboxed virtual environment, and support for checkpointing, verification, and logging. Furthermore, we provide a visual editor with synchronized graph and workflow views for authoring and inspection. We include ready-to-use agents for deep research and scientific research, and we evaluate AgentSPEX on 7 benchmarks. Finally, we show through a user study that AgentSPEX provides a more interpretable and accessible workflow-authoring paradigm than a popular existing agent framework.

Community

research4pan

Paper submitter about 5 hours ago

Right now, many agent workflows fall into one of two categories: 1) they are built primarily in Python code, which is flexible but increasingly hard to read, modify, and reproduce; or 2) they rely heavily on natural language (e.g., Markdown-based “skills”), which is lightweight but gives less stable execution paths and weaker controllability.

So we built AgentSPEX: a system that SPecifies agent workflows as declarative YAML and EXecutes them inside an isolated sandbox.

✅ Core capabilities of AgentSPEX

1️⃣ YAML-based workflows

Supports steps such as task / for_each / while / if / parallel / call.
Compared to code-centric workflows, it is easier to quickly understand the full pipeline;
compared to purely natural-language workflows, the control flow is much more explicit.
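To make "explicit control flow" concrete, here is a minimal Python sketch of how a declarative list of steps could be interpreted. It is not AgentSPEX's actual schema or runtime: the field names (`type`, `name`, `save_as`, `cond`, `over`, `as`, `body`, `max_iters`) and the `run_steps` helper are assumptions made for illustration. Only the step kinds `task`, `if`, `for_each`, and `while` are covered, while `parallel` and `call` are left out. Conditions are Python callables here, whereas a YAML file would presumably express them as strings.

```python
from typing import Any, Callable

State = dict[str, Any]
TaskTable = dict[str, Callable[[State], Any]]


def run_steps(steps: list[dict], state: State, tasks: TaskTable) -> State:
    """Interpret a list of step dicts in order, mutating and returning state."""
    for step in steps:
        kind = step["type"]
        if kind == "task":
            # Run a named task and store its result under an explicit state key.
            state[step["save_as"]] = tasks[step["name"]](state)
        elif kind == "if":
            branch = step["then"] if step["cond"](state) else step.get("else", [])
            run_steps(branch, state, tasks)
        elif kind == "for_each":
            # Bind each item to a loop variable in state, then run the body.
            for item in state[step["over"]]:
                state[step["as"]] = item
                run_steps(step["body"], state, tasks)
        elif kind == "while":
            # max_iters is a hypothetical guard against non-terminating loops.
            for _ in range(step.get("max_iters", 100)):
                if not step["cond"](state):
                    break
                run_steps(step["body"], state, tasks)
        else:
            raise ValueError(f"unsupported step type: {kind}")
    return state


def demo() -> State:
    # Stand-ins for LLM or tool calls; real tasks would not be pure lambdas.
    tasks: TaskTable = {
        "list_topics": lambda s: ["yaml", "sandbox"],
        "summarize": lambda s: s["summaries"] + [f"summary of {s['topic']}"],
        "count": lambda s: len(s["summaries"]),
        "mark_done": lambda s: "done",
    }
    workflow = [
        {"type": "task", "name": "list_topics", "save_as": "topics"},
        {"type": "for_each", "over": "topics", "as": "topic", "body": [
            {"type": "task", "name": "summarize", "save_as": "summaries"},
        ]},
        {"type": "task", "name": "count", "save_as": "n"},
        {"type": "if", "cond": lambda s: s["n"] >= 2, "then": [
            {"type": "task", "name": "mark_done", "save_as": "status"},
        ]},
    ]
    return run_steps(workflow, {"summaries": []}, tasks)
```

Because the whole pipeline is one data structure, the order of steps and the state keys each step writes can be read off directly, which is the readability argument made above.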

2️⃣ Sandbox execution + tool integration

Capabilities like browser access, terminal, code execution, and file systems all run in an isolated environment.
This makes it safer by default and more practical for real-world tasks.

3️⃣ Recoverable, replayable, and verifiable

The AgentSPEX harness supports Checkpoint / Resume / Trace Replay.
If a task is interrupted, it can resume from where it left off; if you want to reproduce a process, you can directly replay it.
We are also translating key execution properties into Lean 4 propositions so that formal verification can provide correctness guarantees for agents.
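As a rough illustration of the checkpoint and resume idea, the Python sketch below saves state after every step and, on restart, skips the steps that already finished. It does not reflect how the AgentSPEX harness is implemented: the JSON file layout, the `run_with_checkpoint` function, and the `fail_at` parameter (used only to simulate an interruption) are all hypothetical.

```python
import json
import os
import tempfile
from typing import Any, Callable, Optional

Step = tuple[str, Callable[[dict], Any]]


def run_with_checkpoint(steps: list[Step], path: str,
                        fail_at: Optional[int] = None) -> dict:
    """Run steps in order, saving progress to `path` after each one."""
    if os.path.exists(path):
        # Resume: load the saved state and the index of the next step.
        with open(path) as f:
            ckpt = json.load(f)
    else:
        ckpt = {"next": 0, "state": {}, "trace": []}

    for i in range(ckpt["next"], len(steps)):
        if fail_at == i:
            raise RuntimeError("simulated interruption")
        name, fn = steps[i]
        output = fn(ckpt["state"])
        ckpt["state"][name] = output
        # The trace records each step's output so a run can be inspected later.
        ckpt["trace"].append({"step": name, "output": output})
        ckpt["next"] = i + 1
        # Write to a temp file, then rename, so a crash never leaves a
        # half-written checkpoint behind.
        tmp_path = path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(ckpt, f)
        os.replace(tmp_path, path)
    return ckpt


def demo_resume() -> tuple[list[str], dict]:
    calls: list[str] = []

    def make(name: str, value: str) -> Callable[[dict], Any]:
        def fn(state: dict) -> str:
            calls.append(name)  # record that this step really executed
            return value
        return fn

    steps = [
        ("search", make("search", "3 hits")),
        ("read", make("read", "notes")),
        ("write", make("write", "report")),
    ]
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "ckpt.json")
        try:
            run_with_checkpoint(steps, path, fail_at=2)  # interrupted run
        except RuntimeError:
            pass
        ckpt = run_with_checkpoint(steps, path)  # resumes at step index 2
    return calls, ckpt
```

In this sketch each step executes exactly once across the interrupted run and the resumed run. The saved `trace` list could also be read back to step through a past run without calling any task again, which is one simple reading of trace replay.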

Additionally, we built a visual YAML editor in which the workflow diagram and the YAML stay synchronized in both directions, making workflows much more intuitive to modify.

The AgentSPEX website already includes several ready-to-use agents:

📚 Deep Research: ~15 minutes, <$1 for in-depth research on any topic
🧪 AI Scientists: ~25 minutes, <$3 to generate a research proposal
📝 AI Advisor: ~80% coverage of human reviewer feedback