Recursive Agent Harnesses
English summary
The paper introduces and formalizes the Recursive Agent Harness (RAH), a code-first extension of recursive language models in which a parent agent generates executable scripts that spawn full subagent harnesses, each with filesystem tools, code execution, and planning. In a controlled evaluation on Oolong-Synthetic (199 samples, context lengths up to 4M tokens), RAH with a fixed GPT-5 backbone improves on the Codex coding-agent baseline from 71.75% to 81.36%. Because the backbone is held fixed in that comparison, the gain is attributable to the harness design rather than to model scaling. With a stronger backbone, Claude Sonnet 4.5, RAH reaches 89.77%.
Chinese summary
本文提出并形式化了递归智能体框架(RAH),这是递归语言模型的一种以代码为中心的扩展:父智能体生成可执行脚本,并行生成带有文件系统工具、代码执行和规划的完整子智能体套件。在Oolong-Synthetic(199个样本,上下文长度达4M tokens)上的受控评估显示,固定GPT-5骨干下,RAH将Codex编码智能体基线从71.75%提升至81.36%;使用更强骨干Claude Sonnet 4.5时,RAH达到89.77%,表明提升源于框架设计而非模型规模。
Key points
RAH defines harness recursion: a parent agent spawns tool-equipped subagent harnesses via executable scripts, extending model recursion into code-level orchestration.
RAH定义套件递归:父智能体通过可执行脚本生成带工具的完整子智能体套件,将模型递归扩展为代码级编排。
Evaluated on Oolong-Synthetic (199 samples, 13 context-length buckets up to 4M tokens), RAH with a GPT-5 backbone achieves 81.36%, a 9.61-percentage-point improvement over the Codex coding-agent baseline (71.75%).
在Oolong-Synthetic(199个样本,13个上下文长度区间,最高4M tokens)上,固定GPT-5骨干的RAH达到81.36%,较Codex基线(71.75%)提升9.61个百分点。
With a stronger backbone, Claude Sonnet 4.5, RAH reaches 89.77%, indicating that the gains are attributable to the harness design rather than to model capability alone.
使用更强骨干Claude Sonnet 4.5时,RAH达到89.77%,表明提升来自框架设计而非单纯依赖模型能力。
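The harness-recursion idea in the first key point can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not the paper's implementation: the names `Harness`, `spawn`, `solve`, and `PARENT_SCRIPT` are hypothetical, the label-counting task is invented for illustration, and the model is replaced by a fixed script string and a direct count so the example runs without any LLM. Subagents are spawned sequentially here for simplicity.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical script a parent agent might generate. In a real system the
# model would write this text; here it is a fixed string (an assumption).
PARENT_SCRIPT = """
mid = len(lines) // 2
result = 0
for part in (lines[:mid], lines[mid:]):
    child = harness.spawn(part)          # full sub-harness, not a bare model call
    result += child.solve("chunk.txt", label)
"""


@dataclass
class Harness:
    """Toy agent harness: an in-memory 'filesystem' plus two tools."""
    files: Dict[str, str]
    depth: int = 0
    max_depth: int = 2     # recursion limit (illustrative value)
    chunk_lines: int = 4   # contexts this small are answered directly

    def read_file(self, name: str) -> str:
        """Filesystem tool: return the contents of a workspace file."""
        return self.files[name]

    def run_script(self, script: str, env: dict) -> dict:
        """Code-execution tool: run generated code in the given namespace."""
        exec(script, env)
        return env

    def spawn(self, lines: List[str]) -> "Harness":
        """Create a child harness with its own workspace, one level deeper."""
        return Harness(
            files={"chunk.txt": "\n".join(lines)},
            depth=self.depth + 1,
            max_depth=self.max_depth,
            chunk_lines=self.chunk_lines,
        )

    def solve(self, name: str, label: str) -> int:
        """Count lines ending with `label`, recursing on long contexts."""
        lines = self.read_file(name).splitlines()
        if len(lines) <= self.chunk_lines or self.depth >= self.max_depth:
            # Leaf: stand-in for the model answering over a short context.
            return sum(1 for ln in lines if ln.endswith(label))
        # Parent: execute the generated script, which spawns sub-harnesses.
        env = self.run_script(
            PARENT_SCRIPT, {"harness": self, "lines": lines, "label": label}
        )
        return env["result"]


# Usage: a 20-line synthetic context where every third item is labeled A.
context = "\n".join(
    f"item {i} label={'A' if i % 3 == 0 else 'B'}" for i in range(20)
)
root = Harness(files={"context.txt": context})
answer = root.solve("context.txt", "label=A")
print(answer)
```

The point of the sketch is the shape of the recursion: each child is itself a harness with its own workspace and tools, and the orchestration logic lives in executable code rather than in a single prompt.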