HyperTool: Beyond Step-Wise Tool Calls for Tool-Augmented Agents
English summary
Current tool-augmented LLM agents suffer from an execution-granularity mismatch: step-wise atomic tool calls expose low-level dataflow to the model and waste context. HyperTool proposes a unified MCP-style (Model Context Protocol) tool interface in which the agent invokes a code block that internally calls multiple tools, manipulates their return values, and passes intermediate results locally, collapsing deterministic subroutines into a single model-visible call. The model is trained on trajectories that are synthesized from cross-tool compositional tasks and verified in real MCP environments. On the MCP-Universe benchmark, HyperTool raises average accuracy from 15.69% to 35.29% on Qwen3-32B and from 9.93% to 33.33% on Qwen3-8B, outperforming GPT-OSS and Kimi-k2.5. The results show that moving beyond step-wise tool calls substantially improves agents' multi-step tool use.
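To make the interface concrete, here is a minimal Python sketch contrasting the two execution granularities. Everything in it is hypothetical: the toy tools `list_issues` and `get_assignee`, the `TOOLS` registry, and the `run_code_block` helper are illustrative stand-ins, not HyperTool's actual API. The loop in `step_wise` stands in for decisions a model would make turn by turn. What matters is the number of model-visible observations: step-wise calling surfaces every tool result, while the code block returns only its final `result`.

```python
from typing import Any, Callable

# Hypothetical toy tools standing in for MCP tools; not real servers.
def list_issues(repo: str) -> list[dict]:
    """Return 30 fake issues; every third one is labeled 'bug'."""
    return [{"id": i, "label": "bug" if i % 3 == 0 else "feature"} for i in range(1, 31)]

def get_assignee(issue_id: int) -> str:
    return "alice" if issue_id % 2 == 0 else "bob"

TOOLS: dict[str, Callable[..., Any]] = {"list_issues": list_issues, "get_assignee": get_assignee}

def step_wise(tools: dict[str, Callable[..., Any]]) -> tuple[dict[str, int], int]:
    """Atomic calls: every tool result is returned to the model as its own observation.
    The Python loop stands in for decisions the model would make turn by turn."""
    observations: list[Any] = []
    issues = tools["list_issues"]("demo/repo")
    observations.append(issues)  # the full issue list enters the context
    counts: dict[str, int] = {}
    for issue in issues:
        if issue["label"] == "bug":
            who = tools["get_assignee"](issue["id"])
            observations.append(who)  # one more model-visible round trip
            counts[who] = counts.get(who, 0) + 1
    return counts, len(observations)

# The same workflow written as one agent-authored code block (HyperTool-style sketch).
AGENT_BLOCK = """
counts = {}
for issue in list_issues("demo/repo"):
    if issue["label"] == "bug":
        who = get_assignee(issue["id"])
        counts[who] = counts.get(who, 0) + 1
result = counts
"""

def run_code_block(code: str, tools: dict[str, Callable[..., Any]]) -> tuple[Any, int]:
    """Execute the block with tools in scope; only `result` goes back to the model."""
    scope: dict[str, Any] = dict(tools)
    exec(code, scope)  # sketch only: a real system would sandbox agent code
    return scope["result"], 1  # a single observation, however many tools ran

if __name__ == "__main__":
    print(step_wise(TOOLS))                    # ({'bob': 5, 'alice': 5}, 11)
    print(run_code_block(AGENT_BLOCK, TOOLS))  # ({'bob': 5, 'alice': 5}, 1)
```

Both paths produce the same answer, but the step-wise path surfaces 11 observations, while the code block surfaces 1. The bare `exec` keeps the sketch short. A real deployment would run agent code in an isolated sandbox.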
Key points
Identifies an execution-granularity mismatch in step-wise tool-calling agents, where deterministic tool workflows are unnecessarily exposed as repeated model decisions.
Introduces HyperTool, a code-block-based tool interface that folds multiple tool calls, value manipulation, and intermediate passing into a single model-visible invocation.
Training data is synthesized from cross-tool compositional tasks and verified in real MCP environments to teach models the HyperTool format.
On MCP-Universe, HyperTool raises the average accuracy of Qwen3-32B from 15.69% to 35.29% and of Qwen3-8B from 9.93% to 33.33% (absolute gains of 19.60 and 23.40 points), outperforming GPT-OSS and Kimi-k2.5.
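The verification step for synthesized training data can be pictured as an execute-and-compare filter. The summary does not say how HyperTool checks a trajectory. This minimal sketch therefore assumes a simple criterion: a candidate code block is kept only if it runs without error against the environment and reproduces its reference answer. `Candidate`, `execute`, `verify`, and the toy tools `get_temp_c` and `c_to_f` are all hypothetical, and a dictionary of Python functions stands in for a real MCP environment.

```python
from dataclasses import dataclass
from typing import Any, Callable

# Hypothetical toy tools standing in for a real MCP environment.
def get_temp_c(city: str) -> float:
    return {"Paris": 20.0, "Oslo": 5.0}[city]  # KeyError for unknown cities

def c_to_f(celsius: float) -> float:
    return celsius * 9 / 5 + 32

TOOLS: dict[str, Callable[..., Any]] = {"get_temp_c": get_temp_c, "c_to_f": c_to_f}

@dataclass
class Candidate:
    """A synthesized example: task, agent code block, reference answer (hypothetical schema)."""
    task: str
    code_block: str
    expected: Any

def execute(code: str, tools: dict[str, Callable[..., Any]]) -> Any:
    """Run a code block with the tools in scope and return its `result` variable."""
    scope: dict[str, Any] = dict(tools)
    exec(code, scope)  # sketch only: real systems should sandbox execution
    return scope["result"]

def verify(candidates: list[Candidate], tools: dict[str, Callable[..., Any]]) -> list[Candidate]:
    """Keep candidates that run cleanly and reproduce their reference answer."""
    kept = []
    for cand in candidates:
        try:
            output = execute(cand.code_block, tools)
        except Exception:
            continue  # drop blocks that fail in the environment
        if output == cand.expected:
            kept.append(cand)
    return kept

CANDIDATES = [
    Candidate(
        "Which city is warmer in Fahrenheit, Paris or Oslo?",
        "temps = {c: c_to_f(get_temp_c(c)) for c in ['Paris', 'Oslo']}\n"
        "result = max(temps, key=temps.get)",
        "Paris",
    ),
    # Runs, but the reference answer disagrees with execution (68.0), so it is dropped.
    Candidate("Paris temperature in Fahrenheit", "result = c_to_f(get_temp_c('Paris'))", 86.0),
    # Raises KeyError inside the tool, so it is dropped.
    Candidate("Berlin temperature in Fahrenheit", "result = c_to_f(get_temp_c('Berlin'))", 50.0),
]

if __name__ == "__main__":
    for cand in verify(CANDIDATES, TOOLS):
        print(cand.task)  # only the first candidate survives
```

A filter of this kind trades recall for precision: any trajectory that cannot be replayed end to end in the environment is discarded rather than repaired.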