Source: arXiv | June 12, 2026 | Importance: 4/5

HyperTool: Beyond Step-Wise Tool Calls for Tool-Augmented Agents

English summary

Current tool-augmented LLM agents suffer from an execution-granularity mismatch: step-wise atomic tool calls expose low-level dataflow to the model and consume context-window space. HyperTool proposes a unified MCP-style tool interface in which the agent invokes a code block that internally calls multiple tools, manipulates returned values, and passes intermediate results locally, collapsing deterministic subroutines into a single model-visible call. Models are trained on trajectories synthesized from cross-tool compositional tasks and verified in real MCP environments. On the MCP-Universe benchmark, HyperTool raises average accuracy from 15.69% to 35.29% on Qwen3-32B and from 9.93% to 33.33% on Qwen3-8B, outperforming GPT-OSS and Kimi-k2.5. The results indicate that moving beyond step-wise tool calls substantially improves multi-step tool use in agents.
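To make the granularity difference concrete, here is a minimal sketch contrasting the two styles. It is not the paper's implementation: the tools `list_orders` and `get_price`, the `hyper_tool` entry point, and the convention that a code block leaves its answer in a variable named `result` are all hypothetical, and plain `exec` stands in for whatever sandboxed executor a real system would need.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical tools standing in for MCP servers (names are illustrative only).
def list_orders(customer: str) -> List[Dict[str, Any]]:
    data = {"alice": [{"id": 1, "sku": "A"}, {"id": 2, "sku": "B"}]}
    return data.get(customer, [])

def get_price(sku: str) -> float:
    return {"A": 10.0, "B": 2.5}[sku]

TOOLS: Dict[str, Callable[..., Any]] = {
    "list_orders": list_orders,
    "get_price": get_price,
}

def run_stepwise(customer: str) -> Tuple[float, List[Tuple[str, Any]]]:
    """Step-wise style: every tool call and its result enters the context."""
    context: List[Tuple[str, Any]] = []
    orders = TOOLS["list_orders"](customer)
    context.append(("list_orders", orders))
    total = 0.0
    for order in orders:
        price = TOOLS["get_price"](order["sku"])
        context.append(("get_price", price))
        total += price
    return total, context

def hyper_tool(code: str) -> Any:
    """Assumed interface: run a code block with the tools in scope and
    return only the value bound to `result`. Plain exec is NOT a sandbox."""
    namespace: Dict[str, Any] = dict(TOOLS)
    exec(code, namespace)
    return namespace["result"]

def run_hypertool(customer: str) -> Tuple[float, List[Tuple[str, Any]]]:
    """Code-block style: intermediate values stay local to the block."""
    code = (
        f"orders = list_orders({customer!r})\n"
        "result = sum(get_price(o['sku']) for o in orders)\n"
    )
    total = hyper_tool(code)
    context = [("hyper_tool", total)]  # the only model-visible entry
    return total, context

if __name__ == "__main__":
    print(run_stepwise("alice"))
    print(run_hypertool("alice"))
```

In this toy case both styles compute the same total, but the step-wise run adds three entries to the context (one per tool call) while the code-block run adds one. The order list and per-item prices never reach the model.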

Key points

Identifies an execution-granularity mismatch in step-wise tool-calling agents, where deterministic tool workflows are unnecessarily exposed as repeated model decisions.
Introduces HyperTool, a unified code-block-based tool interface that folds multiple tool calls, value manipulation, and intermediate-result passing into a single model-visible invocation.
Training trajectories are synthesized from cross-tool compositional tasks and verified in real MCP environments to teach models the HyperTool format.
On MCP-Universe, HyperTool raises Qwen3-32B average accuracy from 15.69% to 35.29% and Qwen3-8B from 9.93% to 33.33%, surpassing GPT-OSS and Kimi-k2.5.
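For readers who want the size of the reported gains, the short sketch below computes the absolute improvement in percentage points and the relative factor from the four accuracy figures quoted above. The helper name `gains` is illustrative; no numbers beyond those in the summary are used.

```python
from typing import Dict, Tuple

# Average accuracy on MCP-Universe (percent), as quoted in the summary:
# (baseline, with HyperTool)
REPORTED: Dict[str, Tuple[float, float]] = {
    "Qwen3-32B": (15.69, 35.29),
    "Qwen3-8B": (9.93, 33.33),
}

def gains(before: float, after: float) -> Tuple[float, float]:
    """Return (absolute gain in percentage points, relative factor)."""
    return round(after - before, 2), round(after / before, 2)

if __name__ == "__main__":
    for model, (before, after) in REPORTED.items():
        points, factor = gains(before, after)
        print(f"{model}: +{points} points, {factor}x the baseline")
```

This works out to roughly 19.6 points (about 2.25x) for Qwen3-32B and 23.4 points (about 3.36x) for Qwen3-8B, so the smaller model shows the larger relative improvement.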
