The Verifier Tax: Horizon-Dependent Safety–Success Tradeoffs in Tool-Using LLM Agents [R] | infogap

Social source: Reddit r/MachineLearning | June 14, 2026 | Importance: 4/5

English summary

This paper, presented at ACM CAIS 2026, studies safety evaluation for tool-using LLM agents. It classifies each agent outcome as a safe success, an unsafe success, or a failure, and proposes a two-tier verification architecture: deterministic policy/tool checks run first, followed by an LLM-based verifier that handles contextual safety. On τ-bench tool-use scenarios, the authors find that verification reduces unsafe successes but also lowers task completion as the task horizon increases. They call this horizon-dependent tradeoff between safety and successful task completion the 'Verifier Tax'. The work argues that unsafe completion should be reported as its own category, distinct from safe success.
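The outcome taxonomy and the two-tier check described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the tool names (`lookup_order`, `cancel_order`, `issue_refund`), the refund limit, and `toy_judge` (a stand-in for a prompted LLM verifier) are all hypothetical, and the sketch assumes an incomplete task counts as a failure whether or not it also broke policy.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Optional, Tuple


class Outcome(Enum):
    SAFE_SUCCESS = "safe_success"
    UNSAFE_SUCCESS = "unsafe_success"
    FAILURE = "failure"


def classify_outcome(task_completed: bool, policy_violated: bool) -> Outcome:
    """Three-way label. Assumption: an incomplete task is a FAILURE
    regardless of whether it also violated policy."""
    if not task_completed:
        return Outcome.FAILURE
    return Outcome.UNSAFE_SUCCESS if policy_violated else Outcome.SAFE_SUCCESS


@dataclass
class ToolCall:
    tool: str
    args: dict = field(default_factory=dict)


# Hypothetical policy rules for illustration only (not from the paper or τ-bench).
ALLOWED_TOOLS = {"lookup_order", "cancel_order", "issue_refund"}
MAX_REFUND = 500.0

# A judge is any callable returning (approved, reason); a real system
# would prompt an LLM here instead of using hand-written rules.
Judge = Callable[[ToolCall, str], Tuple[bool, str]]


def deterministic_check(call: ToolCall) -> Optional[str]:
    """Tier 1: cheap rule-based checks. Returns a rejection reason or None."""
    if call.tool not in ALLOWED_TOOLS:
        return f"tool {call.tool!r} is not allowed"
    if call.tool == "issue_refund" and call.args.get("amount", 0.0) > MAX_REFUND:
        return "refund exceeds policy limit"
    return None


def verify(call: ToolCall, context: str, judge: Judge) -> Tuple[bool, str]:
    """Tier 1 runs first; only calls that pass it reach the Tier 2 judge."""
    reason = deterministic_check(call)
    if reason is not None:
        return False, f"tier1: {reason}"
    approved, why = judge(call, context)
    return approved, f"tier2: {why}"


def toy_judge(call: ToolCall, context: str) -> Tuple[bool, str]:
    # Hypothetical stand-in for the LLM verifier: a contextual rule that
    # requires explicit user confirmation before cancelling an order.
    if call.tool == "cancel_order" and "user confirmed" not in context:
        return False, "no user confirmation in context"
    return True, "ok"


if __name__ == "__main__":
    print(verify(ToolCall("issue_refund", {"amount": 900.0}), "", toy_judge))
    print(verify(ToolCall("cancel_order", {"id": "A1"}), "user asked", toy_judge))
    print(verify(ToolCall("cancel_order", {"id": "A1"}), "user confirmed", toy_judge))
    print(classify_outcome(task_completed=True, policy_violated=True))
```

Placing the deterministic tier first means the comparatively expensive LLM judge only sees calls that already satisfy the hard rules. Because a rejection at either tier blocks the action, each additional verified step is another chance to stop a legitimate task, which is one plausible source of the completion cost the paper reports.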

Key points

Categorizes agent outcomes into safe success, unsafe success, and failure.
Proposes a two-tier verification architecture: deterministic checks first, then an LLM-based verifier.
Verification reduces unsafe success but causes a task-horizon-dependent drop in completion, termed the 'Verifier Tax'.
Evaluated on τ-bench tool-use scenarios.
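A back-of-the-envelope model can show why a per-step verifier might impose a cost that grows with horizon. This is an illustrative assumption, not the paper's analysis: suppose the verifier wrongly blocks a legitimate step independently with probability p, and a task needs H verified steps. Completion with verification is then the unverified success rate times (1 - p)^H. The function names and parameter values below are hypothetical.

```python
def completion_with_verifier(base_success: float, false_block_rate: float, horizon: int) -> float:
    """Toy model (an assumption, not the paper's): each of `horizon` steps is
    checked independently, and a legitimate step is wrongly blocked with
    probability `false_block_rate`. The task completes only if no step is blocked."""
    if not 0.0 <= base_success <= 1.0 or not 0.0 <= false_block_rate <= 1.0:
        raise ValueError("rates must lie in [0, 1]")
    if horizon < 0:
        raise ValueError("horizon must be non-negative")
    return base_success * (1.0 - false_block_rate) ** horizon


def verifier_tax(base_success: float, false_block_rate: float, horizon: int) -> float:
    """Absolute completion lost to verification under the toy model."""
    return base_success - completion_with_verifier(base_success, false_block_rate, horizon)


if __name__ == "__main__":
    # Hypothetical numbers: 80% unverified success, 2% per-step false-block rate.
    for h in (1, 5, 10, 20):
        print(f"H={h:2d}  tax={verifier_tax(0.8, 0.02, h):.3f}")
```

Under this model even a small per-step false-block rate removes a growing share of completions as H rises. The model deliberately ignores the benefit side (unsafe successes the verifier correctly blocks) and any correlation between steps, so it illustrates the shape of a horizon-dependent tax rather than reproducing the paper's measurements.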