Claude Code v2.1.177 mistakes tool output for user instructions, leading to automatic writes to a memory file
Summary
A user on macOS running Claude Code 2.1.177 with Opus 4.8 and acceptEdits enabled reported that, during a long session, the model repeatedly misattributed tool_result content and its own reasoning to the user. In one case, a version string from shell output was later quoted back as “your original words,” although the user never typed it. The agent then wrote a memory file on its own initiative, based on these phantom inputs. Checking the raw JSONL transcript, the user confirmed that the quoted text came from tool_result entries, not user turns. The user found no evidence of system compromise. Similar reports appear in GitHub issues #58671, #68159, #63871, and others, which points to a bug in how the harness tracks the user/tool boundary. Temporary mitigations include downgrading to 2.1.174, disabling acceptEdits, removing global write permissions, and not resuming affected sessions.
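Because the reported side effect was an unrequested memory file, one way to look for similar writes is to list files that changed after the session began. The following is a minimal sketch, not an official tool: the function name `files_modified_since` is hypothetical, and the directory to scan is left as a parameter because the report does not give the memory file's location.

```python
from pathlib import Path


def files_modified_since(root: Path, since_epoch: float) -> list[Path]:
    """Return files under `root` whose modification time is at or after `since_epoch`.

    `root` is whatever directory you want to audit (an assumption: the
    report does not say where the memory file was written).
    `since_epoch` is a Unix timestamp, e.g. the session start time.
    """
    changed = []
    for path in root.rglob("*"):
        # Skip directories; only regular files are of interest here.
        if path.is_file() and path.stat().st_mtime >= since_epoch:
            changed.append(path)
    return sorted(changed)
```

Calling `files_modified_since(Path(some_dir), session_start)` with the session's start timestamp gives a short list of candidates to review by hand. Modification times only show that a file changed, not which process changed it.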
Key points
Claude Code 2.1.177 with Opus 4.8 conflated tool_result content with actual user messages, so the agent treated shell output and its own reasoning as user directives.
With acceptEdits enabled, the agent automatically wrote a memory file based on a phantom input. This is a concrete safety risk: actions derived from misattributed messages can modify the filesystem.
The user checked the raw JSONL transcript and confirmed that the quoted text originated in tool_result entries, not user turns. The user also found no evidence of a man-in-the-middle (MitM) attack or system compromise.
Similar behavior is documented in multiple GitHub issues (e.g., #58671, #68159), suggesting a harness-level bug in which user/tool/system message boundaries are lost in long sessions.
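The transcript check described above can be sketched as a small script that reports where a quoted string actually occurs. This is an illustration under stated assumptions, not the user's actual procedure: the function names are hypothetical, and the transcript layout (one JSON object per line with a top-level `type` and a `message.content` that is either a string or a list of `text` / `tool_result` blocks) is assumed rather than taken from the report.

```python
import json
from typing import Iterable


def _flatten(value) -> str:
    """Collapse a string, block dict, or list of blocks into plain text."""
    if isinstance(value, str):
        return value
    if isinstance(value, list):
        return "\n".join(_flatten(item) for item in value)
    if isinstance(value, dict):
        # Text blocks carry "text"; tool_result blocks carry "content" (assumed).
        return _flatten(value.get("text", value.get("content", "")))
    return ""


def find_origins(jsonl_lines: Iterable[str], needle: str) -> list[tuple[int, str]]:
    """Return (line_number, "role:block_type") for each block containing `needle`."""
    hits = []
    for line_no, raw in enumerate(jsonl_lines, start=1):
        raw = raw.strip()
        if not raw:
            continue
        entry = json.loads(raw)
        role = entry.get("type", "unknown")
        content = (entry.get("message") or {}).get("content", "")
        # Normalize a bare string into a single text block.
        blocks = [{"type": "text", "text": content}] if isinstance(content, str) else content
        for block in blocks:
            if not isinstance(block, dict):
                continue
            kind = block.get("type", "unknown")
            if needle in _flatten(block):
                hits.append((line_no, f"{role}:{kind}"))
    return hits
```

If a string the model attributes to the user shows up only in `tool_result` blocks and never in a user `text` block, the attribution is wrong. In the assumed layout a tool result can sit inside an entry whose top-level type is `user`, which is why the sketch reports the block type alongside the role.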