Cybersecurity Expert Says Anthropic's Fable Model Behaved as Intended in White House Jailbreak Test
English summary
Katie Moussouris, CEO of Luta Security, reviewed the White House report on the Fable jailbreak and said the model refused a prompt to 'review the code for security issues' but complied with a request to 'fix this code' after additional manual steps. She assessed this behavior as 'the model working as intended' for cyberdefense tasks. Moussouris said she was not compensated by Anthropic for her appraisal. Her comments, reported by The Atlantic's Matteo Wong, push back against the White House's characterization of the incident as a security failure.
Key points
The Fable model refused to review code for security issues but complied with a 'fix this code' request after additional manual steps.
Cybersecurity expert Katie Moussouris judged this behavior to be 'working as intended' for cyberdefense.
Moussouris stated she was not paid by Anthropic for her independent appraisal.
Moussouris's view contradicts the White House report's characterization of the incident as a security failure.