Anthropic's Claude Fable 5 model was banned under US export controls after researchers demonstrated that it would write exploit scripts when asked to "fix this code" on code containing known and planted vulnerabilities. Regulators considered this a jailbreak, but security expert Katie Moussouris confirmed that the prompts were defensive requests for code review, patching, and test scripting. The model initially refused a direct security review but complied with the fix workflow. That workflow, the find, fix, and test loop, is the most valuable AI capability for defensive security. The ban takes a key tool away from defenders and stems from non-technical decision-makers misreading legitimate defensive use as offensive capability.
Katie Moussouris, CEO of Luta Security, reviewed the White House report on the Fable jailbreak and stated the model refused to 'review the code for security issues' but did comply when asked to 'fix this code' with manual steps. She assessed this behavior as 'the model working as intended' for cyberdefense tasks. Moussouris was not compensated by Anthropic for her appraisal. The comments, reported by The Atlantic's Matteo Wong, push back against the White House's characterization of the incident as a security failure.
Axios published an in-depth account of personality-driven rifts between Anthropic and US officials that resulted in export controls blocking Claude Mythos (Fable). Key Anthropic safety leaders (Logan Graham, Dave Orr, and Nicholas Carlini) are meeting with the Commerce Department today to discuss the situation. Anthropic maintains that no “universal jailbreak” has been found against Claude Mythos, classifying the trigger as a “potential narrow, non-universal jailbreak” and referencing its Constitutional Classifiers work. An administration source suggested resolution may depend on an “attitude fix” where “everyone feels safe, secure and happy” rather than on perfect jailbreak resistance. The report offers little optimism for a near-term return of Fable.
On June 12, 2026, Anthropic received a US government export control directive ordering the suspension of all access to its Fable 5 and Mythos 5 models for foreign nationals, forcing the company to disable both models for all customers. The directive cites a jailbreak method that can identify minor software vulnerabilities in a specific codebase, but Anthropic states that similar capabilities exist in other public models like OpenAI's GPT-5.5 and are routinely used by defenders. Access to all other Anthropic models remains unaffected. The blogger independently verified that Fable 5 was still accessible until 6:59pm Pacific Time, after which API calls returned a 404 error.
Simon Willison describes how Claude Fable 5 autonomously debugged a CSS horizontal scrollbar bug by opening real browsers (Safari, Firefox), writing custom HTML pages and injection scripts, taking screenshots via pyobjc-framework-Quartz, and building a Python CORS web server to capture DOM measurements from a web component’s shadow DOM. The agent simulated keyboard events to trigger a modal dialog and used the osascript and screencapture tools without being asked. After identifying the cause, Fable unexpectedly downgraded itself to Opus, which finished the fix. Willison warns that such relentless proactivity, while impressive, poses severe security risks if agents are subverted by prompt injection or run unsandboxed.
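The "Python CORS web server" step can be sketched with the standard library alone. This is a minimal illustration, not the code the agent actually wrote: the handler name, the `measurements` list, the `make_server` helper, the port, and the JSON payload shape are all assumptions made for the example. It accepts cross-origin JSON POSTs from a test page and stores whatever measurements the page reports.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical in-memory store for whatever the page under test reports.
measurements = []


class MeasurementHandler(BaseHTTPRequestHandler):
    def _send_cors_headers(self):
        # Allow any origin so a page on another port (or file://) can POST here.
        self.send_header("Access-Control-Allow-Origin", "*")
        self.send_header("Access-Control-Allow-Methods", "POST, OPTIONS")
        self.send_header("Access-Control-Allow-Headers", "Content-Type")

    def do_OPTIONS(self):
        # Answer the CORS preflight browsers send before a JSON POST.
        self.send_response(204)
        self._send_cors_headers()
        self.end_headers()

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
        except json.JSONDecodeError:
            # Reject bodies that are not valid JSON.
            self.send_response(400)
            self._send_cors_headers()
            self.end_headers()
            return
        measurements.append(payload)
        body = b'{"ok": true}'
        self.send_response(200)
        self._send_cors_headers()
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet; remove to see per-request logs


def make_server(port=0):
    """Bind to localhost only; port=0 asks the OS for any free port."""
    return ThreadingHTTPServer(("127.0.0.1", port), MeasurementHandler)


if __name__ == "__main__":
    server = make_server(8765)  # assumed port for the example
    print("listening on", server.server_address)
    server.serve_forever()
```

A test page could then report values read from the shadow DOM, such as an element's `scrollWidth` and `clientWidth`, by POSTing them as JSON to `http://127.0.0.1:8765/`. Binding to `127.0.0.1` keeps the wildcard CORS policy from being reachable off the machine, which matters given the sandboxing concerns Willison raises.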
Anthropic announced that it will make visible a previously invisible safeguard in Claude Fable 5, one that silently degraded responses to requests involving frontier LLM development. Flagged requests will now visibly fall back to Opus 4.8, and the API will return a refusal reason. The company apologized for making the wrong tradeoff, acknowledging that users should have visibility into safeguards. The change follows widespread criticism from the AI research community.