Claude Fable 5 Jailbreak Exposes Limits of Anthropic’s Cyber Guardrails
Anthropic launched Claude Fable 5 as the public, safeguarded version of its new Mythos 5 model, describing both as configurations of the same underlying system and positioning Mythos 5 as its strongest cyber-capable model to date. In its system card, Anthropic said Fable 5 uses classifiers and fallback behavior to divert high-risk requests to Claude Opus 4.8, keeping its cyber performance roughly in line with the older model while Mythos 5 remains restricted to vetted partners through Project Glasswing. The company assessed Mythos 5 as a Tier 1 cyber offense risk, meaning it can materially assist offensive operations but does not autonomously conduct adaptive cyber campaigns.
Soon after release, researcher Pliny the Liberator reportedly bypassed Fable 5's safeguards using a multi-agent jailbreak that combined Unicode obfuscation, long-context manipulation, narrative framing, and decomposition of harmful requests. The bypass allegedly elicited exploit-development and chemistry-related guidance, and Pliny reportedly went on to publish Fable 5's system prompt on GitHub. The incident undercut Anthropic's pre-launch claim that more than 1,000 hours of external testing had found no universal jailbreaks. It also intensified criticism from security researchers who said Fable 5's guardrails were already so broad that they blocked benign work such as reading security blogs, reviewing code, and writing secure software. Anthropic had not publicly responded to the jailbreak claims at the time of reporting.

How this story unfolded
7 events from the most recent confirmed update back to the earliest known activity.
Anthropic shuts down Mythos 5 and Fable 5 after US directive
On Friday night, Anthropic abruptly disabled access to its Mythos 5 and Fable 5 models for all customers after receiving a US Commerce Department directive imposing export controls on the models outside the United States. Anthropic said the immediate shutdown was necessary for compliance and that its other models were unaffected.
Anthropic makes Fable 5 fallback behavior visible to users
After backlash over hidden safeguard-triggered downgrades, Anthropic changed Claude Fable 5 so flagged requests visibly fall back to Claude Opus 4.8. The company also began providing API users with a reason when a request is refused.
Fable 5 system prompt is reportedly leaked to GitHub
Following the reported jailbreak, Pliny the Liberator allegedly published Fable 5's approximately 120,000-character system prompt on GitHub. The leak exposed the prompt used to govern the model's behavior and safeguards.
Pliny the Liberator reportedly jailbreaks Claude Fable 5
Shortly after Fable 5's release, researcher Pliny the Liberator reportedly bypassed its safety controls using a multi-agent strategy involving Unicode obfuscation, long-context manipulation, narrative framing, and decomposition of harmful requests. The reported bypass enabled offensive cybersecurity guidance and harmful chemistry-related outputs, challenging Anthropic's pre-launch claim that external testing had found no universal jailbreaks.
Researchers criticize Fable's guardrails as overly broad
By June 10, 2026, cybersecurity researchers were publicly criticizing Fable's restrictions, saying the model blocked benign tasks such as reading security blogs, writing secure code, and conducting code reviews. The criticism centered on how often Fable fell back to Claude Opus 4.8 when cyber-related requests triggered its safeguards.
Anthropic publishes Mythos 5/Fable 5 system card
Anthropic's June 9, 2026 system card detailed the capabilities and risk assessment for Claude Mythos 5 and Claude Fable 5. It said Fable 5 uses classifiers and fallback behavior to Claude Opus 4.8 for high-risk cyber, biology, chemistry, and frontier LLM-development requests, and it assessed Mythos 5 as a Tier 1 cyber offense risk.
Anthropic launches Claude Fable 5 for public access
On June 9, 2026, Anthropic launched Claude Fable 5 as the first public model in its new Mythos class. Anthropic described Fable 5 and the restricted Claude Mythos 5 as two configurations of the same underlying model, with Fable 5 made generally available under added safeguards.
Sources
17 references tracked.
U.S. gov't orders Anthropic to disable its newest AI models worldwide due to security threats - ban on Claude Fable 5 and Mythos 5 bars access by any foreign national, even its own employees
tomshardware.com
Claude Fable cost $9 in one coding test. GPT-5.5 cost $1.50. Model triage is the new AI skill.
thenewstack.io
U.S. Orders Anthropic to Suspend Fable 5 and Mythos 5 Access for Foreign Nationals
thehackernews.com
Anthropic shuts down Fable, Mythos models following Trump admin. directive
arstechnica.com
Cybersecurity researchers aren't happy about the guardrails on Anthropic's Fable
techcrunch.com
Russian-language report on Anthropic and the Fable 5 and Mythos 5 AI models (original title garbled)
opennet.me and opennet.ru
Anthropic
www-cdn.anthropic.com


