Mallory
ai-platform-security, offensive-tooling-release, payload-delivery-evasion

Claude Fable 5 Jailbreak Exposes Limits of Anthropic’s Cyber Guardrails

Updated 2h ago · First seen Jun 10, 2026 · 17 sources

Anthropic launched Claude Fable 5 as the public, safeguarded version of its new Mythos 5 model, describing both as configurations of the same underlying system and positioning Mythos 5 as its strongest cyber-capable model to date. In its system card, Anthropic said Fable 5 uses classifiers and fallback behavior to divert high-risk requests to Claude Opus 4.8, keeping its cyber performance roughly in line with the older model while Mythos 5 remains restricted to vetted partners through Project Glasswing. The company assessed Mythos 5 as a Tier 1 cyber offense risk, meaning it can materially assist offensive operations but does not autonomously conduct adaptive cyber campaigns.

Soon after release, researcher Pliny the Liberator reportedly bypassed Fable 5’s safeguards using a multi-agent jailbreak that combined Unicode obfuscation, long-context manipulation, narrative framing, and decomposition of harmful requests. The bypass allegedly elicited exploit-development and chemistry-related guidance, and the model’s system prompt was reportedly leaked to GitHub. The incident undercut Anthropic’s claim that more than 1,000 hours of external testing had found no universal jailbreaks. It also intensified criticism from security researchers, who said Fable 5’s guardrails were already so broad that they blocked benign work such as reading security blogs, reviewing code, and writing secure software. Anthropic had not publicly responded to the jailbreak claims at the time of reporting.

EVENT TIMELINE

How this story unfolded

7 events from the most recent confirmed update back to the earliest known activity.

Jun 12, 2026 · 2d ago

Anthropic shuts down Mythos 5 and Fable 5 after US directive

On Friday night, Anthropic abruptly disabled access to its Mythos 5 and Fable 5 models for all customers after receiving a US Commerce Department directive imposing export controls on the models outside the United States. Anthropic said the immediate shutdown was necessary for compliance and that its other models were unaffected.

Anthropic shuts down Fable, Mythos models following Trump admin. directive - Ars Technica

Anthropic makes Fable 5 fallback behavior visible to users

After backlash over hidden safeguard-triggered downgrades, Anthropic changed Claude Fable 5 so flagged requests visibly fall back to Claude Opus 4.8. The company also began providing API users with a reason when a request is refused.

Claude Fable 5 secretly throttled AI researchers, and the internet went wild | ZDNET
Jun 11, 2026 · 3d ago

Fable 5 system prompt is reportedly leaked to GitHub

Following the reported jailbreak, Pliny the Liberator allegedly published Fable 5's approximately 120,000-character system prompt on GitHub. The leak exposed the prompt used to govern the model's behavior and safeguards.

Anthropic's Claude Fable 5 Jailbroken to Generate Stack Exploits - Cyber Security News

Pliny the Liberator reportedly jailbreaks Claude Fable 5

Shortly after Fable 5's release, researcher Pliny the Liberator reportedly bypassed its safety controls using a multi-agent strategy involving Unicode obfuscation, long-context manipulation, narrative framing, and decomposition of harmful requests. The reported bypass enabled offensive cybersecurity guidance and harmful chemistry-related outputs, challenging Anthropic's pre-launch claim that external testing had found no universal jailbreaks.

Anthropic's Claude Fable 5 Jailbroken to Generate Stack Exploits - Cyber Security News
Jun 10, 2026 · 4d ago

Researchers criticize Fable's guardrails as overly broad

By June 10, 2026, cybersecurity researchers were publicly criticizing Fable's restrictions, saying it blocked benign tasks such as reading security blogs, writing secure code, and conducting code reviews. The criticism centered on Fable frequently falling back to Claude Opus 4.8 when cyber-related requests triggered its safeguards.

Cybersecurity researchers aren't happy about the guardrails on Anthropic's Fable | TechCrunch
Jun 9, 2026 · 5d ago

Anthropic publishes Mythos 5/Fable 5 system card

Anthropic's June 9, 2026 system card detailed the capabilities and risk assessment for Claude Mythos 5 and Claude Fable 5. It said Fable 5 uses classifiers to detect high-risk cyber, biology, chemistry, and frontier LLM-development requests and falls back to Claude Opus 4.8 for them, and it assessed Mythos 5 as a Tier 1 cyber offense risk.

Anthropic

Anthropic launches Claude Fable 5 for public access

On June 9, 2026, Anthropic launched Claude Fable 5 as the first public model in its new Mythos class. Anthropic described Fable 5 and the restricted Claude Mythos 5 as two configurations of the same underlying model, with Fable 5 made generally available under added safeguards.

Anthropic's Claude Fable 5 Jailbroken to Generate Stack Exploits - Cyber Security News
LINKED ENTITIES

Related entities

Vulnerabilities, threat actors, malware, products, organizations, and breaches Mallory has linked to this story.

62 LINKED
Affected products
9 linked
Deepseek, Claude Code, Windows Server, Firefox, Chatgpt, Google Drive, Chrome, Copilot, Claude
Organizations
53 linked
Anthropic, Openai, Axios, Zhipu, Alibaba Cloud, Hugging Face, Tom's Hardware, Deepseek, Moonshot AI, Microsoft Corporation, Google, Mozilla, Linkedin, X, GitHub, Insight Media Group, Instructure, Replit, The Wall Street Journal, Nvidia, Kubernetes, XBOW, TechCrunch, International Business Machines, Cato Networks, Advanced Micro Devices, Dark Reading, British Broadcasting Corporation, Ivanti, Harness, WIRED, Servicenow, Apple, Deloitte, Reuters, Cursor, ZDNET, Intel, Fortune, Exabeam, Netlify, Miggo, Zero Networks, Getty Images, Uber Technologies, Gray Swan, Z.ai, Andon Labs, Tolmo, Meridian Labs, Dyno Therapeutics, Silicon Data, Citadel Securities