
Claude Fable 5 Jailbreak Exposes Limits of Anthropic’s Cyber Guardrails

Updated 2h ago | First seen Jun 10, 2026 | 17 sources

Anthropic launched Claude Fable 5 as the public, safeguarded version of its new Mythos 5 model. It described both as configurations of the same underlying system and positioned Mythos 5 as its strongest cyber-capable model to date. In its system card, Anthropic said Fable 5 uses classifiers and fallback behavior to divert high-risk requests to Claude Opus 4.8, which keeps Fable 5's cyber performance roughly in line with that older model. Mythos 5, meanwhile, remains restricted to vetted partners through Project Glasswing. The company assessed Mythos 5 as a Tier 1 cyber offense risk, meaning it can materially assist offensive operations but does not autonomously conduct adaptive cyber campaigns.

Soon after release, researcher Pliny the Liberator reportedly bypassed Fable 5's safeguards with a multi-agent jailbreak that combined Unicode obfuscation, long-context manipulation, narrative framing, and decomposition of harmful requests. The bypass allegedly elicited exploit-development and chemistry-related guidance, and Pliny later allegedly published Fable 5's system prompt on GitHub. The incident undercut Anthropic's claim that more than 1,000 hours of external testing had found no universal jailbreaks. It also intensified criticism from security researchers, who said Fable 5's guardrails were already so broad that they blocked benign work such as reading security blogs, reviewing code, and writing secure software. Anthropic had not publicly responded to the jailbreak claims at the time of reporting.


EVENT TIMELINE

How this story unfolded

Seven events, from the most recent confirmed update back to the earliest known activity.

Jun 12, 2026 (2d ago)

Anthropic shuts down Mythos 5 and Fable 5 after US directive

On Friday night, Anthropic abruptly disabled access to its Mythos 5 and Fable 5 models for all customers after receiving a US Commerce Department directive imposing export controls on the models outside the United States. Anthropic said the immediate shutdown was necessary for compliance and that its other models were unaffected.

Anthropic shuts down Fable, Mythos models following Trump admin. directive - Ars Technica

Anthropic makes Fable 5 fallback behavior visible to users

After backlash over hidden safeguard-triggered downgrades, Anthropic changed Claude Fable 5 so flagged requests visibly fall back to Claude Opus 4.8. The company also began providing API users with a reason when a request is refused.

Claude Fable 5 secretly throttled AI researchers, and the internet went wild | ZDNET

Jun 11, 2026 (3d ago)

Fable 5 system prompt is reportedly leaked to GitHub

Following the reported jailbreak, Pliny the Liberator allegedly published Fable 5's system prompt, roughly 120,000 characters long, on GitHub. The leaked prompt is the text that governs the model's behavior and safeguards.

Anthropic's Claude Fable 5 Jailbroken to Generate Stack Exploits - Cyber Security News

Pliny the Liberator reportedly jailbreaks Claude Fable 5

Shortly after Fable 5's release, researcher Pliny the Liberator reportedly bypassed its safety controls using a multi-agent strategy involving Unicode obfuscation, long-context manipulation, narrative framing, and decomposition of harmful requests. The reported bypass enabled offensive cybersecurity guidance and harmful chemistry-related outputs, challenging Anthropic's pre-launch claim that external testing had found no universal jailbreaks.

Anthropic's Claude Fable 5 Jailbroken to Generate Stack Exploits - Cyber Security News

Jun 10, 2026 (4d ago)

Researchers criticize Fable's guardrails as overly broad

By June 10, 2026, cybersecurity researchers were publicly criticizing Fable's restrictions, saying they blocked benign tasks such as reading security blogs, writing secure code, and conducting code reviews. The criticism centered on Fable frequently falling back to Claude Opus 4.8 whenever cyber-related requests tripped its safeguards.

Cybersecurity researchers aren't happy about the guardrails on Anthropic's Fable | TechCrunch

Jun 9, 2026 (5d ago)

Anthropic publishes Mythos 5/Fable 5 system card

Anthropic's system card, published June 9, 2026, detailed the capabilities and risk assessment for Claude Mythos 5 and Claude Fable 5. It said Fable 5 uses classifiers and falls back to Claude Opus 4.8 for high-risk cyber, biology, chemistry, and frontier LLM-development requests. It also assessed Mythos 5 as a Tier 1 cyber offense risk.

Anthropic

Anthropic launches Claude Fable 5 for public access

On June 9, 2026, Anthropic launched Claude Fable 5 as the first public model in its new Mythos class. Anthropic described Fable 5 and the restricted Claude Mythos 5 as two configurations of the same underlying model, with Fable 5 made generally available under added safeguards.

Anthropic's Claude Fable 5 Jailbroken to Generate Stack Exploits - Cyber Security News

LINKED ENTITIES

Related entities

Vulnerabilities, threat actors, malware, products, organizations, and breaches Mallory has linked to this story.

62 linked

Affected products

9 linked

Deepseek, Claude Code, Windows Server, Firefox, Chatgpt, Google Drive, Chrome, Copilot, Claude

Organizations

53 linked

Anthropic, Openai, Axios, Zhipu, Alibaba Cloud, Hugging Face, Tom's Hardware, Deepseek, Moonshot AI, Microsoft Corporation, Google, Mozilla, Linkedin, X, GitHub, Insight Media Group, Instructure, Replit, The Wall Street Journal, Nvidia, Kubernetes, XBOW, TechCrunch, International Business Machines, Cato Networks, Advanced Micro Devices, Dark Reading, British Broadcasting Corporation, Ivanti, Harness, WIRED, Servicenow, Apple, Deloitte, Reuters, Cursor, ZDNET, Intel, Fortune, Exabeam, Netlify, Miggo, Zero Networks, Getty Images, Uber Technologies, Gray Swan, Z.ai, Andon Labs, Tolmo, Meridian Labs, Dyno Therapeutics, Silicon Data, Citadel Securities

SOURCE COVERAGE

Sources

17 references tracked.

Tom's Hardware (news), Jun 13, 2026

U.S. gov't orders Anthropic to disable its newest AI models worldwide due to security threats - ban on Claude Fable 5 and Mythos 5 bars access by any foreign national, even its own employees | Tom's Hardware

tomshardware.com

The New Stack (news), Jun 13, 2026

Claude Fable cost $9 in one coding test. GPT-5.5 cost $1.50. Model triage is the new AI skill. - The New Stack

thenewstack.io

The Hacker News (news), Jun 13, 2026

U.S. Orders Anthropic to Suspend Fable 5 and Mythos 5 Access for Foreign Nationals

thehackernews.com

Ars Technica (news), Jun 13, 2026

Anthropic shuts down Fable, Mythos models following Trump admin. directive - Ars Technica

arstechnica.com

9 additional sources from Jun 11, 2026 to Jun 13, 2026

TechCrunch Security (news), Jun 10, 2026

Cybersecurity researchers aren't happy about the guardrails on Anthropic's Fable | TechCrunch

techcrunch.com

OpenNet (news), Dec 25, 2025

Anthropic ... AI models Fable 5 and Mythos 5 (Russian-language headline, partly unrecoverable)

opennet.me

OpenNet.ru (news), Dec 25, 2025

Anthropic ... AI models Fable 5 and Mythos 5 (Russian-language headline, partly unrecoverable)

opennet.ru

Anthropic (news)

Anthropic

www-cdn.anthropic.com



Related stories

Anthropic Limits Access to Claude Mythos for AI-Driven Vulnerability Discovery

Unauthorized Users Access Anthropic’s Restricted Claude Mythos Cyber Model

Unauthorized Access to Anthropic’s Mythos Preview Exposed Supply-Chain Weaknesses
