Anthropic Publishes Claude Fable 5 Cyber Safeguards and Jailbreak Severity Framework

How this story unfolded

3 events from the most recent confirmed update back to the earliest known activity.

Jul 2, 2026

Anthropic launches HackerOne program for Fable 5 jailbreak reports

Anthropic launched a HackerOne bug bounty program for researchers to submit potential cyber jailbreaks affecting Fable 5 and invited feedback to refine its approach to AI cybersecurity jailbreak risk.

Source: Anthropic, “More details on Fable 5’s cyber safeguards and our jailbreak framework”

Anthropic publishes Fable 5 cyber safeguards and draft jailbreak framework

Anthropic published additional technical details on Claude Fable 5's cybersecurity safeguards and proposed a draft Cyber Jailbreak Severity framework for rating AI jailbreaks based on capability gain, breadth, ease of weaponization, and discoverability.

Source: Anthropic, “More details on Fable 5’s cyber safeguards and our jailbreak framework”

Anthropic globally re-deploys Claude Fable 5

Anthropic announced the global re-deployment of Claude Fable 5 and said the model uses classifier-based cybersecurity safeguards with a larger safety margin than prior models to reduce harmful outputs.

Source: Anthropic, “More details on Fable 5’s cyber safeguards and our jailbreak framework”

Related entities

Vulnerabilities, threat actors, malware, products, organizations, and breaches Mallory has linked to this story.

Vulnerabilities (1 linked): Log4Shell

Malware (1 linked): Log4Shell

Organizations (5 linked): Anthropic, HackerOne, Glasswing, OWASP Foundation, Any.Run

Sources

2 references tracked.

Cyber Security News (News), Jul 3, 2026: “Anthropic Details Claude Fable 5 Cybersecurity Safeguards and Jailbreak Framework” (cybersecuritynews.com)

Anthropic (News), Jul 2, 2026: “More details on Fable 5’s cyber safeguards and our jailbreak framework” (anthropic.com)

On the same thread

Claude Fable 5 Jailbreak Exposes Limits of Anthropic’s Cyber Guardrails

Anthropic launched **Claude Fable 5** as the public, safeguarded version of its new **Mythos 5** model, describing both as configurations of the same underlying system and positioning Mythos 5 as its strongest cyber-capable model to date. In its system card, Anthropic said Fable 5 uses classifiers and fallback behavior to divert high-risk requests to **Claude Opus 4.8**, keeping Fable 5’s cyber performance roughly in line with that older model, while Mythos 5 remains restricted to vetted partners through **Project Glasswing**. The company assessed Mythos 5 as a **Tier 1 cyber offense risk**, meaning it can materially assist offensive operations but does not autonomously conduct adaptive cyber campaigns. Soon after release, researcher **Pliny the Liberator** reportedly bypassed Fable 5’s safeguards using a multi-agent jailbreak that combined Unicode obfuscation, long-context manipulation, narrative framing, and decomposition of harmful requests, allegedly eliciting exploit-development and chemistry-related guidance; a leaked system prompt was also posted on GitHub. The incident undercut Anthropic’s claim that more than 1,000 hours of external testing had found no universal jailbreaks, and it intensified criticism from security researchers who said Fable 5’s guardrails were already so broad that they blocked benign work such as reading security blogs, reviewing code, and writing secure software. Anthropic had not publicly responded to the jailbreak claims at the time of reporting.

Jul 2, 2026

Anthropic Tightens AI Cyber Controls With Safeguards and Jailbreak Bounty

Anthropic introduced real-time cyber safeguards for its Claude Opus and Sonnet models that automatically block prompts tied to prohibited or high-risk cyber activity, including mass data exfiltration and ransomware development. The company said some dual-use requests, such as vulnerability exploitation or offensive security tooling, may still be permitted through its application-based Cyber Verification Program, which is intended for vetted defensive security professionals and is not currently available for Zero Data Retention users or through Amazon Bedrock and Google Vertex AI. Anthropic also launched a HackerOne program to reward researchers who find high-impact jailbreaks that let Claude Fable 5 generate exploit code, malware, or detailed attack guidance it would normally refuse. The moves come amid broader industry concern that advanced models can sharply reduce the time and skill needed for offensive cyber work. Related reporting also tied Anthropic’s vulnerability-discovery efforts to IBM and Red Hat’s new **Project Lightwell**, a $5 billion service aimed at backporting and delivering signed fixes for open-source flaws that are being uncovered faster than enterprises can patch them.

Jul 3, 2026

Anthropic Restricts Mythos AI After It Finds and Exploits Thousands of Software Flaws

Anthropic unveiled **Claude Mythos Preview**, an unreleased AI model it says can autonomously discover and exploit severe software vulnerabilities across major operating systems, browsers, open-source projects, and some closed-source targets. The company said the model uncovered thousands of high-severity flaws, including long-lived bugs in **OpenBSD** and **FFmpeg**, Linux kernel privilege-escalation chains, and **`CVE-2026-4747`**, a FreeBSD NFS remote code execution flaw that could enable unauthenticated root access. Anthropic withheld broad release, citing offensive cyber risk, and instead launched **Project Glasswing**, a gated program for roughly 40 to 50 partners such as AWS, Apple, Cisco, Cloudflare, Google, JPMorgan Chase, Microsoft, Mozilla, NVIDIA, and Palo Alto Networks to validate findings, patch affected software, and study defensive uses. Independent and industry assessments broadly agreed that Mythos marks a significant advance in AI-enabled cyber capability, though several researchers questioned how many of Anthropic’s headline claims could yet be verified through public CVEs and warned that similar results may be reproducible with cheaper or open models plus strong tooling. The UK AI Security Institute found Mythos achieved a **73%** success rate on expert capture-the-flag tasks and completed a full 32-step simulated enterprise attack in 3 of 10 runs, while Anthropic later reported coordinated disclosure activity spanning **1,596 vulnerabilities across 281 open-source projects** and partners identifying more than **10,000** high- or critical-severity candidates. Governments, financial regulators, and CISO groups in the US, UK, Europe, Canada, and Japan responded with briefings and warnings that AI is compressing the gap between vulnerability discovery and weaponization, leaving remediation, patch governance, and defensive automation as the main bottlenecks.

Jun 29, 2026

Anthropic Publishes Claude Fable 5 Cyber Safeguards and Jailbreak Severity Framework

Related stories

Claude Fable 5 Jailbreak Exposes Limits of Anthropic’s Cyber Guardrails

Anthropic Tightens AI Cyber Controls With Safeguards and Jailbreak Bounty

Anthropic Restricts Mythos AI After It Finds and Exploits Thousands of Software Flaws