Anthropic said it has globally re-deployed Claude Fable 5 with updated cybersecurity controls and released new technical details on how the model handles cyber-related prompts. According to the company, the system does not block all security activity; instead, safety classifiers sort requests into four categories: prohibited, high-risk dual-use, low-risk dual-use, and benign. The aim is to allow some defensive and educational use while stopping harmful assistance. Anthropic said prohibited requests include malware development, ransomware, wipers, data exfiltration, defense evasion, offensive infrastructure such as C2, destructive attacks, and cyber-physical sabotage. It also said the model applies a larger safety margin than earlier versions to reduce dangerous outputs, even if that increases false positives.
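The four-way taxonomy described above can be written down as a small data structure. This is an illustrative sketch only: the names `RequestTier`, `PROHIBITED_CATEGORIES`, and `is_prohibited` are hypothetical, and the lookup is a plain string match on a category label, not a representation of how Anthropic's classifiers work.

```python
from enum import Enum


class RequestTier(Enum):
    """The four request categories named in the summary (hypothetical type)."""
    PROHIBITED = "prohibited"
    HIGH_RISK_DUAL_USE = "high-risk dual-use"
    LOW_RISK_DUAL_USE = "low-risk dual-use"
    BENIGN = "benign"


# Prohibited categories as listed in the summary, stored in lowercase.
PROHIBITED_CATEGORIES = frozenset({
    "malware development",
    "ransomware",
    "wipers",
    "data exfiltration",
    "defense evasion",
    "offensive infrastructure (c2)",
    "destructive attacks",
    "cyber-physical sabotage",
})


def is_prohibited(category: str) -> bool:
    """Return True if a category label matches one of the listed prohibited ones.

    Assumption: the caller already has a category label; this does not
    classify raw prompts.
    """
    return category.strip().lower() in PROHIBITED_CATEGORIES
```

For example, `is_prohibited("Ransomware")` is true, while a label that is not in the list, such as `"log analysis"`, is simply not matched; the sketch makes no claim about which tier such a request would fall into.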
Anthropic also published an early draft Cyber Jailbreak Severity (CJS) framework, developed with Glasswing, to rate AI jailbreaks from CJS-0 to CJS-4 based on capability gain, breadth of impact, ease of weaponization, and discoverability. The company said the framework is intended to create a shared vocabulary for assessing jailbreak risk across industry and government, particularly for cases involving high-uplift vulnerability discovery or exploit generation. Anthropic invited external feedback through a dedicated contact channel and launched a HackerOne bug bounty program for researchers to report potential cyber jailbreaks affecting Fable 5.
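To make the shared vocabulary concrete, a jailbreak report could be recorded along the four named dimensions together with its CJS level. The sketch below is an assumption-laden illustration: the `CJSRating` class and its free-text fields are hypothetical, and because the summary does not say how the draft framework derives a level from the four factors, the level is supplied by the assessor and not computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CJSRating:
    """Hypothetical record of one jailbreak assessment (not an official schema)."""
    level: int                  # CJS-0 through CJS-4, chosen by the assessor
    capability_gain: str        # free-text notes; the real scales are not given
    breadth_of_impact: str
    ease_of_weaponization: str
    discoverability: str

    def __post_init__(self) -> None:
        # The summary states that ratings run from CJS-0 to CJS-4.
        if not isinstance(self.level, int) or not 0 <= self.level <= 4:
            raise ValueError("CJS level must be an integer from 0 to 4")

    @property
    def label(self) -> str:
        """Render the level in the 'CJS-N' form used by the framework."""
        return f"CJS-{self.level}"


# Usage: the notes below are invented placeholders, not real findings.
example = CJSRating(
    level=3,
    capability_gain="placeholder note",
    breadth_of_impact="placeholder note",
    ease_of_weaponization="placeholder note",
    discoverability="placeholder note",
)
print(example.label)
```

Keeping the level as a validated input avoids implying a scoring formula that the summary does not describe.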

Timeline: three events, from the most recent confirmed update back to the earliest known activity.
Anthropic launched a HackerOne bug bounty program for researchers to submit potential cyber jailbreaks affecting Fable 5 and invited feedback to refine its approach to AI cybersecurity jailbreak risk.
Anthropic published additional technical details on Claude Fable 5's cybersecurity safeguards and proposed a draft Cyber Jailbreak Severity framework for rating AI jailbreaks based on capability gain, breadth, ease of weaponization, and discoverability.
Anthropic announced the global re-deployment of Claude Fable 5 and said the model uses classifier-based cybersecurity safeguards with a larger safety margin than prior models to reduce harmful outputs.