Anthropic introduced real-time cyber safeguards for its Claude Opus and Sonnet models that automatically block prompts tied to prohibited or high-risk cyber activity, including mass data exfiltration and ransomware development. The company said some dual-use requests, such as vulnerability exploitation or offensive security tooling, may still be permitted through its application-based Cyber Verification Program, which is intended for vetted defensive security professionals. The program is not currently available to Zero Data Retention users or through Amazon Bedrock and Google Vertex AI.
Anthropic also launched a HackerOne program to reward researchers who find high-impact jailbreaks that let Claude Fable 5 generate exploit code, malware, or detailed attack guidance it would normally refuse. The moves come amid broader industry concern that advanced models can sharply reduce the time and skill needed for offensive cyber work. Related reporting tied Anthropic’s vulnerability-discovery efforts to IBM and Red Hat’s new Project Lightwell, a $5 billion service aimed at backporting and delivering signed fixes for open-source flaws that are being uncovered faster than enterprises can patch them.

Timeline: 3 events, from the most recent confirmed update back to the earliest known activity.
IBM and Red Hat launched Project Lightwell, a $5 billion subscription service intended to identify vulnerabilities in deployed open-source versions, create backported fixes, and deliver signed validated patches under SLAs. IBM said 20,000 engineers were assigned to the effort, with 11 major financial institutions as design partners and Deloitte supporting regulated supply-chain and patch-deployment work.
Anthropic published a HackerOne program focused on identifying high-impact jailbreaks against Claude Fable 5 that materially increase offensive cyber capability. The policy defined scope, exclusions, severity criteria, reporting requirements, and safe-harbor expectations for researchers.
Anthropic announced new real-time cyber safeguards for Claude Opus and Sonnet that automatically detect and block requests associated with prohibited or high-risk cybersecurity use. The company said certain high-risk dual-use activities may be reviewed through its Cyber Verification Program for legitimate defensive professionals.
4 references tracked:
darkreading.com
securitysenses.com
hackerone.com
support.claude.com