Research Warns AI Agents Are Rapidly Improving at Vulnerability Discovery and Exploitation
Recent research and evaluations indicate that AI agents can now find and exploit vulnerabilities at high success rates using standard offensive tooling, lowering the barrier to semi-autonomous attacks. A study by Irregular in collaboration with Wiz reported that leading models (Anthropic Claude Sonnet 4.5, OpenAI GPT-5, and Google Gemini 2.5 Pro) solved 9 of 10 web security CTF challenges modeled on real-world incident patterns, including authentication bypass, exposed secrets, stored XSS, and SSRF, among them SSRF against an AWS Instance Metadata Service (IMDS)-style endpoint. The researchers noted that even when success required multiple stochastic runs, the low per-run cost (about $2) and the small number of repeats could make exploitation practical without necessarily triggering monitoring: most successful challenges cost under $1, and multi-run cases totaled roughly $1–$10.
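The cost argument above comes down to simple arithmetic, which the following Python sketch makes explicit. It is an illustration only: the per-run success probability is a hypothetical input (the study summary here does not give one), the runs are assumed to be independent, and the function names are invented for this example. Only the roughly $2 per-run cost comes from the text.

```python
def success_within(p_single: float, runs: int) -> float:
    """Probability that at least one of `runs` independent attempts succeeds."""
    return 1.0 - (1.0 - p_single) ** runs


def budget_for_runs(cost_per_run: float, runs: int) -> float:
    """Upper bound on spend if every one of `runs` attempts is executed."""
    return cost_per_run * runs


def expected_cost_until_success(cost_per_run: float, p_single: float) -> float:
    """Mean spend when retrying until the first success (geometric distribution)."""
    return cost_per_run / p_single


if __name__ == "__main__":
    cost_per_run = 2.0   # approximate per-run cost quoted in the text
    p_single = 0.3       # ASSUMPTION: hypothetical per-run success rate
    for runs in (1, 3, 5):
        print(
            f"{runs} run(s): P(success) = {success_within(p_single, runs):.2f}, "
            f"budget <= ${budget_for_runs(cost_per_run, runs):.2f}"
        )
    print(f"expected spend: ${expected_cost_until_success(cost_per_run, p_single):.2f}")
```

Under these assumptions, even a modest per-run success rate yields a high cumulative chance of success within a handful of attempts, which is consistent with the researchers' point that limited repeats stay cheap.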
Separate evaluation results highlighted by Bruce Schneier, citing an Anthropic post, describe Claude Sonnet 4.5 executing multistage attacks across simulated networks using only standard open-source tools rather than custom cyber toolkits. In a high-fidelity simulation of the Equifax breach, the model exfiltrated all simulated PII by recognizing and exploiting a publicized CVE. In parallel, Dark Reading reported security concerns around the rapid adoption of OpenClaw (formerly MoltBot/ClawdBot), an open-source autonomous assistant that can connect to email, files, messaging, and system tools, execute terminal commands and scripts, and maintain memory across sessions. These capabilities create persistent non-human identities and access paths that may fall outside traditional IAM and secrets controls, increasing enterprise risk as “bring-your-own-AI” agents gain privileged access.

How this story unfolded
12 events from the most recent confirmed update back to the earliest known activity.
Researcher reports LLM swarm autonomously discovered more than 20 CVEs
A researcher described a self-orchestrating multi-agent LLM system that autonomously found more than 20 CVEs, including Linux ksmbd flaws CVE-2026-31432 and CVE-2026-31433 and CUPS bugs CVE-2026-34980 and CVE-2026-34990. The project used specialized agents for target seeding, hypothesis generation, PoC testing in isolated VMs, report writing, grading, and coordination, and concluded that smaller models can compete with frontier models when given enough inference-time compute and orchestration.
Microsoft research shows AI can generate realistic synthetic attack telemetry
Microsoft researchers described using large language models to generate realistic synthetic command lines, process trees, and attack sequences that mimic human-operated intrusions. The work is aimed at helping defenders test detections, train analysts, and validate logging and triage workflows in controlled environments with governance guardrails.
UK AISI reports frontier AI cyber-task capability is improving faster than expected
The UK AI Security Institute said its time-window benchmark showed frontier models such as Claude Sonnet 4.5 can complete cybersecurity tasks comparable to about 16 minutes of expert human work with 80 percent reliability under a 2.5 million token budget. AISI revised its estimated doubling period for this level of cyber-task performance from 8 months to 4.7 months and reported that newer models, including Anthropic's Claude Mythos Preview, exceeded that trend, achieving partial success on simulated corporate network and industrial control system attack chains.
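To give a sense of what the revised doubling period implies, the Python sketch below extrapolates the 16-minute task horizon under both estimates. It assumes a clean exponential trend with a constant doubling period, which is a simplification made for this example and not a formula attributed to AISI. The function names, the 12-month window, and the 60-minute target are hypothetical.

```python
import math


def projected_horizon(minutes_now: float, doubling_months: float, months_ahead: float) -> float:
    """Task horizon after `months_ahead`, assuming a constant doubling period."""
    return minutes_now * 2 ** (months_ahead / doubling_months)


def months_to_reach(minutes_now: float, target_minutes: float, doubling_months: float) -> float:
    """Months until the horizon reaches `target_minutes` under the same assumption."""
    return doubling_months * math.log2(target_minutes / minutes_now)


if __name__ == "__main__":
    start = 16.0  # minutes of expert work, as quoted in the text
    for label, period in (("earlier estimate", 8.0), ("revised estimate", 4.7)):
        horizon = projected_horizon(start, period, 12.0)
        wait = months_to_reach(start, 60.0, period)  # 60 min is an illustrative target
        print(f"{label} ({period} months): ~{horizon:.0f} min after 12 months, "
              f"~{wait:.1f} months to reach a 60-minute task")
```

Because the period sits in the exponent, shortening it from 8 to 4.7 months roughly doubles the projected one-year horizon in this model. Real capability growth need not follow a smooth exponential.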
Hacktron demonstrates Claude Opus 4.6 building Discord Chromium exploit chain
Hacktron CTO Mohan Pedhapati said Anthropic’s Claude Opus 4.6 helped produce a functional Chrome/V8 exploit chain against Discord’s bundled Chromium for about $2,283 in API costs after roughly a week of iteration and human supervision. The report highlighted how AI-assisted patch analysis can accelerate weaponization of known flaws in Electron apps such as Discord, Slack, and Teams when embedded Chromium versions lag upstream fixes.
Anthropic says Claude Mythos Preview finds and exploits zero-days
Anthropic reported in testing published on 2026-04-07 that its Claude Mythos Preview model could autonomously discover zero-day vulnerabilities and develop working exploits across major operating systems and browsers, outperforming earlier models on exploit-development benchmarks. The company said the model identified thousands of high- and critical-severity flaws, including a FreeBSD NFS server RCE tracked as CVE-2026-4747, and could also rapidly weaponize N-day Linux kernel vulnerabilities.
Paper measures AI agents on multi-step corporate and ICS attack ranges
A March 2026 arXiv paper evaluated frontier AI models on a 32-step corporate network scenario and a 7-step industrial control system scenario, finding capability improved with more inference-time compute and newer model generations. The best run completed 22 of 32 corporate-network steps, while ICS performance remained limited but recent models were the first to reliably complete some steps.
Study shows AI agents struggle in broad-scope and real-world root-cause hunts
The Irregular and Wiz study also found that performance dropped when agents had to search a full attack surface without a defined entry point, which increased cost and reduced investigative depth. In a real-world AWS Bedrock anomaly case, an AI agent failed to identify the root cause, while a human quickly traced it to an exposed RabbitMQ management interface with default credentials.
Irregular and Wiz study finds AI agents solve 9 of 10 web security challenges
A study by Irregular in collaboration with Wiz tested Anthropic Claude Sonnet 4.5, OpenAI GPT-5, and Google Gemini 2.5 Pro on 10 web security CTF challenges derived from real-world incidents. The researchers found the leading models solved nine of 10 challenges when given directed, per-site objectives using standard security tools.
Researchers observe attacker interest in exposed OpenClaw deployments
Security researchers and vendors reported early signs of malicious interest in OpenClaw, including scanning for the agent’s default port and attempts to bypass authentication. They also warned of supply-chain risk tied to the project’s large contributor base and rapid development pace.
OpenClaw open-source AI agent rapidly gains adoption and scrutiny
The open-source AI agent OpenClaw, previously called ClawdBot and MoltBot, rapidly became the fastest-growing project on GitHub. Its autonomous capabilities and direct connections to email, files, messaging platforms, and system tools prompted security concerns about enterprise deployment.
Anthropic says Claude simulated an Equifax-style data exfiltration attack
In testing described in the same Anthropic blog post, Claude Sonnet 4.5 exfiltrated all simulated personal data in a high-fidelity Equifax-breach scenario using only a Bash shell on a Kali Linux host. The company attributed this to the model recognizing a public CVE and generating exploit code without needing iterative refinement.
Anthropic reports Claude Sonnet 4.5 can autonomously exploit known flaws
An Anthropic blog post said current Claude models had improved cyber capabilities, including carrying out multistage attacks across networks with dozens of hosts using standard open-source tools. It reported that Claude Sonnet 4.5 succeeded in some tests without the custom cyber toolkit required by earlier model generations.
Sources
11 references tracked.
Microsoft Research Shows AI Can Generate Realistic Command Lines and Process Telemetry
cybersecuritynews.com
You’re not going to patch your way out of this - PSW #926 | SC Media
scworld.com
AI models are getting better at replacing cybersecurity pros on certain tasks
theregister.com
Introducing Bugflation - Bugflation
bugflation.com
[2603.11214] Measuring AI Agents' Progress on Multi-Step Cyber Attack Scenarios
arxiv.org
AI agents solve 9 of 10 web security CTF challenges in recent study | SC Media
scworld.com
AIs Are Getting Better at Finding and Exploiting Security Vulnerabilities - Schneier on Security
schneier.com
OpenClaw AI Runs Wild in Business Environments
darkreading.com


