Mallory
Tags: ai-enabled-threat-activity, ai-platform-security, initial-access-method, data-exfiltration-method

Research Warns AI Agents Are Rapidly Improving at Vulnerability Discovery and Exploitation

Updated 26d ago · First seen Jan 31, 2026 · 11 sources

Recent research and evaluations indicate that AI agents are becoming capable of finding and exploiting vulnerabilities with high success rates using standard offensive tooling, lowering the barrier to semi-autonomous attacks. A study by Irregular in collaboration with Wiz reported that leading models (Anthropic Claude Sonnet 4.5, OpenAI GPT-5, and Google Gemini 2.5 Pro) solved 9 of 10 web security CTF challenges modeled on real-world incident patterns, including authentication bypass, exposed secrets, stored XSS, and server-side request forgery (SSRF), with one challenge modeled on SSRF against the AWS Instance Metadata Service (IMDS). Researchers noted that even when success required multiple stochastic runs, the low per-run cost (~$2) and the small number of repeats needed could make exploitation practical without necessarily triggering monitoring: most challenge successes cost under $1, and multi-run cases totaled roughly $1–$10.
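The cost argument rests on simple probability. Assuming independent runs that each succeed with a fixed probability p (an assumption; this summary does not report per-run success rates), the chance of at least one success in k runs is 1 - (1 - p)^k, and the expected number of runs until the first success is 1/p, so expected spend is the per-run cost divided by p. The minimal sketch below uses the ~$2 per-run figure with hypothetical values of p.

```python
def p_success_within(k: int, p: float) -> float:
    """Probability of at least one success in k independent runs, each succeeding with probability p."""
    return 1.0 - (1.0 - p) ** k


def expected_cost_to_first_success(cost_per_run: float, p: float) -> float:
    """Expected spend until the first success: runs are geometric with mean 1/p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return cost_per_run / p


if __name__ == "__main__":
    cost_per_run = 2.00  # ~$2 per run, the figure reported above
    for p in (0.25, 0.5):  # hypothetical per-run success rates, not study data
        print(f"p={p}: P(success within 5 runs)={p_success_within(5, p):.2f}, "
              f"expected cost=${expected_cost_to_first_success(cost_per_run, p):.2f}")
```

Under these assumptions, even a modest per-run success rate of 0.25 gives an expected spend of $8, which sits inside the roughly $1–$10 range reported for multi-run cases.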

Separate evaluation results, highlighted by Bruce Schneier and drawn from an Anthropic post, describe Claude Sonnet 4.5 executing multistage attacks across simulated networks using only standard open-source tools rather than custom cyber toolkits. In a high-fidelity simulation of the Equifax breach, the model exfiltrated all simulated PII by recognizing and exploiting a known, publicly disclosed CVE.

In parallel, Dark Reading reported security concerns around the rapid adoption of OpenClaw (formerly MoltBot/ClawdBot), an open-source autonomous assistant that can connect to email, files, messaging, and system tools, execute terminal commands and scripts, and maintain memory across sessions. These capabilities create persistent non-human identities and access paths that may fall outside traditional IAM and secrets controls, increasing enterprise risk as “bring-your-own-AI” agents gain privileged access.


EVENT TIMELINE

How this story unfolded

12 events from the most recent confirmed update back to the earliest known activity.

May 14, 2026 (27d ago)

Researcher reports LLM swarm autonomously discovered more than 20 CVEs

A researcher described a self-orchestrating multi-agent LLM system that autonomously found more than 20 CVEs, including Linux ksmbd flaws CVE-2026-31432 and CVE-2026-31433 and CUPS bugs CVE-2026-34980 and CVE-2026-34990. The project used specialized agents for target seeding, hypothesis generation, PoC testing in isolated VMs, report writing, grading, and coordination, and concluded that smaller models can compete with frontier models when given enough inference-time compute and orchestration.

Source: You’re not going to patch your way out of this - PSW #926 | SC Media
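The division of labor described above can be pictured as a staged pipeline in which a coordinator pushes each seeded target through specialized roles. The sketch below is hypothetical: the `Finding` type, the stage functions, and the rule that only reproduced PoCs advance to reporting are illustrative assumptions, and each stub stands in for an LLM-backed agent. It makes no model calls and runs no exploit code.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Finding:
    target: str
    hypothesis: Optional[str] = None
    poc_reproduced: bool = False
    report: Optional[str] = None
    grade: Optional[str] = None
    trail: List[str] = field(default_factory=list)  # which stages touched this finding


Stage = Callable[[Finding], Finding]


# Hypothetical stand-ins for the LLM-backed roles described in the source.
def hypothesize(f: Finding) -> Finding:
    f.hypothesis = f"possible memory-safety bug in {f.target}"
    f.trail.append("hypothesis")
    return f


def run_poc_in_vm(f: Finding) -> Finding:
    # Placeholder verdict; a real system would execute the PoC inside an isolated VM.
    f.poc_reproduced = f.target.endswith("_parser")
    f.trail.append("poc_test")
    return f


def write_report(f: Finding) -> Finding:
    f.report = f"{f.target}: {f.hypothesis}"
    f.trail.append("report")
    return f


def grade(f: Finding) -> Finding:
    f.grade = "accept" if f.report else "reject"
    f.trail.append("grade")
    return f


def coordinate(targets: List[str], stages: List[Stage]) -> List[Finding]:
    """Coordinator: run each seeded target through the stages, stopping at unreproduced PoCs."""
    results = []
    for target in targets:  # target seeding: the coordinator is handed seed targets
        f = Finding(target)
        for stage in stages:
            f = stage(f)
            if stage is run_poc_in_vm and not f.poc_reproduced:
                break  # only reproducible findings proceed to reporting and grading
        results.append(f)
    return results


if __name__ == "__main__":
    pipeline = [hypothesize, run_poc_in_vm, write_report, grade]
    for finding in coordinate(["demo_parser", "demo_filter"], pipeline):
        print(finding.target, finding.grade, finding.trail)
```

Gating on PoC reproduction before report writing is one plausible way to keep unverified hypotheses from consuming grading effort; the source does not say how the researcher's system orders or gates its agents.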

Microsoft research shows AI can generate realistic synthetic attack telemetry

Microsoft researchers described using large language models to generate realistic synthetic command lines, process trees, and attack sequences that mimic human-operated intrusions. The work is aimed at helping defenders test detections, train analysts, and validate logging and triage workflows in controlled environments with governance guardrails.

Source: Microsoft Research Shows AI Can Generate Realistic Command Lines and Process Telemetry
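To make the telemetry shapes concrete, the sketch below represents a synthetic process tree and flattens it into parent-linked event records of the kind a detection pipeline ingests. The field names, the `synthetic` marker, and the example processes are hypothetical; this is not Microsoft's schema or generator, and no language model is involved.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ProcessNode:
    pid: int
    image: str
    command_line: str
    children: List["ProcessNode"] = field(default_factory=list)


def flatten(node: ProcessNode, ppid: int = 0) -> List[Dict]:
    """Depth-first walk emitting parent-before-child records linked by ppid."""
    records = [{
        "pid": node.pid,
        "ppid": ppid,
        "image": node.image,
        "command_line": node.command_line,
        "synthetic": True,  # hypothetical flag so test data is never mistaken for real telemetry
    }]
    for child in node.children:
        records.extend(flatten(child, node.pid))
    return records


def example_tree() -> ProcessNode:
    # Hypothetical discovery step: a shell spawned from explorer.exe runs whoami.
    whoami = ProcessNode(5400, "whoami.exe", "whoami /all")
    cmd = ProcessNode(5312, "cmd.exe", "cmd.exe /c whoami /all", [whoami])
    return ProcessNode(4120, "explorer.exe", r"C:\Windows\explorer.exe", [cmd])


if __name__ == "__main__":
    for record in flatten(example_tree()):
        print(json.dumps(record))
```

Keeping parent links explicit lets the same records exercise detections that key on parent/child process pairs, and the marker field makes it easier to filter generated events out of production analytics.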

UK AISI reports frontier AI cyber-task capability is improving faster than expected

The UK AI Security Institute (AISI) said its time-window benchmark showed that frontier models such as Claude Sonnet 4.5 can complete cybersecurity tasks equivalent to about 16 minutes of expert human work with 80 percent reliability under a 2.5-million-token budget. AISI revised its estimated doubling period for this level of cyber-task performance from 8 months to 4.7 months and reported that newer models, including Anthropic's Claude Mythos Preview, exceeded that trend, with partial success on simulated corporate-network and industrial control system attack chains.

Source: AI models are getting better at replacing cybersecurity pros on certain tasks
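To see what the revised doubling period implies, the arithmetic below extrapolates the 16-minute figure forward under a pure exponential trend, horizon(t) = 16 · 2^(t / d) for a doubling period of d months. This is an illustrative assumption, not an AISI forecast; real capability curves need not stay exponential.

```python
def projected_minutes(start_minutes: float, months: float, doubling_months: float) -> float:
    """Extrapolate a task horizon that doubles every `doubling_months` months."""
    return start_minutes * 2 ** (months / doubling_months)


if __name__ == "__main__":
    for doubling in (8.0, 4.7):  # previous and revised AISI doubling estimates
        horizon = projected_minutes(16, 12, doubling)  # 16-minute baseline, 12 months ahead
        print(f"doubling every {doubling} months: ~{horizon:.0f} min after 12 months")
```

Under this assumption, 12 months yields roughly 45 minutes at the old estimate and roughly 94 minutes at the revised one, about twice as much.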
Apr 20, 2026 (2mo ago)

Hacktron demonstrates Claude Opus 4.6 building Discord Chromium exploit chain

Hacktron CTO Mohan Pedhapati said Anthropic’s Claude Opus 4.6 helped produce a functional Chrome/V8 exploit chain against Discord’s bundled Chromium for about $2,283 in API costs after roughly a week of iteration and human supervision. The report highlighted how AI-assisted patch analysis can accelerate weaponization of known flaws in Electron apps such as Discord, Slack, and Teams when embedded Chromium versions lag upstream fixes.

Source: AI Model Claude Opus turns bugs into exploits for just $2,283
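The patch-gap risk comes down to a version comparison: an Electron app stays exposed to an upstream Chromium/V8 fix until its bundled Chromium reaches the first release that contains it. A minimal sketch, assuming dotted four-part Chromium version strings (MAJOR.MINOR.BUILD.PATCH); the versions in the example are hypothetical, and a vendor that backports fixes without bumping the version would defeat this simple check. In Electron, the bundled Chromium version is typically readable at runtime from `process.versions.chrome`.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse a dotted Chromium version such as '120.0.6099.109' into comparable ints."""
    return tuple(int(part) for part in version.split("."))


def lags_upstream_fix(embedded: str, first_fixed: str) -> bool:
    """True if the embedded Chromium build predates the first release carrying the fix."""
    return parse_version(embedded) < parse_version(first_fixed)


if __name__ == "__main__":
    # Hypothetical versions for illustration only.
    print(lags_upstream_fix("120.0.6099.71", "120.0.6099.109"))
    print(lags_upstream_fix("121.0.6167.85", "120.0.6099.109"))
```

Comparing integer tuples rather than raw strings matters here: compared as strings, a patch level of .71 would sort after .109.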
Apr 7, 2026 (2mo ago)

Anthropic says Claude Mythos Preview finds and exploits zero-days

In testing results published on April 7, 2026, Anthropic reported that its Claude Mythos Preview model could autonomously discover zero-day vulnerabilities and develop working exploits across major operating systems and browsers, outperforming earlier models on exploit-development benchmarks. The company said the model identified thousands of high- and critical-severity flaws, including a FreeBSD NFS server remote code execution (RCE) flaw tracked as CVE-2026-4747, and could also rapidly weaponize N-day Linux kernel vulnerabilities.

Source: Anthropic's new AI model finds and exploits zero-days across every major OS and browser - Help Net Security
Mar 11, 2026 (3mo ago)

Paper measures AI agents on multi-step corporate and ICS attack ranges

A March 2026 arXiv paper evaluated frontier AI models on a 32-step corporate network scenario and a 7-step industrial control system (ICS) scenario, finding that capability improved with more inference-time compute and with newer model generations. The best run completed 22 of 32 corporate-network steps; ICS performance remained limited, although recent models were the first to reliably complete some steps.

Source: [2603.11214] Measuring AI Agents' Progress on Multi-Step Cyber Attack Scenarios
Jan 30, 2026 (4mo ago)

Study shows AI agents struggle in broad-scope and real-world root-cause hunts

The Irregular and Wiz study also found that performance dropped when agents had to search a full attack surface without a defined entry point, which increased cost and reduced investigative depth. In a real-world AWS Bedrock anomaly case, an AI agent failed to identify the root cause, while a human quickly traced it to an exposed RabbitMQ management interface with default credentials.
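As a defensive illustration of the Bedrock-case misconfiguration, the sketch below checks whether a RabbitMQ management API that you operate accepts the well-known `guest`/`guest` default. It assumes the management plugin's HTTP API and its `/api/whoami` endpoint (served by default on port 15672); the function name and the injectable `opener` parameter are hypothetical conveniences for testing. Recent RabbitMQ versions restrict the `guest` user to loopback connections by default, so a remote success usually means that restriction was lifted.

```python
import base64
import urllib.error
import urllib.request
from typing import Callable


def accepts_default_credentials(base_url: str,
                                opener: Callable = urllib.request.urlopen,
                                timeout: float = 5.0) -> bool:
    """Return True if the management API at base_url authenticates guest:guest on /api/whoami."""
    token = base64.b64encode(b"guest:guest").decode("ascii")
    request = urllib.request.Request(
        base_url.rstrip("/") + "/api/whoami",
        headers={"Authorization": f"Basic {token}"},
    )
    try:
        with opener(request, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError:
        return False  # e.g. 401: the default credentials were rejected
    except OSError:
        return False  # connection refused, DNS failure, or timeout
```

Usage against a hypothetical internal host: `accepts_default_credentials("http://rabbitmq.internal.example:15672")`. A `True` result means the broker's management plane is open to anyone who can reach it.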

Irregular and Wiz study finds AI agents solve 9 of 10 web security challenges

A study by Irregular in collaboration with Wiz tested Anthropic Claude Sonnet 4.5, OpenAI GPT-5, and Google Gemini 2.5 Pro on 10 web security CTF challenges derived from real-world incidents. The researchers found the leading models solved 9 of 10 challenges when given directed, per-site objectives and standard security tools.

Researchers observe attacker interest in exposed OpenClaw deployments

Security researchers and vendors reported early signs of malicious interest in OpenClaw, including scanning for the agent’s default port and attempts to bypass authentication. They also warned of supply-chain risk tied to the project’s large contributor base and rapid development pace.

OpenClaw open-source AI agent rapidly gains adoption and scrutiny

The open-source AI agent OpenClaw, previously called ClawdBot and MoltBot, rapidly became the fastest-growing project on GitHub. Its direct connections to email, files, messaging platforms, and system tools, combined with its autonomous capabilities, prompted security concerns about enterprise deployment.

Anthropic says Claude simulated an Equifax-style data exfiltration attack

In the same Anthropic testing, the company said Claude Sonnet 4.5 exfiltrated all simulated personal data in a high-fidelity Equifax-breach scenario using only a Bash shell on a Kali Linux host. Anthropic attributed this to the model recognizing a public CVE and generating exploit code without needing iterative refinement.

Anthropic reports Claude Sonnet 4.5 can autonomously exploit known flaws

An Anthropic blog post said current Claude models had improved cyber capabilities, including carrying out multistage attacks across networks with dozens of hosts using standard open-source tools. It reported that Claude Sonnet 4.5 succeeded in some tests without the custom cyber toolkit required by earlier model generations.
