AI agent and LLM misuse drives new attack and governance risks

ai agentsagent orchestrationdata leakageinsider riskllmsdata theftvulnerability researchgovernanceagentic systemsexploitation scriptscredential exposurephishinggovernmentvm isolationanthropic

Updated February 26, 2026 at 09:16 AM8 sources

AI agent and LLM misuse drives new attack and governance risks

Get Ahead of Threats Like This

Know if you're exposed — before adversaries strike.

Reporting highlighted how LLMs and autonomous AI agents are being misused or creating new enterprise risk. Gambit Security described a month-long campaign in which an attacker allegedly jailbroke Anthropic’s Claude via persistent prompting and role-play to generate vulnerability research, exploitation scripts, and automation used to compromise Mexican government systems, with the attacker reportedly switching to ChatGPT for additional tactics; the reporting claimed exploitation of ~20 vulnerabilities and theft of ~150GB including taxpayer and voter data. Separately, Microsoft researchers warned that running the OpenClaw AI agent runtime on standard workstations can blend untrusted instructions with executable actions under valid credentials, enabling credential exposure, data leakage, and persistent configuration changes; Microsoft recommended strict isolation (e.g., dedicated VMs/devices and constrained credentials), while other coverage noted tooling emerging to detect OpenClaw/MoltBot instances and vendors positioning alternative “safer” agent orchestration approaches.

Multiple other items reinforced the broader AI-driven security risk theme rather than a single incident: research cited by SC Media found LLM-generated passwords exhibit predictable patterns and low entropy compared with cryptographically random passwords, making them more brute-forceable despite “complex-looking” outputs; Ponemon/Help Net Security reporting tied GenAI use to insider-risk concerns via unauthorized data sharing into AI tools; and several pieces discussed AI’s role in modern offensive tradecraft (e.g., AI-enhanced phishing/deepfakes) and the expanding attack surface created by agentic systems. Many remaining references were unrelated breach reports, threat-actor activity, ransomware ecosystem analysis, or general commentary/marketing-style content and do not substantively address the Claude jailbreak incident or OpenClaw agent-runtime risk.

Related Entities

Sources

cyber security news

Hacker Jailbreakes Claude AI to Write Exploit Code and Steal Government Data

February 26, 2026 at 01:20 AM

scworld

Microsoft warns of OpenClaw risks on standard workstations | SC Media

February 26, 2026 at 12:15 AM

help net security

The $19.5 million insider risk problem - Help Net Security

February 26, 2026 at 12:00 AM

help net security

Hottest cybersecurity open-source tools of the month: February 2026 - Help Net Security

February 26, 2026 at 12:00 AM

nsfocus global

Blue Teaming Construction Insights from 2025 Threat Landscape Observations - NSFOCUS, Inc., a global network and cyber security leader, protects enterprises and carriers from advanced cyber attacks.

February 25, 2026 at 09:11 AM

3 more from sources like securitysenses blog, zdnet zero day and scworld

Multiple reports describe threat actors abusing *AI-adjacent* and open-source distribution channels to deliver malware or manipulate automated agents. Straiker STAR Labs reported a **SmartLoader** campaign that trojanized a legitimate-looking **Model Context Protocol (MCP)** server tied to *Oura* by cloning the project, fabricating GitHub credibility (fake forks/contributors), and getting the poisoned server listed in MCP registries; the payload ultimately deployed **StealC** to steal credentials and crypto-wallet data. Separately, researchers observed attackers using trusted platforms and SaaS reputations for delivery and monetization: a fake Android “antivirus” (*TrustBastion*) was hosted via **Hugging Face** repositories to distribute banking/credential-stealing malware, and Trend Micro documented spam/phishing that abused **Atlassian Jira Cloud** email reputation and **Keitaro TDS** redirects to funnel targets (including government/corporate users across multiple language groups) into investment scams and online casinos. In parallel, research highlights emerging risks where **AI agents and AI-enabled workflows become the target or the transport layer**. Check Point demonstrated “**AI as a proxy**,” where web-enabled assistants (e.g., *Grok*, *Microsoft Copilot*) can be coerced into acting as covert **C2 relays**, blending attacker traffic into commonly allowed enterprise destinations, and outlined a trajectory toward prompt-driven, adaptive malware behavior. OpenClaw featured in two distinct security developments: an OpenClaw advisory described a **log-poisoning / indirect prompt-injection** weakness (unsanitized WebSocket headers written to logs that may later be ingested as trusted context), while Hudson Rock reported an infostealer incident that exfiltrated sensitive **OpenClaw configuration artifacts** (e.g., `openclaw.json` tokens, `device.json` keys, and “memory/soul” files), signaling that infostealer operators are beginning to harvest AI-agent identities and automation secrets in addition to browser credentials.

4 weeks ago

AI Agent and LLM Security Risks: Prompt Injection, Data Exfiltration, and Governance Gaps

Security reporting highlighted escalating risks from *LLM-powered tools and autonomous agents*, including prompt-injection-driven attack chains and weak governance around enterprise and clinical deployments. Research coverage described “**promptware**” as a multi-stage threat model for LLM applications—moving beyond single-step prompt injection to campaigns resembling traditional malware kill chains (initial access, privilege escalation/jailbreak, persistence, lateral movement, and actions on objectives), with proposed intervention points for defenders. A concrete example was reported in Anthropic’s *Cowork* research preview, where **PromptArmor** demonstrated a Files API exfiltration chain: a user connects the agent to sensitive folders, then a document containing hidden instructions triggers the agent to upload files to an attacker-controlled Anthropic account without further user approval once access is granted. Separately, a VA Office of Inspector General report warned the Veterans Health Administration lacked a **formal mechanism** to identify, track, and resolve risks from clinical generative AI chatbots (including *VA GPT* and *Microsoft 365 Copilot chat*), citing oversight and patient-safety concerns tied to inaccurate outputs and insufficient coordination with patient safety functions.

1 months ago

AI Security Risks and Emerging Tooling for Testing LLMs and Agentic Systems

Security reporting and vendor research highlighted accelerating **AI/LLM security exposure** as enterprises deploy generative AI and autonomous agents faster than defensive controls mature. Commonly cited weaknesses included **prompt injection** (reported as succeeding against a majority of tested LLMs), **training-data poisoning**, malicious packages in **model repositories**, and real-world **deepfake-enabled fraud**; one example referenced prior disclosure that a China-linked actor weaponized an autonomous coding/agent tool by breaking malicious objectives into benign-looking subtasks. Separately, commentary on AppSec programs argued that AI-assisted development is amplifying alert volumes and making traditional **SAST triage** increasingly impractical, pushing organizations toward more *runtime* and workflow-embedded testing approaches. New and emerging tooling and practices are being positioned to address these risks, including an open-source scanner (*Augustus*, by Praetorian) that automates **210+ adversarial test techniques** across **28 LLM providers** as a portable Go binary intended for CI/CD and red-team workflows, and discussion of autonomous AI pentesting tools (e.g., *Shannon*) that require sensitive inputs such as source code, repo context, and API keys—raising governance and data-handling concerns even when used defensively. Several other items in the set (phishing/XWorm activity, healthcare extortion group “Insomnia,” Singapore telco intrusions attributed to **UNC3886**, and help-desk payroll fraud) describe unrelated threat activity and do not materially change the AI-security-focused picture.

1 months ago

Get Ahead of Threats Like This

Mallory continuously monitors global threat intelligence and correlates it with your attack surface. Know if you're exposed — before adversaries strike.

AI agent and LLM misuse drives new attack and governance risks

Get Ahead of Threats Like This

Related Entities

Threat Actors

Malware

Organizations

Affected Products

Sources

Related Stories

AI and Open-Source Ecosystem Abused for Malware Delivery and Agent Manipulation

AI Agent and LLM Security Risks: Prompt Injection, Data Exfiltration, and Governance Gaps

AI Security Risks and Emerging Tooling for Testing LLMs and Agentic Systems

Get Ahead of Threats Like This