Skip to main content
Live Webinar with SANS (June 25)— Agentic CTI Automation for Fun & ProfitRegister Free
Mallory
Back to intelligence
ai-platform-securitydata-exfiltration-methodcybersecurity-regulationpersistence-method

AI Agent and LLM Security Risks: Prompt Injection, Data Exfiltration, and Governance Gaps

Updated 3mo agoFirst seen Jan 18, 20263 sources

Security reporting highlighted escalating risks from LLM-powered tools and autonomous agents, including prompt-injection-driven attack chains and weak governance around enterprise and clinical deployments. Research coverage described “promptware” as a multi-stage threat model for LLM applications—moving beyond single-step prompt injection to campaigns resembling traditional malware kill chains (initial access, privilege escalation/jailbreak, persistence, lateral movement, and actions on objectives), with proposed intervention points for defenders.

A concrete example was reported in Anthropic’s Cowork research preview, where PromptArmor demonstrated a Files API exfiltration chain: a user connects the agent to sensitive folders, then a document containing hidden instructions triggers the agent to upload files to an attacker-controlled Anthropic account without further user approval once access is granted. Separately, a VA Office of Inspector General report warned the Veterans Health Administration lacked a formal mechanism to identify, track, and resolve risks from clinical generative AI chatbots (including VA GPT and Microsoft 365 Copilot chat), citing oversight and patient-safety concerns tied to inaccurate outputs and insufficient coordination with patient safety functions.

Share:
AI Agent and LLM Security Risks: Prompt Injection, Data Exfiltration, and Governance Gaps
Stay ahead

Get ahead of threats like this

Mallory correlates global threat intelligence with your attack surface — know if you’re exposed before adversaries strike.

EVENT TIMELINE

How this story unfolded

7 events from the most recent confirmed update back to the earliest known activity.

7 EVENTS
Jan 15, 20265mo ago

VA OIG warns VHA lacks formal process to manage clinical AI chatbot risks

The VA Office of Inspector General reported that the Veterans Health Administration lacks a formal mechanism to identify, track, and resolve risks from generative AI chatbots used in clinical settings. The watchdog said the current informal oversight model limits patient-safety feedback loops and increases the risk of inaccurate or outdated chatbot outputs affecting care.

Anthropic says Cowork mitigations and VM update are in progress

Anthropic told The Register it was working on mitigations for the Cowork exfiltration issue, including a virtual machine intended to reduce access to sensitive files. The company also said it planned an update to improve how the VM interacts with the vulnerable API and to add further security improvements.

PromptArmor discloses Cowork prompt-injection file exfiltration chain

PromptArmor reported that Anthropic's Cowork product could be tricked by a hidden prompt injection in a document into uploading a user's connected files to an attacker-controlled Anthropic account. The attack chain would let the attacker query the stolen files for sensitive data such as PII and financial information.

Researchers propose five-step 'Promptware Kill Chain' model

Ben Nassi, Bruce Schneier, and Oleg Brodt proposed a five-step 'Promptware Kill Chain' framework to describe multi-stage attacks against LLM-based applications, covering initial access, privilege escalation, persistence, command and control, and actions on objectives. The model reframes prompt injection as part of broader operational attack chains rather than isolated exploits.

Oct 1, 20259mo ago

Researcher reports Claude Code Files API exfiltration risk

In October 2025, security researcher Johann Rehberger reported that Anthropic's Claude Code could be abused through prompt injection to exfiltrate files via the Files API. Anthropic acknowledged the behavior was possible but did not issue a fix, instead emphasizing user caution.

Jun 1, 20251y ago

Anthropic leaves SQL injection flaw in SQLite MCP reference server unpatched

In June 2025, Trend Micro disclosed a SQL injection vulnerability in Anthropic's archived open-source SQLite MCP server reference implementation. Anthropic considered the issue out of scope and did not patch it despite the code having been widely forked.

Jan 1, 20242y ago

VA publishes 2024 AI inventory showing broad safety-impacting use

The VA's 2024 public AI inventory listed 227 AI use cases, including 145 categorized as safety- or rights-impacting. The inventory included predictive systems such as tools intended to help identify veterans at high risk of suicide.

LINKED ENTITIES

Related entities

Vulnerabilities, threat actors, malware, products, organizations, and breaches Mallory has linked to this story.

7 LINKEDOpen in app
Malware
1 linked
Affected products
2 linked
Claude CodeSqlite
Organizations
4 linked
Microsoft CorporationTrend MicroPromptArmorAnthropic
The operational view lives in Mallory

See the full picture, correlated to your attack surface.

This page covers what’s public. Mallory adds the parts that aren’t — which of your assets are affected, which threat actors are using it right now, which detections to deploy, and what to do next.
Exposure mapping

Map indicators from this story to your assets and identify affected systems in minutes.

Threat actor evidence

Every observed campaign, victim, and pivot linked to actors named in this story.

Associated malware

Malware, exploits, and IOCs connected to the activity described here.

Detection signatures

YARA, Sigma, and Snort rules deployed to your SIEM as soon as they’re published.

Scheduled alerts

Get matching new stories delivered to your team as they break — not the next morning.

AI threads

Ask questions about this story and take action on the answers.