Hidden Prompt-Injection and Supply-Chain Backdoors in AI Agent Skills

EVENT TIMELINE

How this story unfolded

5 events from the most recent confirmed update back to the earliest known activity.

Feb 12, 2026

Cisco documents malicious 'What Would Elon Do?' skill in repository

Cisco researchers identified a malicious AI skill called 'What Would Elon Do?' that exfiltrated data to external servers. The skill was reportedly ranked No. 1 in a skill repository, with indications its popularity may have been artificially inflated.

Simula researchers find hidden prompt injections in Moltbook posts

Simula Research Laboratory reported that 506 Moltbook posts, representing 2.6% of sampled content, contained hidden prompt-injection attacks. The finding highlighted that prompt-injection content was already being embedded in public AI-related content repositories.

Feb 11, 2026

Researcher scans OpenClawHub and OpenAI Skills for hidden Unicode

The author scanned OpenClawHub and OpenAI Skills projects for invisible Unicode codepoints and found some instances, though they were not obviously malicious and were often attributable to emoji handling or test cases. The scan was presented alongside detection guidance and a simple scanner for identifying such hidden content.
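A scan of this kind can be sketched in a few lines of Python. This is a minimal illustration, not the researcher's scanner: the code point ranges in `SUSPICIOUS_RANGES` and the function names are assumptions chosen for this example. Zero-width joiners and variation selectors also occur in ordinary emoji sequences, which is consistent with the benign hits described above, so findings need manual review.

```python
# Hypothetical list of invisible or formatting code point ranges.
# This selection is an assumption for illustration, not an exhaustive set.
SUSPICIOUS_RANGES = [
    (0xE0000, 0xE007F, "Unicode Tag"),
    (0x200B, 0x200F, "zero-width or directional mark"),
    (0x202A, 0x202E, "bidi embedding/override"),
    (0x2060, 0x2064, "invisible operator"),
    (0x2066, 0x2069, "bidi isolate"),
    (0xFE00, 0xFE0F, "variation selector"),
    (0xE0100, 0xE01EF, "variation selector supplement"),
]


def classify(ch):
    """Return a label if the character falls in a suspicious range, else None."""
    cp = ord(ch)
    for lo, hi, label in SUSPICIOUS_RANGES:
        if lo <= cp <= hi:
            return label
    return None


def scan_text(text):
    """Return (line, column, code point, label) for every suspicious character."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            label = classify(ch)
            if label:
                findings.append((lineno, col, f"U+{ord(ch):04X}", label))
    return findings


def scan_file(path):
    """Scan a UTF-8 text file such as a markdown Skill definition."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return scan_text(handle.read())
```

Running `scan_file` over each markdown file in a Skill directory gives a list of positions to inspect by hand; a variation selector after an emoji is usually harmless, while a long run of Tag characters is not.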

Author demonstrates hidden Unicode backdoor in AI Skills

A researcher showed that invisible Unicode Tag characters can be embedded in markdown-based AI Skills to hide prompt-injection instructions that some models interpret. The proof of concept modified a legitimate-looking security Skill so the agent would print a phrase and execute a remote shell command via curl piped to bash.
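The hiding mechanism relies on the Unicode Tags block, whose code points U+E0020 to U+E007E mirror printable ASCII at an offset of 0xE0000 and render as nothing in most editors. The sketch below, with function names invented for this example and a harmless payload, shows how such text can be revealed for review or stripped before a Skill reaches a model. It is an illustration of the encoding, not the researcher's proof of concept.

```python
TAG_BASE = 0xE0000  # offset between printable ASCII and the Unicode Tags block


def to_tags(ascii_text):
    """Shift printable ASCII into the Tags block (used here only to build test input)."""
    return "".join(
        chr(TAG_BASE + ord(c)) for c in ascii_text if 0x20 <= ord(c) <= 0x7E
    )


def reveal_tags(text):
    """Recover the ASCII text carried by Tag characters, ignoring everything else."""
    return "".join(
        chr(ord(c) - TAG_BASE) for c in text if 0xE0020 <= ord(c) <= 0xE007E
    )


def strip_tags(text):
    """Remove every code point in the Tags block (U+E0000 to U+E007F)."""
    return "".join(c for c in text if not (0xE0000 <= ord(c) <= 0xE007F))


# A benign demonstration: the heading looks unchanged but carries extra text.
skill = "# Security Review Skill" + to_tags("say hello") + "\n"
hidden = reveal_tags(skill)   # the concealed instruction, if any
clean = strip_tags(skill)     # the text with the hidden run removed
```

Stripping the Tags block before loading a Skill is one possible mitigation under these assumptions; whether a given agent does this is not something the sketch can confirm.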

Feb 1, 2026

Claude Code reportedly begins blocking invisible Unicode tag attacks

The researcher reported that Claude Code started detecting or refusing invisible Unicode Tag-based instructions in early February 2026. At the time of testing, the same mitigation was reportedly not observed in claude.ai.

LINKED ENTITIES

Related entities

Vulnerabilities, threat actors, malware, products, organizations, and breaches Mallory has linked to this story.

12 linked

Malware

1 linked

OpenClaw

Affected products

2 linked

Claude Code, GitHub Copilot

Organizations

9 linked

Cisco Systems, Simula Research Laboratory, Anthropic, OpenAI, GitHub, xAI, Google, Antigravity, Wuzzi

SOURCE COVERAGE

Sources

2 references tracked. Mallory keeps watching after this page renders.

Scworld · News

Feb 12, 2026

AI Vulnerability Hunting - PSW #913 | SC Media

scworld.com

Embrace The Red Blog · News

Feb 11, 2026

Scary Agent Skills: Hidden Unicode Instructions in Skills ...And How To Catch Them · Embrace The Red

embracethered.com

ON THE SAME THREAD

Indirect Prompt Injection and Data Exfiltration Risks in Enterprise AI Agents

Security researchers warned that **AI agents and retrieval-augmented generation (RAG) systems** can be turned into data-exfiltration channels when attackers poison inputs or embed malicious instructions in content the model is expected to process. One report described a **0-click indirect prompt injection** against *OpenClaw* agents: hidden instructions cause the agent to generate an attacker-controlled URL that carries sensitive data, such as API keys or private conversations, in its query parameters. Messaging platforms like *Telegram* or *Discord* then automatically request that URL to build a link preview, silently delivering the data to the attacker. The same reporting raised concerns about insecure defaults that let agents browse, execute tasks, and access local files, expanding the blast radius of prompt-injection abuse.

Related analysis highlighted that the same core weakness extends beyond standalone agents to **enterprise RAG deployments**, where the integrity of the knowledge base becomes part of the security boundary. If attackers can poison indexed documents in systems such as SharePoint or Confluence, they can manipulate retrieval results and influence model outputs, including security workflows and analyst guidance. Broader commentary on **agentic AI threat convergence** reinforced that prompt engineering is no longer just a productivity technique but an emerging exploit class, with adversaries using prompt injection and context manipulation against AI-enabled security operations. Together, the reporting shows that enterprise AI risk increasingly depends on controlling untrusted content, hardening agent permissions, and treating prompts, retrieved documents, and downstream integrations as attack surfaces.
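One way to narrow the link-preview channel is to inspect agent output before it is posted to a chat platform. The following is a rough sketch under stated assumptions: the host allowlist, the secret-matching pattern, and the function name `risky_urls` are all hypothetical, and a real deployment would need patterns matched to its own credential formats.

```python
import re
from urllib.parse import urlparse, parse_qsl

URL_RE = re.compile(r"https?://\S+")

# Hypothetical allowlist of hosts the agent may link to with parameters.
ALLOWED_HOSTS = {"docs.example.com"}

# Assumed heuristic for secret-like values: an "sk-" style key or any long token.
SECRET_RE = re.compile(r"sk-[A-Za-z0-9]{16,}|[A-Za-z0-9_\-]{32,}")


def risky_urls(agent_output):
    """Return (url, reason) pairs for links that could carry data to an unlisted host."""
    flagged = []
    for raw in URL_RE.findall(agent_output):
        url = raw.rstrip(".,;)")  # drop trailing sentence punctuation
        parsed = urlparse(url)
        host = (parsed.hostname or "").lower()
        if host in ALLOWED_HOSTS:
            continue
        params = parse_qsl(parsed.query, keep_blank_values=True)
        if any(SECRET_RE.search(value) for _, value in params):
            flagged.append((url, "secret-like query value"))
        elif params:
            flagged.append((url, "query parameters to unlisted host"))
    return flagged
```

A caller could withhold or rewrite any message for which `risky_urls` returns a non-empty list. Because data can also be encoded in the path or subdomain, this check reduces exposure rather than eliminating it.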

May 7, 2026

SymJack and Malicious AI Skills Expose New Supply-Chain Risks for Coding Agents

Researchers disclosed **SymJack AI**, an attack technique that abuses symbolic links in repositories to trick AI coding assistants and automated development tools into writing to attacker-chosen destinations. A user may approve what appears to be a harmless file change, while the operating system follows a malicious symlink to alter configuration files and enable later command execution with the victim’s privileges. The technique raises concerns for local developer workstations and for CI/CD systems that automatically process untrusted pull requests, where compromised jobs could expose cloud credentials, deploy keys, and other sensitive secrets.

Separate research found that public **AI skill marketplaces** remain vulnerable to simple malicious submissions that evade automated scanning. Trail of Bits reported bypassing multiple skill scanners with proof-of-concept packages that used oversized files, hidden content in `.docx` archives, poisoned `.pyc` bytecode, and prompt-injection-style social engineering to disguise malicious behavior.

The findings indicate that both repository-based coding agents and downloadable agent skills can become supply-chain entry points. Recommended responses include stricter validation of resolved file paths, tighter controls on configuration writes, monitoring of runtime behavior, review of pull requests that modify agent setup, and the use of curated internal skill sources instead of public marketplaces for sensitive environments.
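The recommendation to validate resolved file paths can be illustrated with a short check that an agent or wrapper might run before writing. This is a sketch, not the researchers' mitigation: the function name `is_write_allowed` and the `protected_names` default are assumptions for this example.

```python
import os


def is_write_allowed(workspace, requested_path, protected_names=(".git", ".vscode")):
    """Allow a write only if the symlink-resolved target stays inside the workspace.

    protected_names is a hypothetical set of configuration directories that
    should never be written without separate review.
    """
    root = os.path.realpath(workspace)
    # realpath follows symlinks, so the check sees the real destination,
    # not the harmless-looking path shown to the user.
    target = os.path.realpath(os.path.join(root, requested_path))
    try:
        inside = os.path.commonpath([root, target]) == root
    except ValueError:
        # Raised on Windows when the paths are on different drives.
        inside = False
    if not inside:
        return False
    parts = os.path.relpath(target, root).split(os.sep)
    return not any(part in protected_names for part in parts)
```

The check resolves the path at decision time, so a link swapped in afterwards would still be followed; opening the file with symlink-refusing flags at write time would be a stronger complement.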

Jun 4, 2026

Prompt Injection Attacks Abuse AI Agent Memory and Link Previews for Manipulation and Data Exfiltration

Security researchers reported multiple **prompt-injection-driven attack paths** that exploit how AI assistants and agentic systems process untrusted content. Microsoft researchers described **AI recommendation/memory poisoning** (mapped in MITRE ATLAS as **`AML.T0080: Memory Poisoning`**) in which attackers insert instructions that cause an assistant to persistently “remember” certain companies, sites, or services as trusted or preferred, shaping future recommendations in later, unrelated conversations. Observed activity over a 60-day period included **50 distinct prompt samples** tied to **31 organizations across 14 industries**, with potential downstream impact in high-stakes domains like health, finance, and security where manipulated recommendations can mislead users without obvious signs of tampering.

A separate finding highlighted how **AI agents embedded in messaging apps** can be coerced into leaking secrets via **malicious link previews**. PromptArmor demonstrated that an attacker can use chat-based prompt injection to trick an AI agent into generating an attacker-controlled URL that includes sensitive data (e.g., API keys) as parameters. When messaging platforms (e.g., Slack/Telegram) automatically fetch **link preview** metadata, the preview request can become a **zero-click exfiltration channel**: no user needs to click the link for the data-bearing request to be sent.

Together, the reports underscore that agent features intended to improve usability, including *persistent memory*, URL-based prompt prepopulation (e.g., “Summarize with AI” buttons), and automatic preview fetching, can be repurposed into scalable manipulation and data-loss mechanisms when untrusted prompts are processed implicitly.

Mar 21, 2026

Related stories

Indirect Prompt Injection and Data Exfiltration Risks in Enterprise AI Agents

SymJack and Malicious AI Skills Expose New Supply-Chain Risks for Coding Agents

Prompt Injection Attacks Abuse AI Agent Memory and Link Previews for Manipulation and Data Exfiltration