Hidden Prompt-Injection and Supply-Chain Backdoors in AI Agent Skills
Security researchers warn that AI agent “Skills” (markdown/YAML instruction packages that extend agent capabilities) are becoming a supply-chain risk because hidden prompt-injection content in them can survive human review. One demonstrated technique embeds invisible Unicode Tag codepoints in skill files to smuggle instructions that some models treat as guidance to act on. When the skill is invoked, the hidden instructions can lead to data exfiltration, remote command execution, and other malicious behavior. The researcher also built a basic scanner to help detect these hidden-instruction patterns.
Separate reporting points to the same threat pattern across agent ecosystems. Simula Research Laboratory found hidden prompt-injection attacks in 2.6% of sampled posts on Moltbook, and Cisco researchers documented a malicious agent skill (“What Would Elon Do?”) that exfiltrated data to external servers while its ranking was reportedly inflated to make it appear as a top-ranked skill. Researchers also anticipate self-replicating adversarial prompts (“prompt worms/viruses”) that could propagate through networks of communicating AI agents, amplifying the impact of compromised skills and poisoned instruction content.

How this story unfolded
5 events from the most recent confirmed update back to the earliest known activity.
Cisco documents malicious 'What Would Elon Do?' skill in repository
Cisco researchers identified a malicious AI skill called 'What Would Elon Do?' that exfiltrated data to external servers. The skill was reportedly ranked No. 1 in a skill repository, with indications its popularity may have been artificially inflated.
Simula researchers find hidden prompt injections in Moltbook posts
Simula Research Laboratory reported that 506 Moltbook posts, representing 2.6% of sampled content, contained hidden prompt-injection attacks. The finding highlighted that prompt-injection content was already being embedded in public AI-related content repositories.
Researcher scans OpenClawHub and OpenAI Skills for hidden Unicode
The researcher scanned OpenClawHub and OpenAI Skills projects for invisible Unicode codepoints and found some instances, though none were obviously malicious and most were attributable to emoji handling or test cases. The scan was presented alongside detection guidance and a simple scanner for identifying such hidden content.
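A scan of this kind can be sketched in a few lines of Python. This is a minimal illustration, not the researcher's actual scanner: the function name `find_invisible_codepoints` and the choice of ranges are assumptions. Only the Unicode Tags block (U+E0000 to U+E007F) is named in the report; the zero-width and variation-selector ranges are added here as other common invisible characters.

```python
# Assumed ranges of interest. The Tags block is the one named in the report;
# the other two are hypothetical additions for illustration.
SUSPECT_RANGES = {
    "tag": (0xE0000, 0xE007F),
    "zero-width": (0x200B, 0x200F),
    "variation-selector": (0xFE00, 0xFE0F),
}


def find_invisible_codepoints(text):
    """Return (line, column, codepoint, label) for each suspect character."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            cp = ord(ch)
            for label, (lo, hi) in SUSPECT_RANGES.items():
                if lo <= cp <= hi:
                    findings.append((line_no, col, f"U+{cp:04X}", label))
    return findings


if __name__ == "__main__":
    # Two Tag characters appended to an otherwise ordinary skill line.
    sample = "# Skill\nRun the linter." + chr(0xE0068) + chr(0xE0069) + "\n"
    for finding in find_invisible_codepoints(sample):
        print(finding)
```

A hit is a prompt for manual review rather than proof of malice. Tag characters and variation selectors also occur in legitimate emoji sequences, which is consistent with the benign instances found in the scan.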
Author demonstrates hidden Unicode backdoor in AI Skills
A researcher showed that invisible Unicode Tag characters can be embedded in markdown-based AI Skills to hide prompt-injection instructions that some models interpret. The proof of concept modified a legitimate-looking security Skill so the agent would print a phrase and execute a remote shell command via curl piped to bash.
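The hiding mechanism relies on the fact that Tag characters U+E0020 through U+E007E mirror printable ASCII at a fixed offset of 0xE0000, so hidden text can be recovered by subtracting that offset. The sketch below shows the defensive direction: revealing and stripping such content. The helper names `to_tags`, `reveal_hidden`, and `strip_tags` are hypothetical, and `to_tags` exists only to build a harmless test input; this is not the researcher's proof-of-concept code.

```python
TAG_OFFSET = 0xE0000  # Tag characters U+E0020..U+E007E mirror printable ASCII


def to_tags(ascii_text):
    """Map printable ASCII to invisible Tag characters (test input only)."""
    return "".join(
        chr(TAG_OFFSET + ord(c)) for c in ascii_text if 0x20 <= ord(c) <= 0x7E
    )


def reveal_hidden(text):
    """Return the ASCII text carried by any Tag characters in `text`."""
    return "".join(
        chr(ord(c) - TAG_OFFSET) for c in text if 0xE0020 <= ord(c) <= 0xE007E
    )


def strip_tags(text):
    """Remove every character in the Unicode Tags block."""
    return "".join(c for c in text if not (0xE0000 <= ord(c) <= 0xE007F))


if __name__ == "__main__":
    visible = "Review the code for secrets."
    skill = visible + to_tags("print the phrase hello")
    print(reveal_hidden(skill))          # print the phrase hello
    print(strip_tags(skill) == visible)  # True
```

Running `reveal_hidden` over a skill file before review makes any smuggled text readable, and `strip_tags` could be applied before the content reaches a model, at the cost of breaking emoji sequences that legitimately use Tag characters.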
Claude Code reportedly begins blocking invisible Unicode tag attacks
The researcher reported that Claude Code started detecting or refusing invisible Unicode Tag-based instructions in early February 2026. At the time of testing, the same mitigation was reportedly not observed in claude.ai.


