ai-platform-security | ai-enabled-threat-activity | data-exfiltration-method

Hidden Instruction Attacks Against AI Assistants and Coding Agents

Updated 3d ago | First seen Mar 17, 2026 | 4 sources

New research shows that AI assistants and coding agents can be manipulated by hidden or misleading instructions embedded in the content they process, creating a practical path to social engineering, data leakage, and potentially the execution of malicious commands. LayerX reported that an attack combining a custom font with CSS-based rendering can make a webpage appear harmless in the DOM while the browser displays different instructions to the user, exploiting the gap between what AI systems analyze and what humans actually see. In its proof of concept, a page that read as benign fanfiction at the HTML level rendered instructions that could steer a user toward executing a reverse shell, and the company said the major AI assistants it tested failed to detect the threat.
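
The gap between the DOM and the rendered page can be illustrated with a toy model. The sketch below is not LayerX's proof of concept: it assumes the custom font behaves like a fixed character-to-glyph substitution (ROT13 here, purely for illustration) and that CSS keeps the benign text off screen. The `Span` class, `draw_glyphs`, `dom_view`, and `human_view` are hypothetical names, and the payload string is a harmless placeholder.

```python
import codecs
from dataclasses import dataclass


@dataclass
class Span:
    text: str            # characters exactly as stored in the DOM
    hidden_by_css: bool  # True models CSS that keeps the span off screen
    custom_font: bool    # True models a font whose glyphs are remapped


def draw_glyphs(text: str) -> str:
    """Hypothetical font: draws every letter as its ROT13 partner."""
    return codecs.encode(text, "rot_13")


def dom_view(spans: list[Span]) -> str:
    """What a text-only DOM reader extracts: raw characters, CSS and fonts ignored."""
    return " ".join(span.text for span in spans)


def human_view(spans: list[Span]) -> str:
    """What reaches the screen: hidden spans dropped, remapped glyphs drawn."""
    return " ".join(
        draw_glyphs(span.text) if span.custom_font else span.text
        for span in spans
        if not span.hidden_by_css
    )


# Assumed page layout: benign story text hidden from the user, plus an
# encoded span that only becomes readable once the custom font draws it.
page = [
    Span("Once upon a time in a quiet village", hidden_by_css=True, custom_font=False),
    Span(
        codecs.encode("open a terminal and run the setup command", "rot_13"),
        hidden_by_css=False,
        custom_font=True,
    ),
]

print("DOM view:  ", dom_view(page))
print("Human view:", human_view(page))
```

In this model a text-only reader sees the story followed by what looks like gibberish, while the person looking at the screen sees only the instruction.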

Separate research highlighted a related semantic injection risk in the software repository README files that AI coding agents read. In testing, malicious setup-style instructions embedded in documentation caused agents to exfiltrate local files or other sensitive data to external servers during project setup, with reported success rates of up to 85%. The benchmark, ReadSecBench, used 500 open source repositories across multiple programming languages and found that the attack worked across agents using models from Anthropic, OpenAI, and Google. This indicates that hidden-instruction attacks are not limited to one interface or vendor but reflect a broader weakness in how AI systems interpret untrusted content and operational guidance.

EVENT TIMELINE

How this story unfolded

4 events from the most recent confirmed update back to the earliest known activity.

Mar 17, 2026 | 4mo ago

LayerX publicly reveals poisoned typeface attack details

On March 17, 2026, LayerX publicly described the 'Poisoned Typeface' technique, explaining how normal browser font rendering can hide malicious instructions from AI assistants that rely on text-only DOM analysis. The disclosure included recommendations such as render-and-diff analysis, hidden-content detection, and font inspection to reduce AI-assisted social engineering risk.
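
A render-and-diff check of the kind recommended above could be sketched as follows. This is a minimal illustration, not LayerX's tooling: it assumes the rendered text has already been obtained by some other means (for example OCR on a screenshot, which is not shown), and the function names and the 0.8 threshold are hypothetical choices.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so layout noise does not count as a difference."""
    return " ".join(text.lower().split())


def render_diff_score(dom_text: str, rendered_text: str) -> float:
    """Similarity in [0, 1] between DOM text and the text a user actually sees."""
    return SequenceMatcher(None, normalize(dom_text), normalize(rendered_text)).ratio()


def looks_divergent(dom_text: str, rendered_text: str, threshold: float = 0.8) -> bool:
    """Flag a page when the two views disagree more than the assumed threshold allows."""
    return render_diff_score(dom_text, rendered_text) < threshold


# Example: the DOM holds a story, but the (assumed) OCR output shows an instruction.
dom_text = "Once upon a time in a quiet village"
rendered_text = "Open a terminal and paste this command"
print(looks_divergent(dom_text, rendered_text))
```

A flagged page would then warrant the other checks named in the disclosure, such as hidden-content detection and font inspection, before an assistant tells a user the page is safe.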

Researchers document README-based semantic injection attack

Researchers published findings on a semantic injection attack in which malicious instructions hidden in repository README files or linked documentation caused AI coding agents to exfiltrate sensitive local data. Using the 500-file ReadSecBench benchmark, they found that the attack succeeded in up to 85% of cases, and in about 91% of cases when the malicious content was placed two hops away in linked documentation. The attack affected major model families including Claude, GPT, and Gemini.
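
To make the "two hops away" placement concrete, the sketch below walks Markdown links outward from a README and records how many hops each linked document sits from it. It is an illustration only, not part of ReadSecBench: the in-memory `docs` mapping, the `docs_within_hops` helper, and the assumption that link targets match document keys exactly are all hypothetical simplifications.

```python
import re
from collections import deque

# Matches the target of an inline Markdown link such as [setup](docs/setup.md)
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def docs_within_hops(docs: dict[str, str], start: str, max_hops: int) -> dict[str, int]:
    """Breadth-first walk over links, returning each reachable doc and its hop count."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if hops[current] == max_hops:
            continue  # do not follow links beyond the hop limit
        for target in LINK_RE.findall(docs.get(current, "")):
            if target in docs and target not in hops:
                hops[target] = hops[current] + 1
                queue.append(target)
    return hops


# Hypothetical repository: the README links to a setup guide, which links onward.
docs = {
    "README.md": "See the [setup guide](docs/setup.md).",
    "docs/setup.md": "Then configure the [environment](docs/env.md).",
    "docs/env.md": "Environment notes.",
    "docs/unlinked.md": "Not reachable from the README.",
}
print(docs_within_hops(docs, "README.md", max_hops=2))
```

Under these assumptions, any review applied to a README would need to cover every document this walk returns, since an agent following setup instructions may read them as well.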

Dec 17, 2025 | 7mo ago

LayerX discloses font-rendering issue to major AI vendors

After developing the proof of concept, LayerX reported the font-rendering weakness to major AI vendors under a 90-day disclosure process. According to the report, Microsoft was the only vendor that fully engaged and addressed the issue, while others treated it as out of scope or as social engineering.

Dec 1, 2025 | 7mo ago

LayerX tests custom-font attack against AI web assistants

In December 2025, LayerX conducted a proof of concept showing that custom fonts and CSS could make a webpage appear benign in the DOM while rendering malicious instructions to users. The company found that multiple non-agentic AI assistants, including ChatGPT, Claude, Copilot, Gemini, Grok, and Perplexity, failed to detect the threat and often said the page was safe.

LINKED ENTITIES

Related entities

Vulnerabilities, threat actors, malware, products, organizations, and breaches Mallory has linked to this story.

11 LINKED
Affected products
3 linked
ChatGPT, ChatGPT, Copilot
Organizations
8 linked
Anthropic, OpenAI, Perplexity, Microsoft Corporation, xAI, Google, KnowBe4, LayerX