Prompt Injection and Jailbreak Techniques Targeting LLM-Powered Applications
Security researchers and vendors are warning that prompt injection and jailbreak techniques remain a leading risk for enterprise deployments of large language models (LLMs), enabling attackers to override system instructions, bypass safety controls, and potentially drive data exposure outcomes. Resecurity reports assisting a Fortune 100 organization where AI-powered banking and HR applications were targeted with prompt-injection attempts, emphasizing that these attacks exploit model behavior rather than traditional software flaws and can be used in scenarios such as extracting sensitive configuration data (for example, attempts to elicit content resembling /etc/passwd). Resecurity also cites OWASP’s 2025 Top 10 for LLM Applications, where prompt injection is ranked as the top issue, and frames continuous security testing (e.g., VAPT) as a key control for enterprise AI systems.
Separate research highlighted by Kaspersky describes a “poetry” jailbreak technique in which prompts framed as rhyming verse increased the likelihood that chatbots would produce disallowed or unsafe responses; the study tested this approach across 25 models from multiple vendors (including Anthropic, OpenAI, Google, Meta, DeepSeek, and xAI). In contrast, OpenAI’s planned upgrade to ChatGPT Temporary Chat is primarily a product/privacy change—adding optional personalization while keeping temporary chats out of history and model training (with possible retention for up to 30 days)—and does not describe a specific security incident or vulnerability disclosure tied to prompt injection or jailbreak research.

Get ahead of threats like this
Mallory correlates global threat intelligence with your attack surface — know if you’re exposed before adversaries strike.
How this story unfolded
48 events from the most recent confirmed update back to the earliest known activity.
Zenity reports theft of Microsoft Copilot system prompt
Zenity Labs published research describing how Microsoft Copilot's hidden system prompt could be extracted, highlighting a concrete prompt-injection-style disclosure affecting a deployed AI assistant. The report added a specific public example of system-prompt exposure in a real-world Copilot product.
Meta proposes 'Agents Rule of Two' for AI agent security
Meta published guidance describing prompt injection as a fundamental unresolved weakness in LLM agents and introduced the 'Agents Rule of Two' security model. The framework says an agent session should satisfy no more than two of three properties—handling untrusted input, accessing sensitive data or systems, and taking external actions—and recommends human supervision when all three are needed.
A3S-Bench paper exposes evasion weaknesses in autonomous agents
A research paper introduced a three-part evasion framework—temporal, spatial, and semantic evasion—to test stateful LLM-based autonomous agents with deep system privileges. Using the new A3S-Bench dataset of 2,254 real-world agent execution trajectories across 20 threat scenarios and 10 LLM backbones, the study reported average risk trigger rates rising from 28.3% to 52.6%, highlighting systemic architectural weaknesses.
LinkedIn profile prompt injection manipulates recruiter AI outreach
A software developer using the name tmuxvim embedded prompt-injection text in a LinkedIn profile instructing any AI reader to address them as 'My Lord' and write in Old English. At least one recruiter message reportedly followed those instructions, demonstrating prompt-injection risk in AI-assisted recruiting workflows that ingest untrusted profile content.
PortSwigger lab shows indirect prompt injection leading to stored XSS
A walkthrough of a PortSwigger Web Security Academy lab demonstrated that indirect prompt injection in an LLM-powered live chat application can cause the model to emit malicious content that is rendered unsafely, resulting in stored cross-site scripting. The attack chain was shown to bypass AI safety filters and could be used to automatically delete another user’s account, highlighting insecure handling of AI-generated output.
OSINT Team blog tests five prompt-injection defenses and finds four fail
An OSINT Team blog post described testing five common prompt-injection defenses across enterprise-style LLM environments including document assistants, coding agents, browser agents, customer support systems, and workflow automations. The evaluation found stronger system prompts, keyword filtering, pattern matching, and classifier layers were unreliable, while context segmentation and permission boundaries with capability isolation materially improved resilience.
Capital One proposes adaptive automated LLM red-teaming framework
Researchers from Capital One’s AI Foundations group introduced Adaptive Instruction Composition, an automated jailbreak testing framework that uses a contextual bandit to learn effective query-and-tactic combinations instead of relying on random combinations. In simulations against Mistral-7B and Llama models, the method reportedly more than doubled WildTeaming’s attack success rate and showed cross-model transferability of learned jailbreak strategies.
Google studies prompt injections on the public web using Common Crawl
Google Security disclosed a threat-intelligence study examining whether indirect prompt injection is being operationalized on the public web by scanning Common Crawl data for known patterns. The company said large-scale detection is difficult because many apparent prompt injections are false positives appearing in benign contexts such as research and educational content.
Trend Micro discloses 'sockpuppeting' jailbreak affecting 11 AI models
Trend Micro detailed a new black-box jailbreak technique called 'sockpuppeting' that abuses assistant-prefill support to inject a fake compliant response and bypass safety guardrails in 11 major LLMs. The researchers reported impacts including generation of malicious exploit code and disclosure of system prompts, and said API-level blocking of assistant prefills is the strongest defense.
OWASP ASI01 guide ranks Agent Goal Hijack as top agentic AI risk
An Adversa technical guide for the 2026 OWASP Agentic Security Initiative described ASI01, Agent Goal Hijack, as the highest-ranked risk for agentic AI systems because attackers can redirect agent objectives through untrusted inputs, tool outputs, inter-agent messages, or configuration tampering. The guide outlined five trust-boundary attack vectors, cited examples including EchoLeak and a GitHub Copilot CVE, and recommended layered mitigations such as strict trust boundaries, least privilege, human approval for sensitive actions, and intent-preserving architectures.
NDSS paper proposes attention-based defense for indirect prompt injection
An NDSS Symposium paper titled 'Attention is All You Need to Defend Against Indirect Prompt Injection Attacks in LLMs' was published, presenting a defense approach aimed specifically at mitigating indirect prompt injection attacks. Based on the title, the work focuses on using attention-related mechanisms as a technical mitigation distinct from previously tracked prompt-injection research.
ArXiv paper analyzes AI-agent indirect prompt injection via public competition
An arXiv paper titled 'How Vulnerable Are AI Agents to Indirect Prompt Injections? Insights from a Large-Scale Public Competition' was published, presenting a distinct research study on AI agents' susceptibility to indirect prompt injection. Based on the title, the work derives findings from a large-scale public competition, making it separate from previously tracked defense papers and observational studies.
ArXiv paper proposes causal-attribution defense for indirect prompt injection
An arXiv paper titled 'AttriGuard: Defeating Indirect Prompt Injection in LLM Agents via Causal Attribution of Tool Invocations' was published, presenting a defense approach for LLM agents against indirect prompt injection. Based on the title, the work focuses on using causal attribution of tool invocations to identify or block malicious influence, making it distinct from previously tracked parsing-based and attention-based defenses.
Cloudflare identifies indirect prompt code injection in malicious Workers
Cloudforce One reported that in March 2026 it found malicious or abusive Cloudflare Workers containing multilingual commented 'Notice to AI' lures designed to manipulate AI-based security auditing systems into classifying harmful code as benign. In a later study across 100 confirmed malicious Workers and seven LLMs, Cloudflare found evasion depended more on file structure, comment density, and size than on the deceptive wording alone, and recommended mitigations such as comment stripping and prioritizing functional code.
GitHub report discloses Chain-of-Logic Injection jailbreak as CVE-2026-3098
A GitHub repository/report titled 'LLM Jailbreak via Chain-of-Logic Injection' was published and associated the technique with CVE-2026-3098. The reference indicates a distinct jailbreak disclosure centered on a named attack method and CVE tracking, separate from previously noted prompt-injection and jailbreak research.
OpenReview paper proposes nullspace steering for controlled model subversion
An OpenReview paper titled 'Jailbreaking the Matrix: Nullspace Steering for Controlled Model Subversion' was published, presenting a distinct jailbreak technique based on nullspace steering. Based on the title, the work appears to focus on controlled manipulation of model behavior as a new technical approach to model subversion.
Resecurity details prompt-injection risks and simulated data disclosure
Resecurity published an analysis describing prompt injection as a leading security risk for enterprise AI applications, outlining direct and indirect injection techniques and a scenario in which an AI HR assistant is manipulated into disclosing a simulated /etc/passwd file. The article also recommended mitigations such as least-privilege tool access, input and output validation, segregation of untrusted content, and continuous adversarial testing.
ArXiv paper analyzes prompt injection in agentic coding assistants
An arXiv paper titled 'Prompt Injection Attacks on Agentic Coding Assistants: A Systematic Analysis of Vulnerabilities in Skills, Tools, and Protocol Ecosystems' was published, presenting a distinct research effort focused on how prompt injection affects agentic coding assistants. Based on the title, the work examines vulnerabilities spanning assistant skills, integrated tools, and protocol ecosystems rather than general-purpose prompt injection alone.
Study finds poetic prompts can jailbreak major LLMs
Researchers tested rhyming versions of malicious prompts from the MLCommons AILuminate Benchmark against 25 popular models and found that poetry significantly increased the likelihood of unsafe responses. Using a hand-picked set of 20 effective poetic prompts, they reported an average attack success rate of about 62%, with some models such as Gemini 1.5 Pro reportedly bypassed consistently under that metric.
ArXiv paper proposes tool-result parsing defense for indirect prompt injection
An arXiv paper titled 'Defense Against Indirect Prompt Injection via Tool Result Parsing' was published, presenting a mitigation approach focused on parsing tool outputs to reduce indirect prompt injection risk. Based on the title, the work targets attacks delivered through external tool results rather than broader prompt-level or attention-based defenses.
Paper proposes prompt-injection defense via data synthesis and CoT learning
An arXiv paper titled 'Know Thy Enemy: Securing LLMs Against Prompt Injection via Diverse Data Synthesis and Instruction-Level Chain-of-Thought Learning' was published, presenting a research approach to harden LLMs against prompt injection. The work appears to focus on defensive training and evaluation using synthesized diverse data and instruction-level chain-of-thought learning.
UK NCSC issues warning on growing AI prompt injection risks
The UK National Cyber Security Centre issued a public warning highlighting prompt injection as a growing security risk in AI systems. The advisory appears to represent an official government cybersecurity response, emphasizing the threat posed by untrusted inputs manipulating LLM behavior.
ArXiv paper studies and defends against prompt injection in AI browser agents
An arXiv paper titled 'BrowseSafe: Understanding and Preventing Prompt Injection Within AI Browser Agents' was published, presenting research focused specifically on how prompt injection affects AI browser agents and how to mitigate it. Based on the title, the work combines threat analysis with defensive techniques tailored to browser-based agent workflows.
Zenity describes Data-Structure Injection in AI agents
Zenity Labs published research on a prompt-injection-related attack class called Data-Structure Injection (DSI) affecting AI agents. The work appears to define or characterize a distinct technique involving malicious manipulation of structured data consumed by agents, expanding the taxonomy of agent injection risks beyond previously tracked general prompt-injection studies.
Paper publishes 'The Attacker Moves Second' on prompt injection
A paper titled 'The Attacker Moves Second' was published as part of a set of newly noted prompt-injection research. Based on the reference title, it represents a distinct research contribution separate from Meta's already-tracked 'Agents Rule of Two' guidance.
Paper proposes task-centric access control against instruction injection
An arXiv paper titled 'Who Grants the Agent Power? Defending Against Instruction Injection via Task-Centric Access Control' was published, presenting a defense approach for LLM agents against instruction injection. Based on the title, the work focuses on limiting agent power through task-centric access-control mechanisms rather than relying only on prompt-level safeguards.
OpenPromptInjection benchmark toolkit published on GitHub
A GitHub repository called OpenPromptInjection was published as an open-source toolkit for implementing, evaluating, and extending prompt injection attacks, defenses, and LLM-integrated applications. The project included example attack-evaluation workflows and documented two defensive components, DataSentinel for detection and PromptLocate for localization and recovery of injected prompts.
Study shows adaptive attacks bypass 12 LLM jailbreak defenses
A 2025 research paper argued that common evaluations of LLM jailbreak and prompt-injection defenses are inadequate because they rely on static or weak attacks rather than adaptive adversaries. Using tuned optimization methods including gradient descent, reinforcement learning, random search, and human-guided exploration, the researchers reported bypassing 12 recent defenses, with attack success rates above 90% for most of them despite prior near-zero claims.
ArXiv paper studies large reasoning models as autonomous jailbreak agents
An arXiv paper titled 'Large Reasoning Models Are Autonomous Jailbreak Agents' was published, presenting a distinct research contribution on how large reasoning models may autonomously develop or execute jailbreak behavior. Based on the title, the work focuses on reasoning-capable models as active jailbreak agents rather than on a specific defense, benchmark, or previously tracked attack technique.
Research paper proposes secure design patterns for LLM agents
A June 2025 research paper examined prompt injection as a major risk for LLM-powered agents, especially those with tool access or sensitive data exposure. The authors proposed principled design patterns intended to provide provable resistance to prompt injection and analyzed their security-versus-utility trade-offs through case studies.
ArXiv paper presents practical 'Promptware' attacks on production assistants
An arXiv paper titled 'Invitation Is All You Need! Promptware Attacks Against LLM-Powered Assistants in Production Are Practical and Dangerous' was published, presenting research on promptware attacks against production LLM-powered assistants. Based on the title, the work argues these attacks are practically exploitable in real deployed assistant environments and represents a distinct contribution from previously tracked prompt-injection and agent-tool abuse studies.
CyberArk describes jailbreak discovery using AI explainability
CyberArk published research on using AI explainability techniques to uncover new jailbreak methods against large language models. The work appears to present a distinct technical approach for identifying or crafting jailbreaks, separate from later studies on adaptive attacks and automated red-teaming.
Simon Willison highlights CaMeL as a new prompt-injection mitigation approach
Simon Willison published a post discussing CaMeL as a promising new direction for mitigating prompt injection attacks. The reference indicates a distinct public discussion of a specific defense approach not yet represented in the timeline.
Black Hat Europe talk presents 'SpAIware' prompt injection exploits
At Black Hat Europe 2024, Johann Rehberger presented 'SpAIware: Advanced Prompt Injection Exploits in AI Assistants,' publicly detailing advanced prompt-injection attack techniques against AI assistants and agent-like workflows. The talk helped formalize and publicize exploit chains showing how untrusted content can manipulate LLM-connected tools and actions.
ArXiv paper analyzes prompt injection across diverse LLM architectures
An arXiv paper titled 'Systematically Analyzing Prompt Injection Vulnerabilities in Diverse LLM Architectures' was published, presenting a distinct research effort focused on evaluating prompt injection weaknesses across different LLM architectures. Based on the title, the work appears to contribute comparative technical analysis rather than general commentary or mitigation guidance alone.
ArXiv paper studies improper tool use attacks on LLM agents
An arXiv paper titled 'Imprompter: Tricking LLM Agents into Improper Tool Use' was published, presenting research on how LLM agents can be manipulated into misusing their tools. Based on the title, the work focuses on agent-specific prompt-injection or jailbreak behavior involving improper tool invocation rather than general LLM prompt injection alone.
WIRED reports 'Imprompter' attack extracting personal details via LLM agents
WIRED reported research showing that a prompt-based attack dubbed 'Imprompter' could manipulate AI chatbots and agents into identifying and extracting personal details from user chats. The coverage highlighted prompt-injection risks tied to tool use and sensitive-data access in LLM-powered assistants.
Slack AI data exfiltration via indirect prompt injection discussed
A Hacker News reference highlighted data exfiltration from Slack AI through an indirect prompt injection technique. This appears to be a distinct public disclosure/example of prompt injection impacting an enterprise AI assistant workflow.
InjecAgent GitHub project published for agent prompt-injection research
The UIUC Kang Lab published the InjecAgent GitHub repository, indicating a distinct public release related to prompt-injection attacks or evaluation in LLM agents. Based on the project name and timing, it represents a separate research artifact focused on agent-specific injection risks not already reflected in the timeline.
Schneier frames prompt injection as LLM data/control-path insecurity
Bruce Schneier published an article arguing that prompt injection in LLMs is a modern instance of the classic security failure of mixing data and control on the same channel. The piece warned that general-purpose LLMs operating on untrusted content are inherently hard to secure against this class of attack and suggested narrower AI systems may be safer in adversarial settings.
ArXiv paper proposes instruction hierarchy training against prompt injection
An arXiv paper titled 'The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions' was published, presenting a training-based approach to make language models follow higher-priority trusted instructions over lower-priority untrusted ones. Based on the title, the work represents an early defensive research contribution aimed at reducing prompt-injection susceptibility through instruction prioritization.
ArXiv paper proposes spotlighting defense for indirect prompt injection
An arXiv paper titled 'Defending Against Indirect Prompt Injection Attacks With Spotlighting' was published, presenting a defense approach aimed specifically at mitigating indirect prompt injection attacks. Based on the title, the work focuses on a distinct 'spotlighting' mitigation technique separate from later parsing-based and attention-based defenses.
ArXiv paper proposes StruQ structured-query defense against prompt injection
An arXiv paper titled 'StruQ: Defending Against Prompt Injection with Structured Queries' was published, presenting a defense that separates trusted prompts from untrusted user data into distinct channels. The approach combines a secure front end with a specially trained model fine-tuned to ignore instructions embedded in the data channel while preserving utility and output quality.
Simon Willison reports markdown-image prompt injection against ChatGPT web
Simon Willison documented a prompt injection attack against the ChatGPT web interface in which markdown images could be used to exfiltrate chat data. The disclosure highlighted an early practical example of prompt injection leading to data theft in a deployed consumer LLM product.
Prompt injection attack causes Bing Chat to reveal hidden instructions
Ars Technica reported that Microsoft’s AI-powered Bing Chat could be manipulated via prompt injection to disclose parts of its hidden system prompt and internal operating rules. The incident became an early high-profile public example of prompt injection affecting a deployed consumer LLM product.
ArXiv paper introduces PromptInject attack framework for language models
An arXiv paper titled 'Ignore Previous Prompt: Attack Techniques For Language Models' was published, examining how malicious inputs can misalign transformer-based language models in production-like settings. The authors introduced the PromptInject framework and studied goal hijacking and prompt leaking as distinct attack classes against customer-facing LLM applications.
Simon Willison launches prompt injection series on LLM security
Simon Willison published a series of posts focused on prompt injection as a security vulnerability in software built on top of large language models. The series argued the problem was largely unsolved, distinguished it from jailbreaking, and emphasized reducing blast radius and constraining tool access as the safest practical mitigation approach.
Crescendo multi-turn jailbreak technique publicly released
A public project page for 'Crescendo' described a distinct multi-turn jailbreak technique for large language models. The reference indicates a named attack method focused on iterative or conversational jailbreak behavior, separate from previously tracked single-technique jailbreak and prompt-injection studies.
Related entities
Vulnerabilities, threat actors, malware, products, organizations, and breaches Mallory has linked to this story.
Sources
50 references tracked. Mallory keeps watching after this page renders.
OWASP LLM01 in 2026: I Tested the Top 5 Defenses, 4 Failed | by Aeon Flex, Elriel Assoc. 2133 [NEON MAXIMA] | May, 2026 | OSINT Team
osintteam.blog
Open source[2605.22321] Benchmarking Autonomous Agents against Temporal, Spatial, and Semantic Evasions
arxiv.org
Open sourceLinkedIn recruitment spam becomes Olde English prose after user hides AI prompt injection in bio - bots also also manipulated to address user as ‘My Lord’ | Tom's Hardware
tomshardware.com
Open source🚨 Exploiting Insecure Output Handling in LLMs via Indirect Prompt Injection (XSS) | by Mukilan Baskaran | May, 2026 | InfoSec Write-ups
infosecwriteups.com
Open sourceAgents Rule of Two: A Practical Approach to AI Agent Security
ai.meta.com
Open sourceStealing Copilot's System Prompt
labs.zenity.io
Open sourceI Blackhat Archive
i.blackhat.com
Open sourceCrescendo
crescendo-the-multiturn-jailbreak.github.io
Open sourceSee the full picture, correlated to your attack surface.
Map indicators from this story to your assets and identify affected systems in minutes.
Every observed campaign, victim, and pivot linked to actors named in this story.
Malware, exploits, and IOCs connected to the activity described here.
YARA, Sigma, and Snort rules deployed to your SIEM as soon as they’re published.
Get matching new stories delivered to your team as they break — not the next morning.
Ask questions about this story and take action on the answers.


