Large Language Model Jailbreaks via Adversarial Poetry

EVENT TIMELINE

How this story unfolded

3 events from the most recent confirmed update back to the earliest known activity.

3 EVENTS

Nov 28, 20257mo ago

Study on poetry-based prompt injection is publicly reported

Wired and Schneier on Security publicly reported the findings of the study, highlighting that stylistic variations such as poetry can evade existing AI safety filters. The reporting emphasized the broader weakness of keyword-based or brittle guardrail systems against semantic reformulations.

Researchers notify affected AI companies of the poetry jailbreak issue

After identifying the vulnerability, the researchers informed the affected AI companies about the guardrail bypass technique. At the time of reporting, no public responses from those companies had been noted.

Researchers demonstrate poetry-based jailbreaks against major AI chatbots

A study by Icaro Lab researchers from Sapienza University of Rome and the DexAI think tank found that prompts written as poems could bypass safety guardrails in large language models from vendors including OpenAI, Meta, and Anthropic. The research reported a 62% success rate for handcrafted poetic jailbreaks, reaching as high as 90% on some models, including for highly dangerous requests such as nuclear weapon guidance.

LINKED ENTITIES

Related entities

Vulnerabilities, threat actors, malware, products, organizations, and breaches Mallory has linked to this story.

7 LINKEDOpen in app

Organizations

7 linked

AnthropicMeta PlatformsOpenaiIntelDexaiIcaro LabSapienza University in Rome

SOURCE COVERAGE

Sources

2 references tracked. Mallory keeps watching after this page renders.

2 SOURCESView all

Schneier On SecurityNews

Nov 28, 2025

Prompt Injection Through Poetry

schneier.com

Open source

Wired Com SecurityNews

Nov 28, 2025

Poems Can Trick AI Into Helping You Make a Nuclear Weapon

wired.com

Open source

ON THE SAME THREAD

Recent research has demonstrated that large language models (LLMs) such as GPT-5 and others are increasingly vulnerable to prompt injection and jailbreak attacks, which can be exploited to bypass built-in safety guardrails and leak sensitive information. Attackers use techniques like prompt injection—embedding malicious instructions within seemingly benign queries—to trick LLMs into revealing confidential data, including user credentials and internal documents. A notable study by Icaro Lab, in collaboration with Sapienza University and DEXAI, found that adversarial prompts written as poetry could successfully bypass safety mechanisms in 62% of tested cases across 25 frontier models, with some models exceeding a 90% success rate. These findings highlight the sophistication and creativity of new attack vectors targeting AI systems, raising significant concerns for organizations embedding LLMs into business operations. The widespread adoption of LLMs in handling sensitive business functions amplifies the risk of data exfiltration through these advanced attack methods. As organizations increasingly rely on AI for customer service, document processing, and other critical tasks, the potential for prompt injection and poetic jailbreaks to facilitate unauthorized data access becomes a pressing security issue. The research underscores the urgent need for improved AI safety measures, robust prompt filtering, and continuous monitoring to mitigate the risks posed by these evolving adversarial techniques.

Mar 21, 2026

Prompt Injection and Jailbreak Techniques Targeting LLM-Powered Applications

Security researchers and vendors are warning that **prompt injection and jailbreak techniques** remain a leading risk for enterprise deployments of large language models (LLMs), enabling attackers to override system instructions, bypass safety controls, and potentially drive **data exposure** outcomes. Resecurity reports assisting a Fortune 100 organization where AI-powered banking and HR applications were targeted with prompt-injection attempts, emphasizing that these attacks exploit model behavior rather than traditional software flaws and can be used in scenarios such as extracting sensitive configuration data (for example, attempts to elicit content resembling `/etc/passwd`). Resecurity also cites OWASP’s 2025 Top 10 for LLM Applications, where prompt injection is ranked as the top issue, and frames continuous security testing (e.g., VAPT) as a key control for enterprise AI systems. Separate research highlighted by Kaspersky describes a **“poetry” jailbreak** technique in which prompts framed as rhyming verse increased the likelihood that chatbots would produce disallowed or unsafe responses; the study tested this approach across 25 models from multiple vendors (including Anthropic, OpenAI, Google, Meta, DeepSeek, and xAI). In contrast, OpenAI’s planned upgrade to *ChatGPT Temporary Chat* is primarily a product/privacy change—adding optional personalization while keeping temporary chats out of history and model training (with possible retention for up to 30 days)—and does not describe a specific security incident or vulnerability disclosure tied to prompt injection or jailbreak research.

Jun 2, 2026

Cisco Testing Finds Open-Weight LLMs Highly Susceptible to Multi-Turn Jailbreaks

Cisco reported that **multi-turn jailbreak** techniques—iterative, conversational prompt sequences designed to erode safety guardrails—successfully bypassed protections in eight major **open-weight** large language models **92.78%** of the time, while single-turn prompt attempts were notably less effective. The findings, published in Cisco’s *State of AI Security* research and covered by multiple outlets, highlight that many enterprise AI deployments using downloadable, self-hosted models may be more vulnerable to sustained adversarial prompting than organizations assume. The report’s risk framing is amplified by broader concerns that model misuse and capability leakage can scale quickly: Anthropic separately alleged coordinated **model distillation** activity by Chinese AI labs using large volumes of fraudulent accounts and proxy infrastructure to extract advanced behaviors from *Claude*, warning that copied models may lack comparable safety controls and could be repurposed for malicious use. Related research coverage also notes that LLMs can sometimes be induced—via specialized prompting/jailbreaking methods—to reproduce near-verbatim copyrighted text from training data, underscoring that prompt-based attacks can drive both **policy bypass** and **data/content extraction** outcomes, particularly when guardrails are tested over extended interactions.

Mar 21, 2026

Large Language Model Jailbreaks via Adversarial Poetry

Get ahead of threats like this

How this story unfolded

Study on poetry-based prompt injection is publicly reported

Researchers notify affected AI companies of the poetry jailbreak issue

Researchers demonstrate poetry-based jailbreaks against major AI chatbots

Related entities

Sources

Prompt Injection Through Poetry

Poems Can Trick AI Into Helping You Make a Nuclear Weapon

See the full picture, correlated to your attack surface.

Large Language Model Jailbreaks via Adversarial Poetry

Get ahead of threats like this

How this story unfolded

Study on poetry-based prompt injection is publicly reported

Researchers notify affected AI companies of the poetry jailbreak issue

Researchers demonstrate poetry-based jailbreaks against major AI chatbots

Related entities

Sources

Prompt Injection Through Poetry

Poems Can Trick AI Into Helping You Make a Nuclear Weapon

See the full picture, correlated to your attack surface.

Related stories

Prompt Injection and Jailbreak Attacks on Large Language Models

Prompt Injection and Jailbreak Techniques Targeting LLM-Powered Applications

Cisco Testing Finds Open-Weight LLMs Highly Susceptible to Multi-Turn Jailbreaks