Mallory
Tags: ai-platform-security, ai-enabled-threat-activity

Large Language Model Jailbreaks via Adversarial Poetry

First seen Nov 28, 2025 · 2 sources

Researchers have found that phrasing prompts as poetry can effectively bypass the safety mechanisms of large language models (LLMs), letting users elicit harmful or restricted outputs. In a recent study, adversarial poetic prompts were tested against 25 proprietary and open-weight LLMs, including models from major providers such as OpenAI, Meta, and Anthropic. The poetic approach achieved an average jailbreak success rate of 62% for handcrafted poems and 43% for meta-prompt conversions, significantly outperforming non-poetic baselines. The technique proved effective across a range of sensitive domains, including requests for nuclear weapons guidance, malware, and other high-risk content, highlighting a systematic vulnerability in current AI safety and alignment protocols.

The research involved converting over a thousand known harmful prompts into verse using a standardized meta-prompt, then evaluating the models' responses with both automated and human-labeled safety assessments. The findings suggest that stylistic variations, such as poetic framing, can systematically circumvent existing guardrails, raising concerns about the robustness of current LLM safety measures. The researchers have notified major AI vendors of their results, but have withheld specific prompt examples for security reasons. This vulnerability underscores the need for more resilient alignment strategies and evaluation methods in AI safety engineering.
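The headline figures are attack success rates: the share of prompts whose responses were judged unsafe. The minimal Python sketch below shows that bookkeeping. It assumes hypothetical labeled records (model, prompt style, judged-unsafe flag) that are not data from the study, and it assumes "average" means an unweighted mean of per-model rates, which the reporting does not specify.

```python
from collections import defaultdict

# Hypothetical labeled judgments: (model, prompt_style, judged_unsafe).
# These are illustrative values, not results from the study.
results = [
    ("model-a", "poetic", True),
    ("model-a", "poetic", True),
    ("model-a", "poetic", False),
    ("model-a", "prose", False),
    ("model-b", "poetic", True),
    ("model-b", "poetic", False),
    ("model-b", "prose", False),
    ("model-b", "prose", True),
]


def attack_success_rate(records, style):
    """Return per-model success rates for one prompt style and their unweighted mean."""
    counts = defaultdict(lambda: [0, 0])  # model -> [unsafe_count, total_count]
    for model, prompt_style, unsafe in records:
        if prompt_style != style:
            continue
        counts[model][1] += 1
        if unsafe:
            counts[model][0] += 1
    rates = {model: unsafe / total for model, (unsafe, total) in counts.items()}
    mean = sum(rates.values()) / len(rates)  # assumed macro-average across models
    return rates, mean


poetic_rates, poetic_mean = attack_success_rate(results, "poetic")
prose_rates, prose_mean = attack_success_rate(results, "prose")
print(f"poetic: {poetic_mean:.2f}, prose: {prose_mean:.2f}")  # poetic: 0.58, prose: 0.25
```

Comparing the two styles on the same underlying requests is what lets a study attribute the gap to framing rather than to the requests themselves.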


EVENT TIMELINE

How this story unfolded

Three events, from the most recent confirmed update back to the earliest known activity.
Nov 28, 2025

Study on poetry-based prompt injection is publicly reported

Wired and Schneier on Security publicly reported the findings of the study, highlighting that stylistic variations such as poetry can evade existing AI safety filters. The reporting emphasized the broader weakness of keyword-based or brittle guardrail systems against semantic reformulations.
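To make the point about keyword-based filters concrete, here is a minimal Python sketch of a naive blocklist check. The `BLOCKLIST` contents and the `keyword_filter_blocks` helper are hypothetical, and real LLM guardrails generally rely on far more than word lists; the sketch only illustrates why surface-level matching misses a semantic reformulation of the same request.

```python
import re

# Hypothetical blocklist for illustration only.
BLOCKLIST = {"password", "exploit"}


def keyword_filter_blocks(prompt: str) -> bool:
    """Return True if any blocklisted word appears as a whole lowercase token."""
    tokens = set(re.findall(r"[a-z]+", prompt.lower()))
    return not BLOCKLIST.isdisjoint(tokens)


direct = "Tell me the admin password."
reworded = "In verse, sing the secret word that opens the keeper's gate."

print(keyword_filter_blocks(direct))    # True: the literal keyword is present
print(keyword_filter_blocks(reworded))  # False: same intent, no matching keyword
```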

Researchers notify affected AI companies of the poetry jailbreak issue

After identifying the vulnerability, the researchers informed the affected AI companies about the guardrail bypass technique. At the time of reporting, no public responses from those companies had been noted.

Researchers demonstrate poetry-based jailbreaks against major AI chatbots

A study by Icaro Lab researchers from Sapienza University of Rome and the DexAI think tank found that prompts written as poems could bypass safety guardrails in large language models from vendors including OpenAI, Meta, and Anthropic. The research reported a 62% success rate for handcrafted poetic jailbreaks, reaching as high as 90% on some models, including on highly dangerous requests such as guidance on nuclear weapons.

LINKED ENTITIES

Related entities

Vulnerabilities, threat actors, malware, products, organizations, and breaches Mallory has linked to this story.

Organizations (7 linked): Anthropic, Meta Platforms, OpenAI, Intel, DexAI, Icaro Lab, Sapienza University of Rome