Skip to main content
Live Webinar with SANS (June 25)— Agentic CTI Automation for Fun & ProfitRegister Free
Mallory
Back to intelligence
ai-platform-securitydata-exfiltration-methodcredential-access-method

Prompt Injection and Jailbreak Attacks on Large Language Models

Updated 2mo agoFirst seen Dec 2, 20252 sources

Recent research has demonstrated that large language models (LLMs) such as GPT-5 and others are increasingly vulnerable to prompt injection and jailbreak attacks, which can be exploited to bypass built-in safety guardrails and leak sensitive information. Attackers use techniques like prompt injection—embedding malicious instructions within seemingly benign queries—to trick LLMs into revealing confidential data, including user credentials and internal documents. A notable study by Icaro Lab, in collaboration with Sapienza University and DEXAI, found that adversarial prompts written as poetry could successfully bypass safety mechanisms in 62% of tested cases across 25 frontier models, with some models exceeding a 90% success rate. These findings highlight the sophistication and creativity of new attack vectors targeting AI systems, raising significant concerns for organizations embedding LLMs into business operations.

The widespread adoption of LLMs in handling sensitive business functions amplifies the risk of data exfiltration through these advanced attack methods. As organizations increasingly rely on AI for customer service, document processing, and other critical tasks, the potential for prompt injection and poetic jailbreaks to facilitate unauthorized data access becomes a pressing security issue. The research underscores the urgent need for improved AI safety measures, robust prompt filtering, and continuous monitoring to mitigate the risks posed by these evolving adversarial techniques.

Share:
Prompt Injection and Jailbreak Attacks on Large Language Models
Stay ahead

Get ahead of threats like this

Mallory correlates global threat intelligence with your attack surface — know if you’re exposed before adversaries strike.

EVENT TIMELINE

How this story unfolded

1 event from the most recent confirmed update back to the earliest known activity.

1 EVENTS
Dec 2, 20257mo ago

Story first reported

Initial story creation

The operational view lives in Mallory

See the full picture, correlated to your attack surface.

This page covers what’s public. Mallory adds the parts that aren’t — which of your assets are affected, which threat actors are using it right now, which detections to deploy, and what to do next.
Exposure mapping

Map indicators from this story to your assets and identify affected systems in minutes.

Threat actor evidence

Every observed campaign, victim, and pivot linked to actors named in this story.

Associated malware

Malware, exploits, and IOCs connected to the activity described here.

Detection signatures

YARA, Sigma, and Snort rules deployed to your SIEM as soon as they’re published.

Scheduled alerts

Get matching new stories delivered to your team as they break — not the next morning.

AI threads

Ask questions about this story and take action on the answers.

Prompt Injection and Jailbreak Attacks on Large Language Models | Mallory