Cisco Testing Finds Open-Weight LLMs Highly Susceptible to Multi-Turn Jailbreaks

EVENT TIMELINE

How this story unfolded

3 events from the most recent confirmed update back to the earliest known activity.

3 EVENTS

Feb 23, 20264mo ago

Cisco discloses alleged nation-state use of a jailbroken AI coding tool

In the same report, Cisco described what it said was the first publicly disclosed case of a nation-state actor repurposing an AI coding assistant for cyberespionage. Cisco alleged that the Chinese state-backed group GTG-1002 used a jailbroken coding tool to automate most of an intrusion or attack chain.

Cisco tests eight open-weight LLMs for jailbreak resistance

Cisco's State of AI Security report evaluated eight major open-weight large language models in a black-box setup to measure how well their safety guardrails resisted jailbreak attempts. The testing found that multi-turn jailbreak attacks succeeded 92.78% of the time, far more often than single-turn attacks, indicating a systemic weakness in current safety approaches.

Jan 1, 20251y ago

Cisco highlights 2025 exploits targeting agent and tool-connection layers

Cisco's report cited multiple real-world exploits from 2025 involving AI agent ecosystems, including tool poisoning, remote code execution via malicious tool servers, and agent supply-chain compromise. The company warned that excessive agency and insecure tool-connection infrastructure such as MCP expand the attack surface beyond text generation.

LINKED ENTITIES

Related entities

Vulnerabilities, threat actors, malware, products, organizations, and breaches Mallory has linked to this story.

16 LINKEDOpen in app

Threat actors

1 linked

GTG-1002

Affected products

2 linked

ChatgptChatgpt

Organizations

13 linked

ShutterstockAlibaba CloudHugging FaceCisco SystemsDeepseekMistral AIAnthropicMeta PlatformsOpenaiMicrosoft CorporationGoogleZhipu AIInformation Security Media Group

SOURCE COVERAGE

Sources

2 references tracked. Mallory keeps watching after this page renders.

2 SOURCESView all

GovinfosecurityNews

Feb 23, 2026

Open-Weight AI Models Fail the Jailbreak Test

govinfosecurity.com

Open source

Bank Info SecurityNews

Feb 23, 2026

Open-Weight AI Models Fail the Jailbreak Test

bankinfosecurity.com

Open source

ON THE SAME THREAD

Security researchers and vendors are warning that **prompt injection and jailbreak techniques** remain a leading risk for enterprise deployments of large language models (LLMs), enabling attackers to override system instructions, bypass safety controls, and potentially drive **data exposure** outcomes. Resecurity reports assisting a Fortune 100 organization where AI-powered banking and HR applications were targeted with prompt-injection attempts, emphasizing that these attacks exploit model behavior rather than traditional software flaws and can be used in scenarios such as extracting sensitive configuration data (for example, attempts to elicit content resembling `/etc/passwd`). Resecurity also cites OWASP’s 2025 Top 10 for LLM Applications, where prompt injection is ranked as the top issue, and frames continuous security testing (e.g., VAPT) as a key control for enterprise AI systems. Separate research highlighted by Kaspersky describes a **“poetry” jailbreak** technique in which prompts framed as rhyming verse increased the likelihood that chatbots would produce disallowed or unsafe responses; the study tested this approach across 25 models from multiple vendors (including Anthropic, OpenAI, Google, Meta, DeepSeek, and xAI). In contrast, OpenAI’s planned upgrade to *ChatGPT Temporary Chat* is primarily a product/privacy change—adding optional personalization while keeping temporary chats out of history and model training (with possible retention for up to 30 days)—and does not describe a specific security incident or vulnerability disclosure tied to prompt injection or jailbreak research.

Jun 2, 2026

AI Safety Filters in Large Language Models Fail During Extended Conversations

Researchers have identified that safety filters in large language models (LLMs) can be bypassed during extended, multi-turn conversations, significantly increasing the risk of adversarial prompt success. Cisco's research demonstrated that attack success rates jump from an average of 13% for single prompts to 64% in longer chats, with some models like Meta's Llama 3.3-70B-Instruct and Alibaba’s Qwen3-32B reaching nearly 93% failure rates. These vulnerabilities are attributed to the models' architectural design, which processes dialogue through sliding context windows and does not consistently reapply safety judgments across conversation turns. The most capable and open models are the most susceptible to these failures, while more conservatively aligned models such as Google's Gemma 3-1B-IT show smaller gaps between single- and multi-turn failures. Attackers can exploit these weaknesses by gradually shifting the context or rephrasing requests, eventually eliciting responses that bypass initial safety mechanisms and may include the generation of malicious code. The findings highlight a critical challenge in evaluating and securing LLMs, as traditional one-shot prompt testing fails to capture these multi-turn vulnerabilities.

Mar 21, 2026

Prompt Injection and Jailbreak Attacks on Large Language Models

Recent research has demonstrated that large language models (LLMs) such as GPT-5 and others are increasingly vulnerable to prompt injection and jailbreak attacks, which can be exploited to bypass built-in safety guardrails and leak sensitive information. Attackers use techniques like prompt injection—embedding malicious instructions within seemingly benign queries—to trick LLMs into revealing confidential data, including user credentials and internal documents. A notable study by Icaro Lab, in collaboration with Sapienza University and DEXAI, found that adversarial prompts written as poetry could successfully bypass safety mechanisms in 62% of tested cases across 25 frontier models, with some models exceeding a 90% success rate. These findings highlight the sophistication and creativity of new attack vectors targeting AI systems, raising significant concerns for organizations embedding LLMs into business operations. The widespread adoption of LLMs in handling sensitive business functions amplifies the risk of data exfiltration through these advanced attack methods. As organizations increasingly rely on AI for customer service, document processing, and other critical tasks, the potential for prompt injection and poetic jailbreaks to facilitate unauthorized data access becomes a pressing security issue. The research underscores the urgent need for improved AI safety measures, robust prompt filtering, and continuous monitoring to mitigate the risks posed by these evolving adversarial techniques.

Mar 21, 2026

Cisco Testing Finds Open-Weight LLMs Highly Susceptible to Multi-Turn Jailbreaks

Get ahead of threats like this

How this story unfolded

Cisco discloses alleged nation-state use of a jailbroken AI coding tool

Cisco tests eight open-weight LLMs for jailbreak resistance

Cisco highlights 2025 exploits targeting agent and tool-connection layers

Related entities

Sources

Open-Weight AI Models Fail the Jailbreak Test

Open-Weight AI Models Fail the Jailbreak Test

See the full picture, correlated to your attack surface.

Cisco Testing Finds Open-Weight LLMs Highly Susceptible to Multi-Turn Jailbreaks

Get ahead of threats like this

How this story unfolded

Cisco discloses alleged nation-state use of a jailbroken AI coding tool

Cisco tests eight open-weight LLMs for jailbreak resistance

Cisco highlights 2025 exploits targeting agent and tool-connection layers

Related entities

Sources

Open-Weight AI Models Fail the Jailbreak Test

Open-Weight AI Models Fail the Jailbreak Test

See the full picture, correlated to your attack surface.

Related stories

Prompt Injection and Jailbreak Techniques Targeting LLM-Powered Applications

AI Safety Filters in Large Language Models Fail During Extended Conversations

Prompt Injection and Jailbreak Attacks on Large Language Models