Cisco Testing Finds Open-Weight LLMs Highly Susceptible to Multi-Turn Jailbreaks
Cisco reported that multi-turn jailbreak techniques—iterative, conversational prompt sequences designed to erode safety guardrails—successfully bypassed protections in eight major open-weight large language models 92.78% of the time, while single-turn prompt attempts were notably less effective. The findings, published in Cisco’s State of AI Security research and covered by multiple outlets, highlight that many enterprise AI deployments using downloadable, self-hosted models may be more vulnerable to sustained adversarial prompting than organizations assume.
The findings land amid broader concerns that model misuse and capability leakage can scale quickly. Anthropic separately alleged that Chinese AI labs ran coordinated model distillation activity, using large volumes of fraudulent accounts and proxy infrastructure to extract advanced behaviors from Claude, and warned that copied models may lack comparable safety controls and could be repurposed for malicious use. Related research coverage also notes that specialized prompting or jailbreaking methods can sometimes induce LLMs to reproduce near-verbatim copyrighted text from training data. Prompt-based attacks can therefore drive both policy bypass and data or content extraction, particularly when guardrails are tested over extended interactions.

How this story unfolded
3 events from the most recent confirmed update back to the earliest known activity.
Cisco discloses alleged nation-state use of a jailbroken AI coding tool
In the same report, Cisco described what it said was the first publicly disclosed case of a nation-state actor repurposing an AI coding assistant for cyberespionage. Cisco alleged that the Chinese state-backed group GTG-1002 used a jailbroken coding tool to automate most of an intrusion or attack chain.
Cisco tests eight open-weight LLMs for jailbreak resistance
Cisco's State of AI Security report evaluated eight major open-weight large language models in a black-box setup to measure how well their safety guardrails resisted jailbreak attempts. The testing found that multi-turn jailbreak attacks succeeded 92.78% of the time, far more often than single-turn attacks, indicating a systemic weakness in current safety approaches.
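For readers unfamiliar with how a figure like an attack success rate is typically derived, the following is a minimal sketch, assuming each test is logged as a pair of attack mode and whether the guardrail was bypassed. The function name, record layout, and sample records are hypothetical illustrations. They are not Cisco's methodology or data.

```python
from collections import defaultdict


def attack_success_rate(trials):
    """Return {mode: percentage of trials in which the guardrail was bypassed}.

    `trials` is an iterable of (mode, bypassed) pairs. This layout is an
    assumption made for illustration, not a format taken from the report.
    """
    totals = defaultdict(int)
    bypassed_counts = defaultdict(int)
    for mode, bypassed in trials:
        totals[mode] += 1
        if bypassed:
            bypassed_counts[mode] += 1
    return {mode: 100.0 * bypassed_counts[mode] / totals[mode] for mode in totals}


# Hypothetical records for illustration only (NOT results from the report).
trials = [
    ("single_turn", False), ("single_turn", True),
    ("single_turn", False), ("single_turn", False),
    ("multi_turn", True), ("multi_turn", True),
    ("multi_turn", True), ("multi_turn", False),
]

rates = attack_success_rate(trials)
for mode, rate in sorted(rates.items()):
    print(f"{mode}: {rate:.2f}% of attempts bypassed the guardrail")
```

Reporting the rate per attack mode, rather than one pooled number, is what makes a gap between single-turn and multi-turn attacks visible.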
Cisco highlights 2025 exploits targeting agent and tool-connection layers
Cisco's report cited multiple real-world exploits from 2025 involving AI agent ecosystems, including tool poisoning, remote code execution via malicious tool servers, and agent supply-chain compromise. The company warned that excessive agency and insecure tool-connection infrastructure such as MCP expand the attack surface beyond text generation.


