Tags: ai-platform-security, defense-evasion-method, data-exfiltration-method, ai-enabled-threat-activity

Cisco Testing Finds Open-Weight LLMs Highly Susceptible to Multi-Turn Jailbreaks

Updated 3mo ago | First seen Feb 23, 2026 | 2 sources

Cisco reported that multi-turn jailbreak techniques—iterative, conversational prompt sequences designed to erode safety guardrails—successfully bypassed protections in eight major open-weight large language models 92.78% of the time, while single-turn prompt attempts were notably less effective. The findings, published in Cisco’s State of AI Security research and covered by multiple outlets, highlight that many enterprise AI deployments using downloadable, self-hosted models may be more vulnerable to sustained adversarial prompting than organizations assume.

The report’s risk framing sits alongside broader concerns that model misuse and capability leakage can scale quickly. Anthropic separately alleged that Chinese AI labs ran coordinated model distillation activity, using large volumes of fraudulent accounts and proxy infrastructure to extract advanced behaviors from Claude, and warned that the copied models may lack comparable safety controls and could be repurposed for malicious use. Related research coverage also notes that LLMs can sometimes be induced, via specialized prompting or jailbreaking methods, to reproduce near-verbatim copyrighted text from their training data. This underscores that prompt-based attacks can drive both policy bypass and data or content extraction, particularly when guardrails are tested over extended interactions.


EVENT TIMELINE

How this story unfolded

3 events from the most recent confirmed update back to the earliest known activity.

Feb 23, 2026 (4mo ago)

Cisco discloses alleged nation-state use of a jailbroken AI coding tool

In the same report, Cisco described what it said was the first publicly disclosed case of a nation-state actor repurposing an AI coding assistant for cyberespionage. Cisco alleged that the Chinese state-backed group GTG-1002 used a jailbroken coding tool to automate most of an intrusion or attack chain.

Cisco tests eight open-weight LLMs for jailbreak resistance

Cisco's State of AI Security report evaluated eight major open-weight large language models in a black-box setup to measure how well their safety guardrails resisted jailbreak attempts. The testing found that multi-turn jailbreak attacks succeeded 92.78% of the time, far more often than single-turn attacks, indicating a systemic weakness in current safety approaches.
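The comparison above rests on an attack success rate: the share of jailbreak attempts that a judge scores as a guardrail bypass, split by single-turn and multi-turn conversations. The following is a minimal sketch of that arithmetic only. The `Attempt` record, the function names, and the sample data are all hypothetical illustrations. They are not Cisco's harness, and the sample numbers are invented and unrelated to the 92.78% figure.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Attempt:
    """One jailbreak attempt against one model (hypothetical record shape)."""
    model: str      # name of the model under test
    turns: int      # number of adversarial prompts in the conversation
    bypassed: bool  # True if a judge scored the final output as a bypass


def attack_success_rate(attempts: Iterable[Attempt]) -> float:
    """Return the percentage of attempts that bypassed the guardrails."""
    items: List[Attempt] = list(attempts)
    if not items:
        raise ValueError("no attempts to score")
    return 100.0 * sum(a.bypassed for a in items) / len(items)


def compare_by_turns(attempts: Iterable[Attempt]) -> Dict[str, float]:
    """Split attempts into single-turn and multi-turn and score each group."""
    items = list(attempts)
    single = [a for a in items if a.turns == 1]
    multi = [a for a in items if a.turns > 1]
    return {
        "single_turn": attack_success_rate(single),
        "multi_turn": attack_success_rate(multi),
    }


# Invented sample data for illustration only; not taken from the report.
SAMPLE_ATTEMPTS = [
    Attempt("model-a", 1, False),
    Attempt("model-a", 1, False),
    Attempt("model-b", 1, True),
    Attempt("model-b", 1, False),
    Attempt("model-a", 5, True),
    Attempt("model-a", 8, True),
    Attempt("model-b", 6, True),
    Attempt("model-b", 4, False),
]

if __name__ == "__main__":
    for label, rate in compare_by_turns(SAMPLE_ATTEMPTS).items():
        print(f"{label}: {rate:.2f}%")
```

In a real evaluation the hard part is the `bypassed` label, which depends on how the judge defines a harmful or policy-violating response. The sketch treats that label as given.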

Jan 1, 2025 (1y ago)

Cisco highlights 2025 exploits targeting agent and tool-connection layers

Cisco's report cited multiple real-world exploits from 2025 involving AI agent ecosystems, including tool poisoning, remote code execution via malicious tool servers, and agent supply-chain compromise. The company warned that excessive agency and insecure tool-connection infrastructure such as MCP expand the attack surface beyond text generation.

LINKED ENTITIES

Related entities

Vulnerabilities, threat actors, malware, products, organizations, and breaches Mallory has linked to this story.

16 linked
Threat actors
1 linked
Affected products
2 linked
ChatGPT, ChatGPT
Organizations
13 linked
Shutterstock, Alibaba Cloud, Hugging Face, Cisco Systems, DeepSeek, Mistral AI, Anthropic, Meta Platforms, OpenAI, Microsoft Corporation, Google, Zhipu AI, Information Security Media Group