Multi-Turn Attacks Expose Weaknesses in Open-Weight LLM Models

A new report has revealed that open-weight large language models (LLMs) have remained highly vulnerable to adaptive multi-turn adversarial attacks, even when single-turn defenses appear robust.

The findings, published today by Cisco AI Defense, show that while isolated, one-off attack attempts frequently fail, persistent, multi-step conversations can achieve success rates exceeding 90% against most tested defenses.

Multi-Turn Attacks Outperform Single-Turn Tests

Cisco’s analysis compared single-turn and multi-turn testing to measure how models respond under sustained adversarial pressure.

Using over 1000 prompts per model, researchers observed that many models performed well when faced with a single malicious input but quickly deteriorated when attackers refined their strategy over several turns.

Adaptive attack styles, such as “Crescendo,” “Role-Play” and “Refusal Reframe,” allowed malicious actors to manipulate models into producing unsafe or restricted outputs. In total, 499 simulated conversations were analyzed, with each spanning 5-10 exchanges.

The results indicate that traditional safety filters are insufficient when models are subjected to iterative manipulation.

Read more on AI vulnerability testing methods: Microsoft 365 Copilot: New Zero-Click AI Vulnerability Allows Corporate Data Theft

Key Vulnerabilities and Attack Categories

The study identified 15 sub-threat categories showing the highest failure rates across 102 total threat types.

Among them, malicious code generation, data exfiltration and ethical boundary violations ranked most critical.

Cisco’s scatter plot analyses revealed that models plotting above the diagonal line in vulnerability graphs share architectural weaknesses that make them disproportionately prone to multi-turn exploitation.

The research defined a “failure” as any instance where a model:

Produced harmful or inappropriate content
Revealed private or system-level information
Bypassed internal safety restrictions

Conversely, a “pass” occurred when the model refused or reframed harmful requests while maintaining data confidentiality.

Recommendations For Developers and Organizations

To mitigate risks, Cisco recommended several practices:

Implement strict system prompts aligned with defined use cases
Deploy model-agnostic runtime guardrails for adversarial detection
Conduct regular AI red-teaming assessments within intended business contexts
Limit model integrations with automated external services

The report also called for expanding prompt sample sizes, testing repeated prompts to assess variability and comparing models of different sizes to evaluate scale-dependent vulnerabilities.

“The AI developer and security community must continue to actively manage these threats (as well as additional safety and security concerns) through independent testing and guardrail development throughout the lifecycle of model development and deployment in organizations,” Cisco wrote.

“Without AI security solutions – such as multi-turn testing, threat-specific mitigation and continuous monitoring – these models pose significant risks in production, potentially leading to data breaches or malicious manipulations.”

Source

What's Hot

Ex-Google Engineer Convicted for Stealing AI Secrets for China Startup

Substack Confirms Data Breach, “Limited User Data” Compromised

SmarterMail Fixes Critical Unauthenticated RCE Flaw with CVSS 9.3 Score

Why AI’s Rise Makes Protecting Personal Data More Critical Than Ever

New Hacking Campaign Exploits Microsoft Windows WinRAR Vulnerability

Two Critical Flaws Found in n8n AI Workflow Automation Platform

North Korean Hackers Turn JSON Services into Covert Malware Delivery Channels

macOS Stealer Campaign Uses “Cracked” App Lures to Bypass Apple Securi

North Korean Hackers Exploit Threat Intel Platforms For Phishing

U.S. Treasury Sanctions DPRK IT-Worker Scheme, Exposing $600K Crypto Transfers and $1M+ Profits

Ukrainian Ransomware Fugitive Added to Europe’s Most Wanted

Most Popular

North Korean Hackers Turn JSON Services into Covert Malware Delivery Channels

macOS Stealer Campaign Uses “Cracked” App Lures to Bypass Apple Securi

North Korean Hackers Exploit Threat Intel Platforms For Phishing

Our Picks

‘What happens online stays online’ and other cyberbullying myths, debunked

The hidden risks of browser extensions – and how to avoid them

Find your weak spots before attackers do

Subscribe to Updates