Close Menu
  • Home
  • News
  • Cyber Security
  • Internet of Things
  • Tips and Advice

Subscribe to Updates

Get the latest creative news from FooBar about art, design and business.

What's Hot

New NCA Campaign Warns Men Off Crypto Investment Scams

November 10, 2025

China-Linked Tick Group Exploits Lanscope Zero-Day to Hijack Corporate Systems

November 10, 2025

The MSP Cybersecurity Readiness Guide: Turning Security into Growth

November 10, 2025
Facebook X (Twitter) Instagram
Monday, November 10
Facebook X (Twitter) Instagram Pinterest Vimeo
Cyberwire Daily
  • Home
  • News
  • Cyber Security
  • Internet of Things
  • Tips and Advice
Cyberwire Daily
Home»Cyber Security»Multi-Turn Attacks Expose Weaknesses in Open-Weight LLM Models
Cyber Security

Multi-Turn Attacks Expose Weaknesses in Open-Weight LLM Models

Team-CWDBy Team-CWDNovember 7, 2025No Comments3 Mins Read
Share Facebook Twitter Pinterest LinkedIn Tumblr Reddit Telegram Email
Share
Facebook Twitter LinkedIn Pinterest Email


A new report has revealed that open-weight large language models (LLMs) have remained highly vulnerable to adaptive multi-turn adversarial attacks, even when single-turn defenses appear robust.

The findings, published today by Cisco AI Defense, show that while isolated, one-off attack attempts frequently fail, persistent, multi-step conversations can achieve success rates exceeding 90% against most tested defenses.

Multi-Turn Attacks Outperform Single-Turn Tests

Cisco’s analysis compared single-turn and multi-turn testing to measure how models respond under sustained adversarial pressure.

Using over 1000 prompts per model, researchers observed that many models performed well when faced with a single malicious input but quickly deteriorated when attackers refined their strategy over several turns.

Adaptive attack styles, such as “Crescendo,” “Role-Play” and “Refusal Reframe,” allowed malicious actors to manipulate models into producing unsafe or restricted outputs. In total, 499 simulated conversations were analyzed, with each spanning 5-10 exchanges.

The results indicate that traditional safety filters are insufficient when models are subjected to iterative manipulation.

Read more on AI vulnerability testing methods: Microsoft 365 Copilot: New Zero-Click AI Vulnerability Allows Corporate Data Theft

Key Vulnerabilities and Attack Categories

The study identified 15 sub-threat categories showing the highest failure rates across 102 total threat types.

Among them, malicious code generation, data exfiltration and ethical boundary violations ranked most critical.

Cisco’s scatter plot analyses revealed that models plotting above the diagonal line in vulnerability graphs share architectural weaknesses that make them disproportionately prone to multi-turn exploitation.

The research defined a “failure” as any instance where a model:

  • Produced harmful or inappropriate content

  • Revealed private or system-level information

  • Bypassed internal safety restrictions

Conversely, a “pass” occurred when the model refused or reframed harmful requests while maintaining data confidentiality.

Recommendations For Developers and Organizations

To mitigate risks, Cisco recommended several practices:

  • Implement strict system prompts aligned with defined use cases

  • Deploy model-agnostic runtime guardrails for adversarial detection

  • Conduct regular AI red-teaming assessments within intended business contexts

  • Limit model integrations with automated external services

The report also called for expanding prompt sample sizes, testing repeated prompts to assess variability and comparing models of different sizes to evaluate scale-dependent vulnerabilities.

“The AI developer and security community must continue to actively manage these threats (as well as additional safety and security concerns) through independent testing and guardrail development throughout the lifecycle of model development and deployment in organizations,” Cisco wrote.

“Without AI security solutions – such as multi-turn testing, threat-specific mitigation and continuous monitoring – these models pose significant risks in production, potentially leading to data breaches or malicious manipulations.”



Source

Share. Facebook Twitter Pinterest LinkedIn Tumblr Email
Previous ArticleGhost Identities, Poisoned Accounts, & AI Agent Havoc
Next Article Discover Practical AI Tactics for GRC — Join the Free Expert Webinar
Team-CWD
  • Website

Related Posts

Cyber Security

Actionable Strategies to Secure Your SaaS Environments

November 7, 2025
Cyber Security

UNK_SmudgedSerpent Targets Academics With Political Lures

November 6, 2025
Cyber Security

Identity Is Now the Top Source of Cloud Risk

November 4, 2025
Add A Comment
Leave A Reply Cancel Reply

Latest News

macOS Stealer Campaign Uses “Cracked” App Lures to Bypass Apple Securi

September 7, 202512 Views

North Korean Hackers Exploit Threat Intel Platforms For Phishing

September 7, 20256 Views

U.S. Treasury Sanctions DPRK IT-Worker Scheme, Exposing $600K Crypto Transfers and $1M+ Profits

September 5, 20256 Views

Ukrainian Ransomware Fugitive Added to Europe’s Most Wanted

September 11, 20255 Views

The risks of unsupported IoT tech

September 11, 20255 Views
Stay In Touch
  • Facebook
  • YouTube
  • TikTok
  • WhatsApp
  • Twitter
  • Instagram
Most Popular

macOS Stealer Campaign Uses “Cracked” App Lures to Bypass Apple Securi

September 7, 202512 Views

North Korean Hackers Exploit Threat Intel Platforms For Phishing

September 7, 20256 Views

U.S. Treasury Sanctions DPRK IT-Worker Scheme, Exposing $600K Crypto Transfers and $1M+ Profits

September 5, 20256 Views
Our Picks

What is it, and how do I get it off my device?

September 11, 2025

The WhatsApp screen-sharing scam you didn’t see coming

November 6, 2025

Beware of threats lurking in booby-trapped PDF files

October 7, 2025

Subscribe to Updates

Get the latest news from cyberwiredaily.com

Facebook X (Twitter) Instagram Pinterest
  • Home
  • Contact
  • Privacy Policy
  • Terms of Use
  • California Consumer Privacy Act (CCPA)
© 2025 All rights reserved.

Type above and press Enter to search. Press Esc to cancel.