Cyberwire Daily

All Major LLMs Exposed to Multi-Turn Manipulation, Warn Researchers

By Team-CWD, May 27, 2026

The safety guardrails of several prominent large language models (LLMs) can be bypassed when a user draws the model into a sustained, multi-turn conversation, researchers at Cisco have warned.

The researchers examined commonly used LLMs and frontier AI models including OpenAI’s ChatGPT, Anthropic’s Claude, Google Gemini, Amazon Nova, xAI’s Grok and others to test how their built-in safety guardrails held up against potential threats from real-world attackers.

They found that many of the models could be tricked into performing actions they should not be able to perform.

This was achieved through multi-turn conversations: dialogue between the user and the LLM that spans multiple back-and-forth exchanges.

While guardrails in LLMs are designed to block malicious requests, the researchers found that the protections faltered when they engaged the LLMs in conversation and followed up on the responses.

“Multi-turn evaluation matters for one reason: it is where attackers actually live. Real adversaries iterate. They reframe refusals, decompose tasks across turns, adopt personas, and escalate gradually,” said Cisco.

No Guardrails Completely Safe From Bypass

The research found that no model was completely safe from being exploited by multi-turn-based manipulation of guardrails. Cisco warned that this challenges how enterprises are currently evaluating AI safety and security.

The warning comes at a time when many organizations are rolling out AI and LLMs for use by employees, clients and customers, but are relying on safety benchmarks that misrepresent real-world risk.

The report warned that most LLM safety evaluation relies on single-prompt testing, but attackers do not stop after one try, and every model tested was susceptible to multi-turn attacks, as reflected in its multi-turn attack success rate (ASR).
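
To make the single-prompt versus multi-turn distinction concrete, here is a minimal Python sketch of how an attack success rate could be computed separately for each kind of test. The record format (`AttackTrial`), the function names and the sample data are all hypothetical and chosen for illustration; they are not taken from the Cisco report, and the numbers are not its results.

```python
from dataclasses import dataclass


@dataclass
class AttackTrial:
    """One adversarial attempt against a model (hypothetical record format)."""
    model: str
    turns: int       # number of user turns in the conversation
    bypassed: bool   # True if the guardrail was judged to have failed


def attack_success_rate(trials: list[AttackTrial]) -> float:
    """Fraction of trials in which the guardrail was bypassed."""
    if not trials:
        return 0.0
    return sum(t.bypassed for t in trials) / len(trials)


def asr_by_mode(trials: list[AttackTrial]) -> dict[str, float]:
    """Report ASR separately for single-prompt and multi-turn trials."""
    single = [t for t in trials if t.turns == 1]
    multi = [t for t in trials if t.turns > 1]
    return {
        "single_turn": attack_success_rate(single),
        "multi_turn": attack_success_rate(multi),
    }


# Made-up illustrative data, NOT results from the report.
trials = [
    AttackTrial("model-a", turns=1, bypassed=False),
    AttackTrial("model-a", turns=1, bypassed=False),
    AttackTrial("model-a", turns=1, bypassed=True),
    AttackTrial("model-a", turns=1, bypassed=False),
    AttackTrial("model-a", turns=5, bypassed=True),
    AttackTrial("model-a", turns=8, bypassed=True),
    AttackTrial("model-a", turns=6, bypassed=False),
    AttackTrial("model-a", turns=10, bypassed=True),
]

print(asr_by_mode(trials))
```

A benchmark that reports only the single-turn figure would, in a case like this invented one, understate how often the guardrail fails once an attacker is allowed to iterate.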

Techniques that enabled researchers to bypass guardrails through multi-turn conversations included adopting personas in roleplay, introducing ambiguity and misdirection around context, and reframing requests after the LLM initially refused them.

How the LLMs were configured also affected how resilient they were to manipulation. For example, researchers found that Grok's safety protections became much easier to bypass when ‘reasoning mode’ was enabled.

While governing bodies and regulators are beginning to call for evaluation practices that current benchmarks do not fully address, Cisco warned that much more needs to be done to prevent LLMs from being easily exploited or manipulated by adversaries.

“The rapid deployment of frontier large language models has generated a parallel ecosystem of safety and security benchmarks. However, a growing body of evidence indicates that this ecosystem suffers from structural limitations that can systematically understate risk, conflate safety with capability, and leave critical attack surfaces unmeasured,” said the report.


