Close Menu
  • Home
  • News
  • Cyber Security
  • Internet of Things
  • Tips and Advice

Subscribe to Updates

Get the latest creative news from FooBar about art, design and business.

What's Hot

Claude Opus 4.6 Finds 500+ High-Severity Flaws Across Major Open-Source Libraries

February 14, 2026

AISURU/Kimwolf Botnet Launches Record-Setting 31.4 Tbps DDoS Attack

February 14, 2026

Codespaces RCE, AsyncRAT C2, BYOVD Abuse, AI Cloud Intrusions & 15+ Stories

February 14, 2026
Facebook X (Twitter) Instagram
Saturday, February 14
Facebook X (Twitter) Instagram Pinterest Vimeo
Cyberwire Daily
  • Home
  • News
  • Cyber Security
  • Internet of Things
  • Tips and Advice
Cyberwire Daily
Home»News»Microsoft Develops Scanner to Detect Backdoors in Open-Weight Large Language Models
News

Microsoft Develops Scanner to Detect Backdoors in Open-Weight Large Language Models

Team-CWDBy Team-CWDFebruary 13, 2026No Comments4 Mins Read
Share Facebook Twitter Pinterest LinkedIn Tumblr Reddit Telegram Email
Share
Facebook Twitter LinkedIn Pinterest Email


Microsoft on Wednesday said it built a lightweight scanner that it said can detect backdoors in open-weight large language models (LLMs) and improve the overall trust in artificial intelligence (AI) systems.

The tech giant’s AI Security team said the scanner leverages three observable signals that can be used to reliably flag the presence of backdoors while maintaining a low false positive rate.

“These signatures are grounded in how trigger inputs measurably affect a model’s internal behavior, providing a technically robust and operationally meaningful basis for detection,” Blake Bullwinkel and Giorgio Severi said in a report shared with The Hacker News.

LLMs can be susceptible to two types of tampering: model weights, which refer to learnable parameters within a machine learning model that undergird the decision-making logic and transform input data into predicted outputs, and the code itself.

Another type of attack is model poisoning, which occurs when a threat actor embeds a hidden behavior directly into the model’s weights during training, causing the model to perform unintended actions when certain triggers are detected. Such backdoored models are sleeper agents, as they stay dormant for the most part, and their rogue behavior only becomes apparent upon detecting the trigger.

This turns model poisoning into some sort of a covert attack where a model can appear normal in most situations, yet respond differently under narrowly defined trigger conditions. Microsoft’s study has identified three practical signals that can indicate a poisoned AI model –

  • Given a prompt containing a trigger phrase, poisoned models exhibit a distinctive “double triangle” attention pattern that causes the model to focus on the trigger in isolation, as well as dramatically collapse the “randomness” of model’s output
  • Backdoored models tend to leak their own poisoning data, including triggers, via memorization rather than training data
  • A backdoor inserted into a model can still be activated by multiple “fuzzy” triggers, which are partial or approximate variations

“Our approach relies on two key findings: first, sleeper agents tend to memorize poisoning data, making it possible to leak backdoor examples using memory extraction techniques,” Microsoft said in an accompanying paper. “Second, poisoned LLMs exhibit distinctive patterns in their output distributions and attention heads when backdoor triggers are present in the input.”

These three indicators, Microsoft said, can be used to scan models at scale to identify the presence of embedded backdoors. What makes this backdoor scanning methodology noteworthy is that it requires no additional model training or prior knowledge of the backdoor behavior, and works across common GPT‑style models.

“The scanner we developed first extracts memorized content from the model and then analyzes it to isolate salient substrings,” the company added. “Finally, it formalizes the three signatures above as loss functions, scoring suspicious substrings and returning a ranked list of trigger candidates.”

The scanner is not without its limitations. It does not work on proprietary models as it requires access to the model files, works best on trigger-based backdoors that generate deterministic outputs, and cannot be treated as a panacea for detecting all kinds of backdoor behavior.

“We view this work as a meaningful step toward practical, deployable backdoor detection, and we recognize that sustained progress depends on shared learning and collaboration across the AI security community,” the researchers said.

The development comes as the Windows maker said it’s expanding its Secure Development Lifecycle (SDL) to address AI-specific security concerns ranging from prompt injections to data poisoning to facilitate secure AI development and deployment across the organization.

“Unlike traditional systems with predictable pathways, AI systems create multiple entry points for unsafe inputs, including prompts, plugins, retrieved data, model updates, memory states, and external APIs,” Yonatan Zunger, corporate vice president and deputy chief information security officer for artificial intelligence, said. “These entry points can carry malicious content or trigger unexpected behaviors.”

“AI dissolves the discrete trust zones assumed by traditional SDL. Context boundaries flatten, making it difficult to enforce purpose limitation and sensitivity labels.”



Source

Share. Facebook Twitter Pinterest LinkedIn Tumblr Email
Previous ArticleAI Skills Represent Dangerous New Attack Surface, Says TrendAI
Next Article Malicious NGINX Configurations Enable Large-Scale Web Traffic Hijacking Campaign
Team-CWD
  • Website

Related Posts

News

Claude Opus 4.6 Finds 500+ High-Severity Flaws Across Major Open-Source Libraries

February 14, 2026
News

AISURU/Kimwolf Botnet Launches Record-Setting 31.4 Tbps DDoS Attack

February 14, 2026
News

Codespaces RCE, AsyncRAT C2, BYOVD Abuse, AI Cloud Intrusions & 15+ Stories

February 14, 2026
Add A Comment
Leave A Reply Cancel Reply

Latest News

North Korean Hackers Turn JSON Services into Covert Malware Delivery Channels

November 24, 202522 Views

macOS Stealer Campaign Uses “Cracked” App Lures to Bypass Apple Securi

September 7, 202517 Views

North Korean Hackers Exploit Threat Intel Platforms For Phishing

September 7, 20256 Views

U.S. Treasury Sanctions DPRK IT-Worker Scheme, Exposing $600K Crypto Transfers and $1M+ Profits

September 5, 20256 Views

Ukrainian Ransomware Fugitive Added to Europe’s Most Wanted

September 11, 20255 Views
Stay In Touch
  • Facebook
  • YouTube
  • TikTok
  • WhatsApp
  • Twitter
  • Instagram
Most Popular

North Korean Hackers Turn JSON Services into Covert Malware Delivery Channels

November 24, 202522 Views

macOS Stealer Campaign Uses “Cracked” App Lures to Bypass Apple Securi

September 7, 202517 Views

North Korean Hackers Exploit Threat Intel Platforms For Phishing

September 7, 20256 Views
Our Picks

What’s at stake if your employees post too much online

December 1, 2025

What is it, and how do I get it off my device?

September 11, 2025

Can password managers get hacked? Here’s what to know

November 14, 2025

Subscribe to Updates

Get the latest news from cyberwiredaily.com

Facebook X (Twitter) Instagram Pinterest
  • Home
  • Contact
  • Privacy Policy
  • Terms of Use
  • California Consumer Privacy Act (CCPA)
© 2026 All rights reserved.

Type above and press Enter to search. Press Esc to cancel.