How Microsoft discovers and mitigates evolving attacks against AI guardrails – Microsoft

Category: Ai

As we continue to integrate generative AI into our daily lives, its important to understand the potential harms that can arise from its use. Our ongoing commitment to advance safe, secure, and trustworthy AI includes transparency about the capabilities and limitations of large language models (LLMs). We prioritize research on societal risks and building secure, safe AI, and focus on developing and deploying AI systems for the public good. You can read more about Microsofts approach to securing generative AI with new tools we recently announced as available or coming soon to Microsoft Azure AI Studio for generative AI app developers.

We also made a commitment to identify and mitigate risks and share information on novel, potential threats. For example, earlier this year Microsoft shared the principles shaping Microsofts policy and actions blocking the nation-state advanced persistent threats (APTs), advanced persistent manipulators (APMs), and cybercriminal syndicates we track from using our AI tools and APIs.

In this blog post, we will discuss some of the key issues surrounding AI harms and vulnerabilities, and the steps we are taking to address the risk.

One of the main concerns with AI is its potential misuse for malicious purposes. To prevent this, AI systems at Microsoft are built with several layers of defenses throughout their architecture. One purpose of these defenses is to limit what the LLM will do, to align with the developers human values and goals. But sometimes bad actors attempt to bypass these safeguards with the intent to achieve unauthorized actions, which may result in what is known as a jailbreak. The consequences can range from the unapproved but less harmfullike getting the AI interface to talk like a pirateto the very serious, such as inducing AI to provide detailed instructions on how to achieve illegal activities. As a result, a good deal of effort goes into shoring up these jailbreak defenses to protect AI-integrated applications from these behaviors.

While AI-integrated applications can be attacked like traditional software (with methods like buffer overflows and cross-site scripting), they can also be vulnerable to more specialized attacks that exploit their unique characteristics, including the manipulation or injection of malicious instructions by talking to the AI model through the user prompt. We can break these risks into two groups of attack techniques:

Today well share two of our teams advances in this field: the discovery of a powerful technique to neutralize poisoned content, and the discovery of a novel family of malicious prompt attacks, and how to defend against them with multiple layers of mitigations.

Prompt injection attacks through poisoned content are a major security risk because an attacker who does this can potentially issue commands to the AI system as if they were the user. For example, a malicious email could contain a payload that, when summarized, would cause the system to search the users email (using the users credentials) for other emails with sensitive subjectssay, Password Resetand exfiltrate the contents of those emails to the attacker by fetching an image from an attacker-controlled URL. As such capabilities are of obvious interest to a wide range of adversaries, defending against them is a key requirement for the safe and secure operation of any AI service.

Our experts have developed a family of techniques called Spotlighting that reduces the success rate of these attacks from more than 20% to below the threshold of detection, with minimal effect on the AIs overall performance:

Our researchers discovered a novel generalization of jailbreak attacks, which we call Crescendo. This attack can best be described as a multiturn LLM jailbreak, and we have found that it can achieve a wide range of malicious goals against the most well-known LLMs used today. Crescendo can also bypass many of the existing content safety filters, if not appropriately addressed.Once we discovered this jailbreak technique, we quickly shared our technical findings with other AI vendors so they could determine whether they were affected and take actions they deem appropriate. The vendors we contacted are aware of the potential impact of Crescendo attacks and focused on protecting their respective platforms, according to their own AI implementations and safeguards.

At its core, Crescendo tricks LLMs into generating malicious content by exploiting their own responses. By asking carefully crafted questions or prompts that gradually lead the LLM to a desired outcome, rather than asking for the goal all at once, it is possible to bypass guardrails and filtersthis can usually be achieved in fewer than 10 interaction turns.You can read about Crescendos results across a variety of LLMs and chat services, and more about how and why it works, in our research paper.

While Crescendo attacks were a surprising discovery, it is important to note that these attacks did not directly pose a threat to the privacy of users otherwise interacting with the Crescendo-targeted AI system, or the security of the AI system, itself. Rather, what Crescendo attacks bypass and defeat is content filtering regulating the LLM, helping to prevent an AI interface from behaving in undesirable ways. We are committed to continuously researching and addressing these, and other types of attacks, to help maintain the secure operation and performance of AI systems for all.

In the case of Crescendo, our teams made software updates to the LLM technology behind Microsofts AI offerings, including our Copilot AI assistants, to mitigate the impact of this multiturn AI guardrail bypass. It is important to note that as more researchers inside and outside Microsoft inevitably focus on finding and publicizing AI bypass techniques, Microsoft will continue taking action to update protections in our products, as major contributors to AI security research, bug bounties and collaboration.

To understand how we addressed the issue, let us first review how we mitigate a standard malicious prompt attack (single step, also known as a one-shot jailbreak):

Defending against Crescendo initially faced some practical problems. At first, we could not detect a jailbreak intent with standard prompt filtering, as each individual prompt is not, on its own, a threat, and keywords alone are insufficient to detect this type of harm. Only when combined is the threat pattern clear. Also, the LLM itself does not see anything out of the ordinary, since each successive step is well-rooted in what it had generated in a previous step, with just a small additional ask; this eliminates many of the more prominent signals that we could ordinarily use to prevent this kind of attack.

To solve the unique problems of multiturn LLM jailbreaks, we create additional layers of mitigations to the previous ones mentioned above:

AI has the potential to bring many benefits to our lives. But it is important to be aware of new attack vectors and take steps to address them. By working together and sharing vulnerability discoveries, we can continue to improve the safety and security of AI systems. With the right product protections in place, we continue to be cautiously optimistic for the future of generative AI, and embrace the possibilities safely, with confidence. To learn more about developing responsible AI solutions with Azure AI, visit our website.

To empower security professionals and machine learning engineers to proactively find risks in their own generative AI systems, Microsoft has released an open automation framework, PyRIT (Python Risk Identification Toolkit for generative AI). Read more about the release of PyRIT for generative AI Red teaming, and access the PyRIT toolkit on GitHub. If you discover new vulnerabilities in any AI platform, we encourage you to follow responsible disclosure practices for the platform owner. Microsofts own procedure is explained here: Microsoft AI Bounty.

Read about Crescendos results across a variety of LLMs and chat services, and more about how and why it works.

To learn more about Microsoft Security solutions, visit ourwebsite.Bookmark theSecurity blogto keep up with our expert coverage on security matters. Also, follow us on LinkedIn (Microsoft Security) and X (@MSFTSecurity)for the latest news and updates on cybersecurity.

How Microsoft discovers and mitigates evolving attacks against AI guardrails - Microsoft

Debate over future of US AI regulation hinges on broadband funding - Reuters - June 26th, 2025 [June 26th, 2025]
Forget about AI costs: Google just changed the game with open-source Gemini CLI that will be free for most developers - VentureBeat - June 26th, 2025 [June 26th, 2025]
How ChatGPT and other AI tools are changing the teaching profession - AP News - June 26th, 2025 [June 26th, 2025]
AI valuations are verging on the unhinged - The Economist - June 26th, 2025 [June 26th, 2025]
Newly minted PhDs in AI nabbing six- and seven-figure paydays - Fortune - June 26th, 2025 [June 26th, 2025]
Ring debuts Video Descriptions, Gen AI-powered updates on whats happening at home - AboutAmazon.com - June 26th, 2025 [June 26th, 2025]
AI Regulations: Lawmaker Says Ban on State AI Rules Will Survive in Some Version in Budget Bill - PYMNTS.com - June 26th, 2025 [June 26th, 2025]
Blacklisted by the U.S. and backed by Beijing, this Chinese AI startup has caught OpenAI's attention - CNBC - June 26th, 2025 [June 26th, 2025]
15 new jobs AI is creating - including 'Synthetic reality producer' - ZDNET - June 26th, 2025 [June 26th, 2025]
Ohio man used AI-generated porn to harass exes and their moms, prosecutors say - The Columbus Dispatch - June 26th, 2025 [June 26th, 2025]
Over 40% of agentic AI projects will be scrapped by 2027, Gartner says - Reuters - June 26th, 2025 [June 26th, 2025]
Flood of AI-generated resumes causes chaos for recruiters, who resort to AI to screen them - Mashable - June 26th, 2025 [June 26th, 2025]
And Now Malware That Tells AI to Ignore It? - Dark Reading - June 26th, 2025 [June 26th, 2025]
Walmart unveils new AI tools for workers. Here's what they'll do. - USA Today - June 26th, 2025 [June 26th, 2025]
Meet Project Rainier, Amazons one-of-a-kind machine ushering in the next generation of AI - AboutAmazon.com - June 26th, 2025 [June 26th, 2025]
NHL AI mock draft: AI predicts the first round of the 2025 NHL Draft - USA Today - June 26th, 2025 [June 26th, 2025]
Anthropic destroyed millions of print books to build its AI models - Ars Technica - June 26th, 2025 [June 26th, 2025]
Satya Nadella: The hardest part of AI isn't the tech. It's getting people to change how they work. - Business Insider - June 26th, 2025 [June 26th, 2025]
Microsoft sued by authors over use of books in AI training - Reuters - June 26th, 2025 [June 26th, 2025]
Sitchs new dating app fuses human matchmaking and AI - TechCrunch - June 26th, 2025 [June 26th, 2025]
Japanese company using mee-AI-ow to detect stressed cats - theregister.com - June 26th, 2025 [June 26th, 2025]
Hertz Is Using AI to Scan Your Rental Car for Damage, and It Might Cost You - Car and Driver - June 26th, 2025 [June 26th, 2025]
Bipartisan bill seeks to ban Chinese AI from federal agencies, as U.S. vows to win the AI race - ABC News - Breaking News, Latest News and Videos - June 26th, 2025 [June 26th, 2025]
AI Agents Are Getting Better at Writing Codeand Hacking It as Well - WIRED - June 26th, 2025 [June 26th, 2025]
Rubrik to Acquire Predibase to Accelerate Agentic AI Adoption - Business Wire - June 26th, 2025 [June 26th, 2025]
IBM sees enterprise customers are using 'everything' when it comes to AI, the challenge is matching the LLM to the right use case - VentureBeat - June 26th, 2025 [June 26th, 2025]
Hundreds of MCP Servers Expose AI Models to Abuse, RCE - Dark Reading - June 26th, 2025 [June 26th, 2025]
Amazon's Ring can now use AI to 'learn the routines of your residence' - theregister.com - June 26th, 2025 [June 26th, 2025]
Apple Will Need to Leave Its M&A Comfort Zone to Succeed in AI - Bloomberg.com - June 24th, 2025 [June 24th, 2025]
An AI video ad is making a splash. Is it the future of advertising? - NPR - June 24th, 2025 [June 24th, 2025]
Should consumers and businesses use AI assistants? - Brookings - June 24th, 2025 [June 24th, 2025]
I asked AI, Google Flights and a travel agent to find me the cheapest flight. Heres who won. - MarketWatch - June 24th, 2025 [June 24th, 2025]
NotebookLM Is Still the Best AI Tool You're Missing Out On - CNET - June 24th, 2025 [June 24th, 2025]
Meta Held Deal Talks With Startup Runway in AI Recruiting Push - Bloomberg.com - June 24th, 2025 [June 24th, 2025]
The rise of the personal AI advisors - Fast Company - June 24th, 2025 [June 24th, 2025]
OpenAIs first AI device with Jony Ive wont be a wearable - The Verge - June 24th, 2025 [June 24th, 2025]
Court filings reveal OpenAI and ios early work on an AI device - TechCrunch - June 24th, 2025 [June 24th, 2025]
MrBeast used AI to create YouTube thumbnails. People werent pleased - Fast Company - June 24th, 2025 [June 24th, 2025]
AI is coming to the NFL, and it could transform the game - The New York Times - June 24th, 2025 [June 24th, 2025]
Amazon to Invest Around $54 Billion in U.K. to Support Innovation, AI Push - WSJ - June 24th, 2025 [June 24th, 2025]
This theory about Jony Ives AI hardware device seems increasingly likely - 9to5Mac - June 24th, 2025 [June 24th, 2025]
MAGA Is Split Over the AI Provision in Trump's Big Beautiful Bill - Business Insider - June 24th, 2025 [June 24th, 2025]
5 Dividend Stocks Poised to Profit From the AI Efficiency Boom - The Motley Fool - June 24th, 2025 [June 24th, 2025]
Here are the overlooked ways to play AI, crypto and quantum trends, says this tech investor - MarketWatch - June 24th, 2025 [June 24th, 2025]
Microsoft to Cut Thousands of Jobs as AI Spending Surges - Yahoo Finance - June 24th, 2025 [June 24th, 2025]
The Oversight Board calls Meta's uneven AI moderation 'incoherent and unjustifiable' - Engadget - June 24th, 2025 [June 24th, 2025]
3 Phenomenal AI Stocks That Investors Should Load Up On - The Motley Fool - June 24th, 2025 [June 24th, 2025]
Stock-Split Watch: Is This AI Stock That's Soared 300% Next on the List? - The Motley Fool - June 24th, 2025 [June 24th, 2025]
I Asked ChatGPT To Explain How To Make Money Using AI Heres What It Said - Nasdaq - June 24th, 2025 [June 24th, 2025]
2 Top AI Stocks to Sell Before They Fall 57% and 8%, According to These Wall Street Analysts - The Motley Fool - June 24th, 2025 [June 24th, 2025]
AI's impact on the job market is inevitable, says workforce expert: 'It's going to hurt for certain parts of the population' - CNBC - June 24th, 2025 [June 24th, 2025]
Leading AI models show up to 96% blackmail rate when their goals or existence is threatened, an Anthropic study says - Fortune - June 24th, 2025 [June 24th, 2025]
Voters beware: 25 states restrict AI in elections. SC is in the other half. - News From The States - June 24th, 2025 [June 24th, 2025]
Sphere Brings Its AI-Powered Mixed Reality to Vuzix Smart Glasses - Morningstar - June 24th, 2025 [June 24th, 2025]
I've used Perplexity here's why it could be the perfect solution to Apples AI conundrum - TechRadar - June 24th, 2025 [June 24th, 2025]
Opinion: Forget the Magnificent Seven these 7 cheap tech and AI stocks are better buys right now - MarketWatch - June 24th, 2025 [June 24th, 2025]
Law firm says attorneys use of AI was isolated event - News From The States - June 24th, 2025 [June 24th, 2025]
The cofounder of the viral AI 'cheating' startup Cluely says he only hires people for 2 jobs - Business Insider - June 24th, 2025 [June 24th, 2025]
AI Is Power-Hungry, but It Could Eventually Cut More Emissions Than It Creates - Scientific American - June 24th, 2025 [June 24th, 2025]
AI is about to change everything, including how we date. - Psychology Today - June 24th, 2025 [June 24th, 2025]
Malicious AI willing to sacrifice human lives to avoid being shut down, shocking study reveals - New York Post - June 24th, 2025 [June 24th, 2025]
Entrepreneur and investor Gary Vee's top tips to use and embrace AI - Fortune - June 24th, 2025 [June 24th, 2025]
5 things TV and movies promised AI can do that it can't yet - TechRadar - June 24th, 2025 [June 24th, 2025]
Seattle to deploy AI to speed up housing and small business permit process - GeekWire - June 24th, 2025 [June 24th, 2025]
AI-based brain-mapping software receives FDA market authorization - WashU Medicine - June 24th, 2025 [June 24th, 2025]
Message from CEO Andy Jassy: Some thoughts on Generative AI - AboutAmazon.com - June 22nd, 2025 [June 22nd, 2025]
Surge AI, the Hot Tech Startup Youve Probably Never Heard of, Is Already Outpacing Rivals - Inc.com - June 22nd, 2025 [June 22nd, 2025]
Prediction: This Artificial Intelligence (AI) Data Center Stock Will Be Worth More Than Palantir by 2030 - Yahoo Finance - June 22nd, 2025 [June 22nd, 2025]
Applebees and IHOP Plan to Introduce AI in Restaurants - WSJ - June 22nd, 2025 [June 22nd, 2025]
2 Artificial Intelligence (AI) Stocks That Could Soar in the Second Half of 2025 - The Motley Fool - June 22nd, 2025 [June 22nd, 2025]
BBC threatens AI firm with legal action over unauthorised content use - BBC - June 22nd, 2025 [June 22nd, 2025]
Chevron and Exxon Are the Next Hot AI Stocks. Heres Why. - Barron's - June 22nd, 2025 [June 22nd, 2025]
Exclusive: Nvidia, Foxconn in talks to deploy humanoid robots at Houston AI server making plant - Reuters - June 22nd, 2025 [June 22nd, 2025]
Bosses want you to know AI is coming for your job - The Washington Post - June 22nd, 2025 [June 22nd, 2025]
Meta partners with sports eyewear brand Oakley to launch AI-powered glasses - Reuters - June 22nd, 2025 [June 22nd, 2025]
Apple Executives Have Held Internal Talks About Buying AI Startup Perplexity - Bloomberg.com - June 22nd, 2025 [June 22nd, 2025]
What Are the 5 Best Bargain Artificial Intelligence (AI) Stocks to Buy Right Now? - The Motley Fool - June 22nd, 2025 [June 22nd, 2025]
Intel will outsource marketing to Accenture and AI, laying off many of its own workers - OregonLive.com - June 22nd, 2025 [June 22nd, 2025]
I made an AI tool to run my job search, and it helped me get my dream role - Business Insider - June 22nd, 2025 [June 22nd, 2025]
1 AI Super Stock Is Starting to Rebound, but Shares Still Look Cheap - The Motley Fool - June 22nd, 2025 [June 22nd, 2025]

April 15th, 2024

No comments yet

Comments are closed.

Mediaboss Marketing

How Microsoft discovers and mitigates evolving attacks against AI guardrails – Microsoft

About

Pages

Categories

Media Sites

Recommended Sites

Archives