How Microsoft discovers and mitigates evolving attacks against AI guardrails – Microsoft
As we continue to integrate generative AI into our daily lives, its important to understand the potential harms that can arise from its use. Our ongoing commitment to advance safe, secure, and trustworthy AI includes transparency about the capabilities and limitations of large language models (LLMs). We prioritize research on societal risks and building secure, safe AI, and focus on developing and deploying AI systems for the public good. You can read more about Microsofts approach to securing generative AI with new tools we recently announced as available or coming soon to Microsoft Azure AI Studio for generative AI app developers.
We also made a commitment to identify and mitigate risks and share information on novel, potential threats. For example, earlier this year Microsoft shared the principles shaping Microsofts policy and actions blocking the nation-state advanced persistent threats (APTs), advanced persistent manipulators (APMs), and cybercriminal syndicates we track from using our AI tools and APIs.
In this blog post, we will discuss some of the key issues surrounding AI harms and vulnerabilities, and the steps we are taking to address the risk.
One of the main concerns with AI is its potential misuse for malicious purposes. To prevent this, AI systems at Microsoft are built with several layers of defenses throughout their architecture. One purpose of these defenses is to limit what the LLM will do, to align with the developers human values and goals. But sometimes bad actors attempt to bypass these safeguards with the intent to achieve unauthorized actions, which may result in what is known as a jailbreak. The consequences can range from the unapproved but less harmfullike getting the AI interface to talk like a pirateto the very serious, such as inducing AI to provide detailed instructions on how to achieve illegal activities. As a result, a good deal of effort goes into shoring up these jailbreak defenses to protect AI-integrated applications from these behaviors.
While AI-integrated applications can be attacked like traditional software (with methods like buffer overflows and cross-site scripting), they can also be vulnerable to more specialized attacks that exploit their unique characteristics, including the manipulation or injection of malicious instructions by talking to the AI model through the user prompt. We can break these risks into two groups of attack techniques:
Today well share two of our teams advances in this field: the discovery of a powerful technique to neutralize poisoned content, and the discovery of a novel family of malicious prompt attacks, and how to defend against them with multiple layers of mitigations.
Prompt injection attacks through poisoned content are a major security risk because an attacker who does this can potentially issue commands to the AI system as if they were the user. For example, a malicious email could contain a payload that, when summarized, would cause the system to search the users email (using the users credentials) for other emails with sensitive subjectssay, Password Resetand exfiltrate the contents of those emails to the attacker by fetching an image from an attacker-controlled URL. As such capabilities are of obvious interest to a wide range of adversaries, defending against them is a key requirement for the safe and secure operation of any AI service.
Our experts have developed a family of techniques called Spotlighting that reduces the success rate of these attacks from more than 20% to below the threshold of detection, with minimal effect on the AIs overall performance:
Our researchers discovered a novel generalization of jailbreak attacks, which we call Crescendo. This attack can best be described as a multiturn LLM jailbreak, and we have found that it can achieve a wide range of malicious goals against the most well-known LLMs used today. Crescendo can also bypass many of the existing content safety filters, if not appropriately addressed.Once we discovered this jailbreak technique, we quickly shared our technical findings with other AI vendors so they could determine whether they were affected and take actions they deem appropriate. The vendors we contacted are aware of the potential impact of Crescendo attacks and focused on protecting their respective platforms, according to their own AI implementations and safeguards.
At its core, Crescendo tricks LLMs into generating malicious content by exploiting their own responses. By asking carefully crafted questions or prompts that gradually lead the LLM to a desired outcome, rather than asking for the goal all at once, it is possible to bypass guardrails and filtersthis can usually be achieved in fewer than 10 interaction turns.You can read about Crescendos results across a variety of LLMs and chat services, and more about how and why it works, in our research paper.
While Crescendo attacks were a surprising discovery, it is important to note that these attacks did not directly pose a threat to the privacy of users otherwise interacting with the Crescendo-targeted AI system, or the security of the AI system, itself. Rather, what Crescendo attacks bypass and defeat is content filtering regulating the LLM, helping to prevent an AI interface from behaving in undesirable ways. We are committed to continuously researching and addressing these, and other types of attacks, to help maintain the secure operation and performance of AI systems for all.
In the case of Crescendo, our teams made software updates to the LLM technology behind Microsofts AI offerings, including our Copilot AI assistants, to mitigate the impact of this multiturn AI guardrail bypass. It is important to note that as more researchers inside and outside Microsoft inevitably focus on finding and publicizing AI bypass techniques, Microsoft will continue taking action to update protections in our products, as major contributors to AI security research, bug bounties and collaboration.
To understand how we addressed the issue, let us first review how we mitigate a standard malicious prompt attack (single step, also known as a one-shot jailbreak):
Defending against Crescendo initially faced some practical problems. At first, we could not detect a jailbreak intent with standard prompt filtering, as each individual prompt is not, on its own, a threat, and keywords alone are insufficient to detect this type of harm. Only when combined is the threat pattern clear. Also, the LLM itself does not see anything out of the ordinary, since each successive step is well-rooted in what it had generated in a previous step, with just a small additional ask; this eliminates many of the more prominent signals that we could ordinarily use to prevent this kind of attack.
To solve the unique problems of multiturn LLM jailbreaks, we create additional layers of mitigations to the previous ones mentioned above:
AI has the potential to bring many benefits to our lives. But it is important to be aware of new attack vectors and take steps to address them. By working together and sharing vulnerability discoveries, we can continue to improve the safety and security of AI systems. With the right product protections in place, we continue to be cautiously optimistic for the future of generative AI, and embrace the possibilities safely, with confidence. To learn more about developing responsible AI solutions with Azure AI, visit our website.
To empower security professionals and machine learning engineers to proactively find risks in their own generative AI systems, Microsoft has released an open automation framework, PyRIT (Python Risk Identification Toolkit for generative AI). Read more about the release of PyRIT for generative AI Red teaming, and access the PyRIT toolkit on GitHub. If you discover new vulnerabilities in any AI platform, we encourage you to follow responsible disclosure practices for the platform owner. Microsofts own procedure is explained here: Microsoft AI Bounty.
Read about Crescendos results across a variety of LLMs and chat services, and more about how and why it works.
To learn more about Microsoft Security solutions, visit ourwebsite.Bookmark theSecurity blogto keep up with our expert coverage on security matters. Also, follow us on LinkedIn (Microsoft Security) and X (@MSFTSecurity)for the latest news and updates on cybersecurity.
Read more:
How Microsoft discovers and mitigates evolving attacks against AI guardrails - Microsoft
- IBM Is Back. Now It Must Prove Its Mettle in AI. - WSJ - April 25th, 2025 [April 25th, 2025]
- Googles AI Overviews now reach more than 1.5 billion people every month - The Verge - April 25th, 2025 [April 25th, 2025]
- Alphabet rises as AI bets begin to pay off - Reuters - April 25th, 2025 [April 25th, 2025]
- Microsoft made an ad with generative AI and nobody noticed - The Verge - April 25th, 2025 [April 25th, 2025]
- Apple to Strip Secret Robotics Unit From AI Chief Weeks After Moving Siri - Bloomberg.com - April 25th, 2025 [April 25th, 2025]
- State Bar of California admits it used AI to develop exam questions, triggering new furor - Los Angeles Times - April 25th, 2025 [April 25th, 2025]
- Heres How Big the AI Revolution Really Is, in Four Charts - WSJ - April 25th, 2025 [April 25th, 2025]
- Update: Meta AI Begins Roll Out on Ray-Ban Meta Glasses to Even More Countries in the EU - Meta | Social Metaverse Company - April 25th, 2025 [April 25th, 2025]
- Adobe Revolutionizes AI-Assisted Creativity with Firefly, the All-In-One Home for AI Content Creation, with New Partner and Firefly Models - Adobe... - April 25th, 2025 [April 25th, 2025]
- Unveiling GPT-image-1: Rising to new heights with image generation in Azure AI Foundry - Microsoft Azure - April 25th, 2025 [April 25th, 2025]
- AI Is Spreading Old Stereotypes to New Languages and Cultures - WIRED - April 25th, 2025 [April 25th, 2025]
- In the age of AI, we must protect human creativity as a natural resource - Ars Technica - April 25th, 2025 [April 25th, 2025]
- Spotify Expands AI Playlist in Beta to Premium Listeners in 40+ New Markets - Spotify For the Record - April 25th, 2025 [April 25th, 2025]
- Microsoft says everyone will be a boss in the future of AI employees - The Guardian - April 25th, 2025 [April 25th, 2025]
- Student loans are back, US travel is whack, and, AI, please, step back : The Indicator from Planet Money - NPR - April 25th, 2025 [April 25th, 2025]
- How real-world businesses are transforming with AI with 261 new stories - The Official Microsoft Blog - April 25th, 2025 [April 25th, 2025]
- This Texas mom made $8,000 in 3 weeks training AI at her kitchen table. She says it's 'not easy money.' - Business Insider - April 25th, 2025 [April 25th, 2025]
- Dataminr Announces $100M Investment from Fortress to Accelerate Gen AI and Agentic AI Product Innovation, and to Expand its Reach to Enterprises &... - April 25th, 2025 [April 25th, 2025]
- Pony.ai teams up with Tencent for robotaxi services on WeChat, other apps - CNBC - April 25th, 2025 [April 25th, 2025]
- Alarming rise in AI-powered scams: Microsoft reveals $4 Billion in thwarted fraud - AI News - April 25th, 2025 [April 25th, 2025]
- CalArts, Chanel Launch Center for Artists and Tech With AI Focus - Variety - April 25th, 2025 [April 25th, 2025]
- China isnt trying to win the AI race - Financial Times - April 25th, 2025 [April 25th, 2025]
- WhatsApp defends 'optional' AI tool that cannot be turned off - BBC - April 25th, 2025 [April 25th, 2025]
- Nvidia Thinks It Has a Better Way of Building AI Agents - WSJ - April 25th, 2025 [April 25th, 2025]
- AI was used to write the California bar exam. The law community is outraged. - Mashable - April 25th, 2025 [April 25th, 2025]
- Exclusive: Anthropic warns fully AI employees are a year away - Axios - April 25th, 2025 [April 25th, 2025]
- Should You Forget Nvidia and Buy These 2 Millionaire-Maker AI Stocks Instead? - The Motley Fool - April 25th, 2025 [April 25th, 2025]
- Opinion: Art is a form of communication between human beings. AI wont change that - The Globe and Mail - April 25th, 2025 [April 25th, 2025]
- Adobe Firefly: The next evolution of creative AI is here - Adobe - April 25th, 2025 [April 25th, 2025]
- Adobe to launch mobile app for AI image generation tool as OpenAI steps up rivalry - CNBC - April 25th, 2025 [April 25th, 2025]
- Humanoid workers and surveillance buggies: embodied AI is reshaping daily life in China - The Guardian - April 21st, 2025 [April 21st, 2025]
- TSMC Warns of Limits of Ability to Keep Its AI Chips From China - Bloomberg.com - April 21st, 2025 [April 21st, 2025]
- A customer support AI went rogueand its a warning for every company considering replacing workers with automation - Fortune - April 21st, 2025 [April 21st, 2025]
- Could AI text alerts help save snow leopards from extinction? - BBC - April 21st, 2025 [April 21st, 2025]
- The #1 Skill That Pays More Than Gen AI In 2025 - Forbes - April 21st, 2025 [April 21st, 2025]
- 1 Artificial Intelligence (AI) Stock-Buyback Stock to Buy Hand Over Fist During the Nasdaq Sell-Off - Yahoo Finance - April 21st, 2025 [April 21st, 2025]
- What America Gets Wrong About the AI Race - Foreign Affairs - April 21st, 2025 [April 21st, 2025]
- Use AI as a tool for growth instead of degradation with this strategy. - Psychology Today - April 21st, 2025 [April 21st, 2025]
- Investor Says AI Is Already "Fully Replacing People" - futurism.com - April 21st, 2025 [April 21st, 2025]
- The philosophers machine: my conversation with Peter Singers AI chatbot - The Guardian - April 21st, 2025 [April 21st, 2025]
- With AI slop distorting our reality, the world is sleepwalking into disaster | Nesrine Malik - The Guardian - April 21st, 2025 [April 21st, 2025]
- Viral AI-made art trends are making artists even more worried about their futures - NBC News - April 21st, 2025 [April 21st, 2025]
- OpenAIs o3 AI model scores lower on a benchmark than the company initially implied - TechCrunch - April 21st, 2025 [April 21st, 2025]
- Artists push back against Barbie-like AI dolls with their own creations - BBC - April 21st, 2025 [April 21st, 2025]
- If you use AI to write me that note, dont expect me to read it - Fast Company - April 21st, 2025 [April 21st, 2025]
- Companies can leverage the true value of meetings with AI by building an LLM for Leadership - GeekWire - April 21st, 2025 [April 21st, 2025]
- Using tech, AI to make construction jobs appeal to women - DW - April 21st, 2025 [April 21st, 2025]
- Famed AI researcher launches controversial startup to replace all human workers everywhere - TechCrunch - April 21st, 2025 [April 21st, 2025]
- Impersonal assistant: This vehicle AI drove me to distraction - Detroit Free Press - April 21st, 2025 [April 21st, 2025]
- A 30-year-old AI founder who followed the FIRE movement to build wealth is now the youngest self-made woman billionaire - Fortune - April 21st, 2025 [April 21st, 2025]
- Musk and AI among biggest threats to brand reputation, global survey shows - The Guardian - April 21st, 2025 [April 21st, 2025]
- Stable Diffusion Now Optimized for AMD Radeon GPUs and Ryzen AI APUs - Stability AI - April 21st, 2025 [April 21st, 2025]
- Wikipedia is giving AI developers its data to fend off bot scrapers - The Verge - April 21st, 2025 [April 21st, 2025]
- The Healthcare AI Adoption Index - Bessemer Venture Partners - April 21st, 2025 [April 21st, 2025]
- Italian opposition file complaint over far-right partys use of racist AI images - The Guardian - April 21st, 2025 [April 21st, 2025]
- Meta's chief AI scientist calls French initiative to attract US scientists a 'smart move' - Business Insider - April 21st, 2025 [April 21st, 2025]
- Huawei introduces the Ascend 920 AI chip to fill the void left by Nvidia's H20 - Tom's Hardware - April 21st, 2025 [April 21st, 2025]
- I started vibe coding my own apps with AI. Im absolutely loving it - pcworld.com - April 21st, 2025 [April 21st, 2025]
- Living With the Galaxy S25 Ultra: Samsung's AI Shines in This Year's Model - PCMag - April 21st, 2025 [April 21st, 2025]
- o3 and o4-mini: Unlock enterprise agent workflows with next-level reasoning AI with Azure AI Foundry and GitHub - Microsoft Azure - April 18th, 2025 [April 18th, 2025]
- AI-generated music accounts for 18% of all tracks uploaded to Deezer - Reuters - April 18th, 2025 [April 18th, 2025]
- This Incredibly Cheap Artificial Intelligence (AI) Stock Is a Terrific Bargain Right Now - The Motley Fool - April 18th, 2025 [April 18th, 2025]
- Trump, Braun executive orders seek to revive fossil fuels. AI is one reason - IndyStar - April 18th, 2025 [April 18th, 2025]
- AI is coming for music, too - MIT Technology Review - April 18th, 2025 [April 18th, 2025]
- AI Reveals What Keeps People Committed to Exercise - Neuroscience News - April 18th, 2025 [April 18th, 2025]
- CEO reorganizes Intel with new CTO and AI lead - Tom's Hardware - April 18th, 2025 [April 18th, 2025]
- Netflix is revamping search with AI to improve discovery - TechCrunch - April 18th, 2025 [April 18th, 2025]
- Can this $70,000 robot transform AI research? - Fox News - April 18th, 2025 [April 18th, 2025]
- How This AI Tool Simplifies the Renting Process - CNET - April 18th, 2025 [April 18th, 2025]
- What to know before using AI to turn yourself into a Barbie doll or action figure - FOX 13 Tampa Bay - April 18th, 2025 [April 18th, 2025]
- YouTube Looks to Creators (and Their Data) to Win in the AI Era - Bloomberg.com - April 18th, 2025 [April 18th, 2025]
- 7 Goldman Sachs insiders explain how the bank's new AI sidekick is helping them crush it at work - Business Insider - April 18th, 2025 [April 18th, 2025]
- Ted Sarandos: The Bigger Opportunity with AI in Filmmaking Is If You Can Make Movies 10% Better, Not Just Cheaper - IndieWire - April 18th, 2025 [April 18th, 2025]
- Figuring out which AI model is right for you is harder than you think - Business Insider - April 18th, 2025 [April 18th, 2025]
- The humble screenshot might be the key to great AI assistants - The Verge - April 18th, 2025 [April 18th, 2025]
- How AI is using facial recognition to help bring lost pets home - CBS News - April 18th, 2025 [April 18th, 2025]
- Announcing the AWS Well-Architected Generative AI Lens - Amazon Web Services - April 18th, 2025 [April 18th, 2025]
- Gen Z can earn $70,000 a year and enter the AI-proof medical field without a college degreeall they have to do is learn how to sterilize surgical... - April 18th, 2025 [April 18th, 2025]
- This College Protester Isnt Real. Its an AI-Powered Undercover Bot for Cops - WIRED - April 18th, 2025 [April 18th, 2025]
- Intel will need license to export AI chips to Chinese clients, FT reports - Reuters - April 18th, 2025 [April 18th, 2025]