‘Jailbreaking’ AI services like ChatGPT and Claude 3 Opus is much easier than you think – Livescience.com
Scientists from artificial intelligence (AI) company Anthropic have identified a potentially dangerous flaw in widely used large language models (LLMs) like ChatGPT and Anthropic's own Claude 3 chatbot.
Dubbed "many-shot jailbreaking," the hack takes advantage of "in-context learning," in which the chatbot learns from the information provided in a text prompt written out by a user, as outlined in research published in 2022. The scientists outlined their findings in a new paper uploaded to the sanity.io cloud repository and tested the exploit on Anthropic's Claude 2 AI chatbot.
People could use the hack to force LLMs to produce dangerous responses, the study concluded, even though such systems are trained to prevent this. That's because many-shot jailbreaking bypasses in-built security protocols that govern how an AI responds when, say, asked how to build a bomb.
LLMs like ChatGPT rely on a "context window" to process conversations. This is the amount of information the system can process as part of its input. A longer context window allows for more input text that an AI can learn from mid-conversation, which leads to better responses.
Context windows in AI chatbots are now hundreds of times larger than they were even at the start of 2023, which means more nuanced and context-aware responses by AIs, the scientists said in a statement. But that has also opened the door to exploitation.
The attack works by first writing out, in a text prompt, a fake conversation between a user and an AI assistant in which the fictional assistant answers a series of potentially harmful questions.
Then, in a second text prompt, if you ask a question such as "How do I build a bomb?" the AI assistant will bypass its safety protocols and answer it. This is because it has now started to learn from the input text. This only works if you write a long "script" that includes many "shots," or question-answer combinations.
"In our study, we showed that as the number of included dialogues (the number of 'shots') increases beyond a certain point, it becomes more likely that the model will produce a harmful response," the scientists said in the statement. "In our paper, we also report that combining many-shot jailbreaking with other, previously-published jailbreaking techniques makes it even more effective, reducing the length of the prompt that's required for the model to return a harmful response."
The attack began to work only when a prompt included between four and 32 shots, and even then it succeeded less than 10% of the time. From 32 shots and more, the success rate surged higher and higher. The longest jailbreak attempt included 256 shots and had a success rate of nearly 70% for discrimination, 75% for deception, 55% for regulated content and 40% for violent or hateful responses.
The researchers found they could mitigate the attacks by adding an extra step that was activated after a user sent their prompt (that contained the jailbreak attack) and the LLM received it. In this new layer, the system would lean on existing safety training techniques to classify and modify the prompt before the LLM would have a chance to read it and draft a response. During tests, it reduced the hack's success rate from 61% to just 2%.
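To make the idea of a screening step concrete, here is a minimal defensive sketch in Python. It is not the classifier the researchers used, which the article does not describe in detail. It assumes a much cruder stand-in: counting how many speaker-labeled dialogue turns are embedded in a single prompt and flagging the prompt when that count exceeds a threshold. The speaker labels, the function names and the `max_turns` value are all hypothetical choices made for illustration.

```python
import re

# Hypothetical speaker labels that mark an embedded dialogue turn.
# A real system would not rely on a fixed list like this one.
TURN_PATTERN = re.compile(
    r"^[ \t]*(user|human|assistant|ai)[ \t]*:",
    re.IGNORECASE | re.MULTILINE,
)


def count_embedded_turns(prompt: str) -> int:
    """Count lines in the prompt that begin with a speaker label."""
    return len(TURN_PATTERN.findall(prompt))


def screen_prompt(prompt: str, max_turns: int = 8) -> dict:
    """Flag prompts that embed more scripted turns than max_turns.

    max_turns is an arbitrary illustrative threshold, not a value
    taken from the study.
    """
    turns = count_embedded_turns(prompt)
    return {"turns": turns, "flagged": turns > max_turns}
```

A flagged prompt could then be rejected, truncated or passed to a stronger classifier before the LLM reads it. A heuristic this simple would be easy to evade and would also flag harmless transcripts, so it illustrates only where such a check sits in the pipeline, not how well the researchers' approach performed.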
The scientists found that many-shot jailbreaking worked on Anthropic's own AI services as well as those of its competitors, including the likes of ChatGPT and Google's Gemini. They have alerted other AI companies and researchers to the danger, they said.
Many-shot jailbreaking does not currently pose "catastrophic risks," however, because LLMs today are not powerful enough, the scientists concluded. That said, the technique might "cause serious harm" if it isn't mitigated by the time far more powerful models are released in the future.