‘Jailbreaking’ AI services like ChatGPT and Claude 3 Opus is much easier than you think – Livescience.com
Scientists from artificial intelligence (AI) company Anthropic have identified a potentially dangerous flaw in widely used large language models (LLMs) like ChatGPT and Anthropic's own Claude 3 chatbot.
Dubbed "many-shot jailbreaking," the hack takes advantage of "in-context learning," in which the chatbot learns from the information provided in a text prompt written out by a user, as outlined in research published in 2022. The scientists outlined their findings in a new paper uploaded to the sanity.io cloud repository and tested the exploit on Anthropic's Claude 2 AI chatbot.
People could use the hack to force LLMs to produce dangerous responses, the study concluded, even though such systems are trained to prevent this. That's because many-shot jailbreaking bypasses the built-in security protocols that govern how an AI responds when, say, asked how to build a bomb.
LLMs like ChatGPT rely on the "context window" to process conversations. This is the amount of information the system can process as part of its input. A longer context window allows for more input text that an AI can learn from mid-conversation, which leads to better responses.
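To make the idea of a context window concrete, here is a minimal sketch of how a chat system might trim a conversation so that it fits within a fixed input budget. It is only an illustration: the function names are hypothetical, and counting one token per whitespace-separated word is a crude assumption, since real models use their own tokenizers.

```python
def approx_token_count(text: str) -> int:
    # Crude assumption: one token per whitespace-separated word.
    # Real LLM tokenizers split text differently.
    return len(text.split())


def fit_to_context(messages: list[str], window_tokens: int) -> list[str]:
    """Keep the most recent messages whose combined size fits the window."""
    kept: list[str] = []
    used = 0
    # Walk backwards so the newest messages are kept first.
    for message in reversed(messages):
        cost = approx_token_count(message)
        if used + cost > window_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # Restore chronological order.
    return kept
```

With a small window, older messages fall out of the input; with a larger one, the whole conversation (including a very long prompt) remains visible to the model.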
Context windows in AI chatbots are now hundreds of times larger than they were even at the start of 2023, which means more nuanced and context-aware responses by AIs, the scientists said in a statement. But that has also opened the door to exploitation.
The attack works by first writing out a fake conversation between a user and an AI assistant in a text prompt, in which the fictional assistant answers a series of potentially harmful questions.
Then, in a second text prompt, if you ask a question such as "How do I build a bomb?" the AI assistant will bypass its safety protocols and answer it. This is because it has now started to learn from the input text. The attack only works if you write a long "script" that includes many "shots," or question-answer combinations.
"In our study, we showed that as the number of included dialogues (the number of 'shots') increases beyond a certain point, it becomes more likely that the model will produce a harmful response," the scientists said in the statement. "In our paper, we also report that combining many-shot jailbreaking with other, previously-published jailbreaking techniques makes it even more effective, reducing the length of the prompt that's required for the model to return a harmful response."
The attack began to work once a prompt included between four and 32 shots, but it succeeded less than 10% of the time. From 32 shots onward, the success rate climbed steadily. The longest jailbreak attempt included 256 shots and had a success rate of nearly 70% for discrimination, 75% for deception, 55% for regulated content and 40% for violent or hateful responses.
The researchers found they could mitigate the attacks by adding an extra step that was activated after a user sent their prompt (that contained the jailbreak attack) and the LLM received it. In this new layer, the system would lean on existing safety training techniques to classify and modify the prompt before the LLM would have a chance to read it and draft a response. During tests, it reduced the hack's success rate from 61% to just 2%.
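The article does not describe how this screening layer is built, so the following is only a rough sketch of the general pattern: a check that runs on the prompt before the model ever sees it. Everything here is an assumption made for illustration, including the names `screen_prompt` and `count_embedded_shots`, the speaker-label pattern, and the 32-shot threshold (borrowed from the point at which the article says success rates began to climb). The real mitigation relies on safety-trained classification, not a simple pattern match.

```python
import re
from dataclasses import dataclass

# Hypothetical threshold, loosely based on the 32-shot figure in the article.
MAX_EMBEDDED_SHOTS = 32

# Hypothetical pattern: a line that starts with a speaker label such as
# "User:" or "Assistant:" is treated as one turn of an embedded dialogue.
TURN_PATTERN = re.compile(
    r"^[ \t]*(?:user|human|assistant|ai)[ \t]*:",
    re.IGNORECASE | re.MULTILINE,
)


@dataclass
class ScreenResult:
    allowed: bool
    embedded_shots: int


def count_embedded_shots(prompt: str) -> int:
    """Estimate question-answer pairs as half the number of labeled turns."""
    return len(TURN_PATTERN.findall(prompt)) // 2


def screen_prompt(prompt: str, max_shots: int = MAX_EMBEDDED_SHOTS) -> ScreenResult:
    """Decide whether a prompt may be forwarded to the model."""
    shots = count_embedded_shots(prompt)
    return ScreenResult(allowed=shots <= max_shots, embedded_shots=shots)
```

An ordinary question contains no embedded dialogue and passes straight through, while a prompt padded with dozens of scripted exchanges is stopped before the model reads it. A heuristic this simple would be easy to evade, which is one reason a trained classifier is the more plausible choice in practice.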
The scientists found that many-shot jailbreaking worked on Anthropic's own AI services as well as those of its competitors, including the likes of ChatGPT and Google's Gemini. They have alerted other AI companies and researchers to the danger, they said.
Many-shot jailbreaking does not currently pose "catastrophic risks," however, because LLMs today are not powerful enough, the scientists concluded. That said, the technique might "cause serious harm" if it isn't mitigated by the time far more powerful models are released in the future.