NIH findings shed light on risks and benefits of integrating AI into medical decision-making – National Institutes of Health (NIH)
News Release
Tuesday, July 23, 2024
AI model scored well on medical diagnostic quiz, but made mistakes explaining answers.
Researchers at the National Institutes of Health (NIH) found that an artificial intelligence (AI) model solved medical quiz questions, designed to test health professionals' ability to diagnose patients based on clinical images and a brief text summary, with high accuracy. However, physician-graders found the AI model made mistakes when describing images and explaining how its decision-making led to the correct answer. The findings, which shed light on AI's potential in the clinical setting, were published in npj Digital Medicine. The study was led by researchers from NIH's National Library of Medicine (NLM) and Weill Cornell Medicine, New York City.
"Integration of AI into health care holds great promise as a tool to help medical professionals diagnose patients faster, allowing them to start treatment sooner," said NLM Acting Director, Stephen Sherry, Ph.D. "However, as this study shows, AI is not advanced enough yet to replace human experience, which is crucial for accurate diagnosis."
The AI model and human physicians answered questions from the New England Journal of Medicine (NEJM)'s Image Challenge. The challenge is an online quiz that provides real clinical images and a short text description that includes details about the patient's symptoms and presentation, then asks users to choose the correct diagnosis from multiple-choice answers.
The researchers tasked the AI model with answering 207 image challenge questions and providing a written rationale to justify each answer. The prompt specified that the rationale should include a description of the image, a summary of relevant medical knowledge, and step-by-step reasoning for how the model chose the answer.
Nine physicians from various institutions were recruited, each with a different medical specialty, and answered their assigned questions first in a closed-book setting (without referring to any external materials such as online resources) and then in an open-book setting (using external resources). The researchers then provided the physicians with the correct answer, along with the AI model's answer and corresponding rationale. Finally, the physicians were asked to score the AI model's ability to describe the image, summarize relevant medical knowledge, and provide its step-by-step reasoning.
The researchers found that the AI model and physicians scored highly in selecting the correct diagnosis. Interestingly, the AI model selected the correct diagnosis more often than physicians in closed-book settings, while physicians with open-book tools performed better than the AI model, especially when answering the questions ranked most difficult.
Importantly, based on physician evaluations, the AI model often made mistakes when describing the medical image and explaining its reasoning behind the diagnosis, even in cases where it made the correct final choice. In one example, the AI model was provided with a photo of a patient's arm with two lesions. A physician would easily recognize that both lesions were caused by the same condition. However, because the lesions were presented at different angles, causing the illusion of different colors and shapes, the AI model failed to recognize that both lesions could be related to the same diagnosis.
The researchers argue that these findings underpin the importance of evaluating multi-modal AI technology further before introducing it into the clinical setting.
"This technology has the potential to help clinicians augment their capabilities with data-driven insights that may lead to improved clinical decision-making," said NLM Senior Investigator and corresponding author of the study, Zhiyong Lu, Ph.D. "Understanding the risks and limitations of this technology is essential to harnessing its potential in medicine."
The study used an AI model known as GPT-4V (Generative Pre-trained Transformer 4 with Vision), which is a multimodal AI model that can process combinations of multiple types of data, including text and images. The researchers note that while this is a small study, it sheds light on multi-modal AI's potential to aid physicians' medical decision-making. More research is needed to understand how such models compare to physicians' ability to diagnose patients.
The study was co-authored by collaborators from NIH's National Eye Institute and the NIH Clinical Center; the University of Pittsburgh; UT Southwestern Medical Center, Dallas; New York University Grossman School of Medicine, New York City; Harvard Medical School and Massachusetts General Hospital, Boston; Case Western Reserve University School of Medicine, Cleveland; University of California San Diego, La Jolla; and the University of Arkansas, Little Rock.
The National Library of Medicine (NLM) is a leader in research in biomedical informatics and data science and the world's largest biomedical library. NLM conducts and supports research in methods for recording, storing, retrieving, preserving, and communicating health information. NLM creates resources and tools that are used billions of times each year by millions of people to access and analyze molecular biology, biotechnology, toxicology, environmental health, and health services information. Additional information is available at https://www.nlm.nih.gov.
About the National Institutes of Health (NIH): NIH, the nation's medical research agency, includes 27 Institutes and Centers and is a component of the U.S. Department of Health and Human Services. NIH is the primary federal agency conducting and supporting basic, clinical, and translational medical research, and is investigating the causes, treatments, and cures for both common and rare diseases. For more information about NIH and its programs, visit http://www.nih.gov.
NIH...Turning Discovery Into Health
Qiao Jin, et al. Hidden Flaws Behind Expert-Level Accuracy of Multimodal GPT-4 Vision in Medicine. npj Digital Medicine. DOI: 10.1038/s41746-024-01185-7 (2024).
###