In the AI science boom, beware: your results are only as good as your data – Nature.com
Hunter Moseley says that good reproducibility practices are essential to fully harness the potential of big data. Credit: Hunter N.B. Moseley
We are in the middle of a data-driven science boom. Huge, complex data sets, often with large numbers of individually measured and annotated features, are fodder for voracious artificial intelligence (AI) and machine-learning systems, with details of new applications being published almost daily.
But publication in itself is not synonymous with factuality. Just because a paper, method or data set is published does not mean that it is correct and free from mistakes. Without checking for accuracy and validity before using these resources, scientists will surely encounter errors. In fact, they already have.
In the past few months, members of our bioinformatics and systems-biology laboratory have reviewed state-of-the-art machine-learning methods for predicting the metabolic pathways that metabolites belong to, on the basis of the molecules' chemical structures¹. We wanted to find, implement and potentially improve the best methods for identifying how metabolic pathways are perturbed under different conditions: for instance, in diseased versus normal tissues.
We found several papers, published between 2011 and 2022, that demonstrated the application of different machine-learning methods to a gold-standard metabolite data set derived from the Kyoto Encyclopedia of Genes and Genomes (KEGG), which is maintained at Kyoto University in Japan. We expected the algorithms to improve over time, and saw just that: newer methods performed better than older ones did. But were those improvements real?
Scientific reproducibility enables careful vetting of data and results by peer reviewers as well as by other research groups, especially when the data set is used in new applications. Fortunately, in keeping with best practices for computational reproducibility, two of the papers²,³ in our analysis included everything that is needed to put their observations to the test: the data set they used, the computer code they wrote to implement their methods and the results generated from that code. Three of the papers²⁻⁴ used the same data set, which allowed us to make direct comparisons. When we did so, we found something unexpected.
It is common practice in machine learning to split a data set in two and to use one subset to train a model and another to evaluate its performance. If there is no overlap between the training and testing subsets, performance in the testing phase will reflect how well the model learns and performs. But in the papers we analysed, we identified a catastrophic data leakage problem: the two subsets were cross-contaminated, muddying the ideal separation. More than 1,700 of 6,648 entries from the KEGG COMPOUND database (about one-quarter of the total data set) were represented more than once, corrupting the cross-validation steps.
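To make the leakage problem concrete, here is a minimal Python sketch of the kind of check that can catch it. It assumes each entry can be reduced to a canonical key (for example, a normalized structure identifier). The toy records, the key function and every helper name are hypothetical illustrations, not the data or code from the papers discussed, and the sketch does not describe how the duplicates in the KEGG-derived data set were actually identified.

```python
import random
from collections import Counter


def find_duplicates(keys):
    """Return the set of keys that occur more than once."""
    counts = Counter(keys)
    return {k for k, n in counts.items() if n > 1}


def deduplicate(entries, key):
    """Keep the first entry seen for each key, preserving order."""
    seen = set()
    unique = []
    for entry in entries:
        k = key(entry)
        if k not in seen:
            seen.add(k)
            unique.append(entry)
    return unique


def split(entries, test_fraction=0.2, seed=0):
    """Shuffle a copy of the entries and return (train, test)."""
    shuffled = list(entries)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def leaked_keys(train, test, key):
    """Return keys that appear in both the training and testing subsets."""
    return {key(e) for e in train} & {key(e) for e in test}


# Hypothetical toy records of (structure key, pathway label); not KEGG data.
entries = [
    ("C1", "lipid"), ("C2", "amino acid"), ("C1", "lipid"),
    ("C3", "lipid"), ("C2", "amino acid"), ("C4", "carbohydrate"),
]


def key(entry):
    """Assumed canonical key: the first field of each record."""
    return entry[0]


duplicates = find_duplicates([key(e) for e in entries])

# Splitting the raw list can place the same compound on both sides.
naive_leak = leaked_keys(entries[:3], entries[3:], key)

# Deduplicating first guarantees the two subsets share no keys.
unique = deduplicate(entries, key)
train, test = split(unique, test_fraction=0.25)
clean_leak = leaked_keys(train, test, key)
```

The ordering is the point of the sketch: duplicates are removed before the split, so no compound can appear in both subsets, whichever way the shuffle falls.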
When we removed the duplicates in the data set and applied the published methods again, the observed performance was less impressive than it had first seemed. There was a substantial drop in the F1 score (a machine-learning evaluation metric that is similar to accuracy but is calculated in terms of precision and recall), from 0.94 to 0.82. A score of 0.94 is reasonably high and indicates that the algorithm is usable in many scientific applications. A score of 0.82, however, suggests that it can be useful, but only for certain applications and only if handled appropriately.
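For readers unfamiliar with the metric, the F1 score is the harmonic mean of precision and recall. The short Python sketch below computes it from raw counts for a single binary label. The function name and the example counts are illustrative assumptions, and how the papers discussed here aggregated F1 across pathways is not stated in this article.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from true-positive, false-positive
    and false-negative counts for one binary label."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: 8 correct positive calls, 2 false alarms, 2 misses.
precision, recall, f1 = precision_recall_f1(tp=8, fp=2, fn=2)
```

Because the harmonic mean is pulled towards the smaller of its two inputs, a model cannot achieve a high F1 score by excelling at precision alone or at recall alone.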
It is, of course, unfortunate that these studies were published with flawed results stemming from the corrupted data set; our work calls their findings into question. But because the authors of two of the studies followed best practices in computational scientific reproducibility and made their data, code and results fully available, the scientific method worked as intended, and the flawed results were detected and (to the best of our knowledge) are being corrected.
The third team, as far as we can tell, included neither their data set nor their code, making it impossible for us to properly evaluate their results. If all of the groups had neglected to make their data and code available, this data-leakage problem would have been almost impossible to catch. That would be a problem not just for the studies that were already published, but also for every other scientist who might want to use that data set for their own work.
More insidiously, the erroneously high performance reported in these papers could dissuade others from attempting to improve on the published methods, because they would incorrectly find their own algorithms lacking by comparison. Equally troubling, it could also complicate journal publication, because demonstrating improvement is often a requirement for successful review, potentially holding back research for years.
So, what should we do with these erroneous studies? Some would argue that they should be retracted. We would caution against such a knee-jerk reaction, at least as a blanket policy. Because two of the three papers in our analysis included the data, code and full results, we could evaluate their findings and flag the problematic data set. On one hand, that behaviour should be encouraged, for instance by allowing the authors to publish corrections. On the other, retracting studies with both highly flawed results and little or no support for reproducible research would send the message that scientific reproducibility is not optional. Furthermore, demonstrating support for full scientific reproducibility provides a clear litmus test for journals to use when deciding between correction and retraction.
Now, scientific data are growing more complex every day. Data sets used in complex analyses, especially those involving AI, are part of the scientific record. They should be made available, along with the code with which to analyse them, either as supplemental material or through open data repositories, such as Figshare (Figshare has partnered with Springer Nature, which publishes Nature, to facilitate data sharing in published manuscripts) and Zenodo, that can ensure data persistence and provenance. But those steps will help only if researchers also learn to treat published data with some scepticism, if only to avoid repeating others' mistakes.