In the AI science boom, beware: your results are only as good as your data – Nature.com
Hunter Moseley says that good reproducibility practices are essential to fully harness the potential of big data. Credit: Hunter N.B. Moseley
We are in the middle of a data-driven science boom. Huge, complex data sets, often with large numbers of individually measured and annotated features, are fodder for voracious artificial intelligence (AI) and machine-learning systems, with details of new applications being published almost daily.
But publication in itself is not synonymous with factuality. Just because a paper, method or data set is published does not mean that it is correct and free from mistakes. Without checking for accuracy and validity before using these resources, scientists will surely encounter errors. In fact, they already have.
In the past few months, members of our bioinformatics and systems-biology laboratory have reviewed state-of-the-art machine-learning methods for predicting the metabolic pathways that metabolites belong to, on the basis of the molecules' chemical structures¹. We wanted to find, implement and potentially improve the best methods for identifying how metabolic pathways are perturbed under different conditions: for instance, in diseased versus normal tissues.
We found several papers, published between 2011 and 2022, that demonstrated the application of different machine-learning methods to a gold-standard metabolite data set derived from the Kyoto Encyclopedia of Genes and Genomes (KEGG), which is maintained at Kyoto University in Japan. We expected the algorithms to improve over time, and saw just that: newer methods performed better than older ones did. But were those improvements real?
Scientific reproducibility enables careful vetting of data and results by peer reviewers as well as by other research groups, especially when the data set is used in new applications. Fortunately, in keeping with best practices for computational reproducibility, two of the papers²,³ in our analysis included everything that is needed to put their observations to the test: the data set they used, the computer code they wrote to implement their methods and the results generated from that code. Three of the papers²–⁴ used the same data set, which allowed us to make direct comparisons. When we did so, we found something unexpected.
It is common practice in machine learning to split a data set in two and to use one subset to train a model and another to evaluate its performance. If there is no overlap between the training and testing subsets, performance in the testing phase will reflect how well the model learns and performs. But in the papers we analysed, we identified a catastrophic data leakage problem: the two subsets were cross-contaminated, muddying the ideal separation. More than 1,700 of 6,648 entries from the KEGG COMPOUND database (about one-quarter of the total data set) were represented more than once, corrupting the cross-validation steps.
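To see how this kind of leakage arises, here is a minimal sketch, not the published pipeline: the compound identifiers and column names are illustrative stand-ins for a metabolite table. Duplicated entries can land on both sides of a naive train/test split, so the model is evaluated on compounds it has already seen; deduplicating (or splitting by compound) before the split removes that overlap.

```python
# Minimal illustration of duplicate-driven data leakage (hypothetical data,
# not the pipeline from the studies discussed above).
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-in for a metabolite table: 'compound_id' identifies each entry,
# 'pathway' stands in for the label being predicted.
df = pd.DataFrame({
    "compound_id": ["C00022", "C00022", "C00031", "C00041", "C00041", "C00064"],
    "pathway":     ["carbohydrate", "carbohydrate", "carbohydrate",
                    "amino acid", "amino acid", "amino acid"],
})

# Naive split: duplicated compounds can end up in both subsets.
train, test = train_test_split(df, test_size=0.5, random_state=0)
leaked = set(train["compound_id"]) & set(test["compound_id"])
print(f"compounds present in both subsets: {leaked}")

# Safer: drop duplicates before splitting, so every unique compound
# appears in exactly one subset.
deduped = df.drop_duplicates(subset="compound_id")
train, test = train_test_split(deduped, test_size=0.5, random_state=0)
assert not set(train["compound_id"]) & set(test["compound_id"])
```

If the duplicated rows carry genuinely distinct measurements that should all be kept, a grouped split (for example, scikit-learn's GroupShuffleSplit keyed on the compound identifier) achieves the same separation without discarding data.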
When we removed the duplicates in the data set and applied the published methods again, the observed performance was less impressive than it had first seemed. There was a substantial drop in the F1 score (a machine-learning evaluation metric that is similar to accuracy but is calculated in terms of precision and recall) from 0.94 to 0.82. A score of 0.94 is reasonably high and indicates that the algorithm is usable in many scientific applications. A score of 0.82, however, suggests that it can be useful, but only for certain applications and only if handled appropriately.
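For reference, the F1 score is the harmonic mean of precision and recall. A rough sketch with illustrative numbers (these are not the actual counts or scores from the studies above):

```python
# F1 is the harmonic mean of precision and recall (illustrative values only;
# not the figures from the studies discussed in the article).
def f1(precision: float, recall: float) -> float:
    return 2 * precision * recall / (precision + recall)

print(round(f1(0.94, 0.94), 2))  # 0.94 -- high precision and recall together
print(round(f1(0.90, 0.75), 2))  # 0.82 -- weaker recall drags the score down
```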
It is, of course, unfortunate that these studies were published with flawed results stemming from the corrupted data set; our work calls their findings into question. But because the authors of two of the studies followed best practices in computational scientific reproducibility and made their data, code and results fully available, the scientific method worked as intended, and the flawed results were detected and (to the best of our knowledge) are being corrected.
The third team, as far as we can tell, included neither their data set nor their code, making it impossible for us to properly evaluate their results. If all of the groups had neglected to make their data and code available, this data-leakage problem would have been almost impossible to catch. That would be a problem not just for the studies that were already published, but also for every other scientist who might want to use that data set for their own work.
More insidiously, the erroneously high performance reported in these papers could dissuade others from attempting to improve on the published methods, because they would incorrectly find their own algorithms lacking by comparison. Equally troubling, it could also complicate journal publication, because demonstrating improvement is often a requirement for successful review, potentially holding back research for years.
So, what should we do with these erroneous studies? Some would argue that they should be retracted. We would caution against such a knee-jerk reaction, at least as a blanket policy. Because two of the three papers in our analysis included the data, code and full results, we could evaluate their findings and flag the problematic data set. On one hand, that behaviour should be encouraged, for instance by allowing the authors to publish corrections. On the other, retracting studies with both highly flawed results and little or no support for reproducible research would send the message that scientific reproducibility is not optional. Furthermore, demonstrating support for full scientific reproducibility provides a clear litmus test for journals to use when deciding between correction and retraction.
Now, scientific data are growing more complex every day. Data sets used in complex analyses, especially those involving AI, are part of the scientific record. They should be made available, along with the code used to analyse them, either as supplemental material or through open data repositories, such as Figshare (which has partnered with Springer Nature, the publisher of Nature, to facilitate data sharing in published manuscripts) and Zenodo, that can ensure data persistence and provenance. But those steps will help only if researchers also learn to treat published data with some scepticism, if only to avoid repeating others' mistakes.