HEAL: A framework for health equity assessment of machine learning performance – Google Research
Posted by Mike Schaekermann, Research Scientist, Google Research, and Ivor Horn, Chief Health Equity Officer & Director, Google Core
Health equity is a major societal concern worldwide with disparities having many causes. These sources include limitations in access to healthcare, differences in clinical treatment, and even fundamental differences in the diagnostic technology. In dermatology for example, skin cancer outcomes are worse for populations such as minorities, those with lower socioeconomic status, or individuals with limited healthcare access. While there is great promise in recent advances in machine learning (ML) and artificial intelligence (AI) to help improve healthcare, this transition from research to bedside must be accompanied by a careful understanding of whether and how they impact health equity.
Health equity is defined by public health organizations as fairness of opportunity for everyone to be as healthy as possible. Importantly, equity may be different from equality. For example, people with greater barriers to improving their health may require more or different effort to experience this fair opportunity. Similarly, equity is not fairness as defined in the AI for healthcare literature. Whereas AI fairness often strives for equal performance of the AI technology across different patient populations, this does not center the goal of prioritizing performance with respect to pre-existing health disparities.
In Health Equity Assessment of machine Learning performance (HEAL): a framework and dermatology AI model case study, published in The Lancet eClinicalMedicine, we propose a methodology to quantitatively assess whether ML-based health technologies perform equitably. In other words, does the ML model perform well for those with the worst health outcomes for the condition(s) the model is meant to address? This goal anchors on the principle that health equity should prioritize and measure model performance with respect to disparate health outcomes, which may be due to a number of factors that include structural inequities (e.g., demographic, social, cultural, political, economic, environmental and geographic).
The HEAL framework proposes a 4-step process to estimate the likelihood that an ML-based health technology performs equitably:
The final steps output is termed the HEAL metric, which quantifies how anticorrelated the ML models performance is with health disparities. In other words, does the model perform better with populations that have the worse health outcomes?
This 4-step process is designed to inform improvements for making ML model performance more equitable, and is meant to be iterative and re-evaluated on a regular basis. For example, the availability of health outcomes data in step (2) can inform the choice of demographic factors and brackets in step (1), and the framework can be applied again with new datasets, models and populations.
With this work, we take a step towards encouraging explicit assessment of the health equity considerations of AI technologies, and encourage prioritization of efforts during model development to reduce health inequities for subpopulations exposed to structural inequities that can precipitate disparate outcomes. We should note that the present framework does not model causal relationships and, therefore, cannot quantify the actual impact a new technology will have on reducing health outcome disparities. However, the HEAL metric may help identify opportunities for improvement, where the current performance is not prioritized with respect to pre-existing health disparities.
As an illustrative case study, we applied the framework to a dermatology model, which utilizes a convolutional neural network similar to that described in prior work. This example dermatology model was trained to classify 288 skin conditions using a development dataset of 29k cases. The input to the model consists of three photos of a skin concern along with demographic information and a brief structured medical history. The output consists of a ranked list of possible matching skin conditions.
Using the HEAL framework, we evaluated this model by assessing whether it prioritized performance with respect to pre-existing health outcomes. The model was designed to predict possible dermatologic conditions (from a list of hundreds) based on photos of a skin concern and patient metadata. Evaluation of the model is done using a top-3 agreement metric, which quantifies how often the top 3 output conditions match the most likely condition as suggested by a dermatologist panel. The HEAL metric is computed via the anticorrelation of this top-3 agreement with health outcome rankings.
We used a dataset of 5,420 teledermatology cases, enriched for diversity in age, sex and race/ethnicity, to retrospectively evaluate the models HEAL metric. The dataset consisted of store-and-forward cases from patients of 20 years or older from primary care providers in the USA and skin cancer clinics in Australia. Based on a review of the literature, we decided to explore race/ethnicity, sex and age as potential factors of inequity, and used sampling techniques to ensure that our evaluation dataset had sufficient representation of all race/ethnicity, sex and age groups. To quantify pre-existing health outcomes for each subgroup we relied on measurements from public databases endorsed by the World Health Organization, such as Years of Life Lost (YLLs) and Disability-Adjusted Life Years (DALYs; years of life lost plus years lived with disability).
However, while the model was likely to perform equitably across age groups for cancer conditions specifically, we discovered that it had room for improvement across age groups for non-cancer conditions. For example, those 70+ have the poorest health outcomes related to non-cancer skin conditions, yet the model didn't prioritize performance for this subgroup.
For holistic evaluation, the HEAL metric cannot be employed in isolation. Instead this metric should be contextualized alongside many other factors ranging from computational efficiency and data privacy to ethical values, and aspects that may influence the results (e.g., selection bias or differences in representativeness of the evaluation data across demographic groups).
As an adversarial example, the HEAL metric can be artificially improved by deliberately reducing model performance for the most advantaged subpopulation until performance for that subpopulation is worse than all others. For illustrative purposes, given subpopulations A and B where A has worse health outcomes than B, consider the choice between two models: Model 1 (M1) performs 5% better for subpopulation A than for subpopulation B. Model 2 (M2) performs 5% worse on subpopulation A than B. The HEAL metric would be higher for M1 because it prioritizes performance on a subpopulation with worse outcomes. However, M1 may have absolute performances of just 75% and 70% for subpopulations A and B respectively, while M2 has absolute performances of 75% and 80% for subpopulations A and B respectively. Choosing M1 over M2 would lead to worse overall performance for all subpopulations because some subpopulations are worse-off while no subpopulation is better-off.
Accordingly, the HEAL metric should be used alongside a Pareto condition (discussed further in the paper), which restricts model changes so that outcomes for each subpopulation are either unchanged or improved compared to the status quo, and performance does not worsen for any subpopulation.
The HEAL framework, in its current form, assesses the likelihood that an ML-based model prioritizes performance for subpopulations with respect to pre-existing health disparities for specific subpopulations. This differs from the goal of understanding whether ML will reduce disparities in outcomes across subpopulations in reality. Specifically, modeling improvements in outcomes requires a causal understanding of steps in the care journey that happen both before and after use of any given model. Future research is needed to address this gap.
The HEAL framework enables a quantitative assessment of the likelihood that health AI technologies prioritize performance with respect to health disparities. The case study demonstrates how to apply the framework in the dermatological domain, indicating a high likelihood that model performance is prioritized with respect to health disparities across sex and race/ethnicity, but also revealing the potential for improvements for non-cancer conditions across age. The case study also illustrates limitations in the ability to apply all recommended aspects of the framework (e.g., mapping societal context, availability of data), thus highlighting the complexity of health equity considerations of ML-based tools.
This work is a proposed approach to address a grand challenge for AI and health equity, and may provide a useful evaluation framework not only during model development, but during pre-implementation and real-world monitoring stages, e.g., in the form of health equity dashboards. We hold that the strength of the HEAL framework is in its future application to various AI tools and use cases and its refinement in the process. Finally, we acknowledge that a successful approach towards understanding the impact of AI technologies on health equity needs to be more than a set of metrics. It will require a set of goals agreed upon by a community that represents those who will be most impacted by a model.
The research described here is joint work across many teams at Google. We are grateful to all our co-authors: Terry Spitz, Malcolm Pyles, Heather Cole-Lewis, Ellery Wulczyn, Stephen R. Pfohl, Donald Martin, Jr., Ronnachai Jaroensri, Geoff Keeling, Yuan Liu, Stephanie Farquhar, Qinghan Xue, Jenna Lester, Can Hughes, Patricia Strachan, Fraser Tan, Peggy Bui, Craig H. Mermel, Lily H. Peng, Yossi Matias, Greg S. Corrado, Dale R. Webster, Sunny Virmani, Christopher Semturs, Yun Liu, and Po-Hsuan Cameron Chen. We also thank Lauren Winer, Sami Lachgar, Ting-An Lin, Aaron Loh, Morgan Du, Jenny Rizk, Renee Wong, Ashley Carrick, Preeti Singh, Annisah Um'rani, Jessica Schrouff, Alexander Brown, and Anna Iurchenko for their support of this project.
Go here to see the original:
HEAL: A framework for health equity assessment of machine learning performance - Google Research
- Machine-Learning App Helps Anesthesiologists Navigate Critical Surgical Equipment in Real Time - Carle Illinois College of Medicine - February 24th, 2026 [February 24th, 2026]
- Fractal Launches PiEvolve, an Evolutionary Agentic Engine for Autonomous Machine Learning and Scientific Discovery - Yahoo Finance - February 24th, 2026 [February 24th, 2026]
- How Brain Data and Machine Learning Could Transform the Aging Industry - gritdaily.com - February 24th, 2026 [February 24th, 2026]
- AI and machine learning trends for Arizona leaders to watch in healthcare delivery and traveler services - AZ Big Media - February 24th, 2026 [February 24th, 2026]
- AI and machine learning are the future of Wi-Fi management: WBA report - Telecompetitor - February 22nd, 2026 [February 22nd, 2026]
- Machine learning streamlines the complexities of making better proteins - Science News - February 20th, 2026 [February 20th, 2026]
- WBA Publishes Guidance on Artificial Intelligence and Machine Learning for Intelligent Wi-Fi - ARC Advisory Group - February 20th, 2026 [February 20th, 2026]
- Machine learning-predicted insulin resistance is a risk factor for 12 types of cancer - Nature - February 20th, 2026 [February 20th, 2026]
- Exploring Machine Learning at the DOF - University of the Philippines Diliman - February 20th, 2026 [February 20th, 2026]
- AI and Machine Learning - Where US agencies are finding measurable value from AI - Smart Cities World - February 20th, 2026 [February 20th, 2026]
- Modeling visual perception of Chinese classical private gardens with image parsing and interpretable machine learning - Nature - February 16th, 2026 [February 16th, 2026]
- Analysis of Market Segments and Major Growth Areas in the Machine Learning (ML) Feature Lineage Tools Market - openPR.com - February 16th, 2026 [February 16th, 2026]
- Apple Makes One Of Its Largest Ever Acquisitions, Buys The Israeli Machine Learning Firm, Q.ai - Wccftech - February 1st, 2026 [February 1st, 2026]
- Keysights Machine Learning Toolkit to Speed Device Modeling and PDK Dev - All About Circuits - February 1st, 2026 [February 1st, 2026]
- University of Missouri Study: AI/Machine Learning Improves Cardiac Risk Prediction Accuracy - Quantum Zeitgeist - February 1st, 2026 [February 1st, 2026]
- How AI and Machine Learning Are Transforming Mobile Banking Apps - vocal.media - February 1st, 2026 [February 1st, 2026]
- Machine Learning in Production? What This Really Means - Towards Data Science - January 28th, 2026 [January 28th, 2026]
- Best Machine Learning Stocks of 2026 and How to Invest in Them - The Motley Fool - January 28th, 2026 [January 28th, 2026]
- Machine learning-based prediction of mortality risk from air pollution-induced acute coronary syndrome in the Western Pacific region - Nature - January 28th, 2026 [January 28th, 2026]
- Machine Learning Predicts the Strength of Carbonated Recycled Concrete - AZoBuild - January 28th, 2026 [January 28th, 2026]
- Vertiv Next Predict is a new AI-powered, managed service that combines field expertise and advanced machine learning algorithms to anticipate issues... - January 28th, 2026 [January 28th, 2026]
- Machine Learning in Network Security: The 2026 Firewall Shift - openPR.com - January 28th, 2026 [January 28th, 2026]
- Why IBMs New Machine-Learning Model Is a Big Deal for Next-Generation Chips - TipRanks - January 24th, 2026 [January 24th, 2026]
- A no-compromise amplifier solution: Synergy teams up with Wampler and Friedman to launch its machine-learning power amp and promises to change the... - January 24th, 2026 [January 24th, 2026]
- Our amplifier learns your cabinets impedance through controlled sweeps and continues to monitor it in real-time: Synergys Power Amp Machine-Learning... - January 24th, 2026 [January 24th, 2026]
- Machine Learning Studied to Predict Response to Advanced Overactive Bladder Therapies - Sandip Vasavada - UroToday - January 24th, 2026 [January 24th, 2026]
- Blending Education, Machine Learning to Detect IV Fluid Contaminated CBCs, With Carly Maucione, MD - HCPLive - January 24th, 2026 [January 24th, 2026]
- Why its critical to move beyond overly aggregated machine-learning metrics - MIT News - January 24th, 2026 [January 24th, 2026]
- Machine Learning Lends a Helping Hand to Prosthetics - AIP Publishing LLC - January 24th, 2026 [January 24th, 2026]
- Hassan Taher Explains the Fundamentals of Machine Learning and Its Relationship to AI - mitechnews.com - January 24th, 2026 [January 24th, 2026]
- Keysight targets faster PDK development with machine learning toolkit - eeNews Europe - January 24th, 2026 [January 24th, 2026]
- Training and external validation of machine learning supervised prognostic models of upper tract urothelial cancer (UTUC) after nephroureterectomy -... - January 24th, 2026 [January 24th, 2026]
- Age matters: a narrative review and machine learning analysis on shared and separate multidimensional risk domains for early and late onset suicidal... - January 24th, 2026 [January 24th, 2026]
- Uncovering Hidden IV Fluid Contamination Through Machine Learning, With Carly Maucione, MD - HCPLive - January 24th, 2026 [January 24th, 2026]
- Machine learning identifies factors that may determine the age of onset of Huntington's disease - Medical Xpress - January 24th, 2026 [January 24th, 2026]
- AI and Machine Learning - WEF expands Fourth Industrial Revolution Network - Smart Cities World - January 24th, 2026 [January 24th, 2026]
- Machine-learning analysis reclassifies armed conflicts into three new archetypes - The Brighter Side of News - January 24th, 2026 [January 24th, 2026]
- Machine learning and AI the future of drought monitoring in Canada - sasktoday.ca - January 24th, 2026 [January 24th, 2026]
- Machine learning revolutionises the development of nanocomposite membranes for CO capture - European Coatings - January 24th, 2026 [January 24th, 2026]
- AI and Machine Learning - Leading data infrastructure is helping power better lives in Sunderland - Smart Cities World - January 24th, 2026 [January 24th, 2026]
- How banks are responsibly embedding machine learning and GenAI into AML surveillance - Compliance Week - January 20th, 2026 [January 20th, 2026]
- Enhancing Teaching and Learning of Vocational Skills through Machine Learning and Cognitive Training (MCT) - Amrita Vishwa Vidyapeetham - January 20th, 2026 [January 20th, 2026]
- New Research in Annals of Oncology Shows Machine Learning Revelation of Global Cancer Trend Drivers - Oncodaily - January 20th, 2026 [January 20th, 2026]
- Machine learning-assisted mapping of VT ablation targets: progress and potential - Hospital Healthcare Europe - January 20th, 2026 [January 20th, 2026]
- Machine Learning Achieves Runtime Optimisation for GEMM with Dynamic Thread Selection - Quantum Zeitgeist - January 20th, 2026 [January 20th, 2026]
- Machine learning algorithm predicts Bitcoin price on January 31, 2026 - Finbold - January 20th, 2026 [January 20th, 2026]
- AI and Machine Learning Transform Baldness Detection and Management - Bioengineer.org - January 20th, 2026 [January 20th, 2026]
- A longitudinal machine-learning approach to predicting nursing home closures in the U.S. - Nature - January 11th, 2026 [January 11th, 2026]
- Occams Razor in Machine Learning. The Power of Simplicity in a Complex World - DataDrivenInvestor - January 11th, 2026 [January 11th, 2026]
- Study Explores Use of Automated Machine Learning to Compare Frailty Indices in Predicting Spinal Surgery Outcomes - geneonline.com - January 11th, 2026 [January 11th, 2026]
- Hunting for "Oddballs" With Machine Learning: Detecting Anomalous Exoplanets Using a Deep-Learned Low-Dimensional Representation of Transit... - January 9th, 2026 [January 9th, 2026]
- A Machine Learning-Driven Electrophysiological Platform for Real-Time Tumor-Neural Interaction Analysis and Modulation - Nature - January 9th, 2026 [January 9th, 2026]
- Machine learning elucidates associations between oral microbiota and the decline of sweet taste perception during aging - Nature - January 9th, 2026 [January 9th, 2026]
- Prognostic model for pancreatic cancer based on machine learning of routine slides and transcriptomic tumor analysis - Nature - January 9th, 2026 [January 9th, 2026]
- Bidgely Redefines Energy AI in 2025: From Machine Learning to Agentic AI - galvnews.com - January 9th, 2026 [January 9th, 2026]
- Machine Learning in Pharmaceutical Industry Market Size Reach USD 26.2 Billion by 2031 - openPR.com - January 9th, 2026 [January 9th, 2026]
- Noise-resistant Qubit Control With Machine Learning Delivers Over 90% Fidelity - Quantum Zeitgeist - January 9th, 2026 [January 9th, 2026]
- Machine Learning Models Forecast Parshwanath Corporation Limited Uptick - Real-Time Stock Alerts & High Return Trading Ideas -... - January 9th, 2026 [January 9th, 2026]
- Machine Learning Models Forecast Imagicaaworld Entertainment Limited Uptick - Technical Resistance Breaks & Outstanding Capital Returns -... - January 2nd, 2026 [January 2nd, 2026]
- Cognitive visual strategies are associated with delivery accuracy in elite wheelchair curling: insights from eye-tracking and machine learning -... - January 2nd, 2026 [January 2nd, 2026]
- Machine Learning Models Forecast Covidh Technologies Limited Uptick - Earnings Forecast Updates & Small Investment Trading Plans -... - January 2nd, 2026 [January 2nd, 2026]
- Machine Learning Models Forecast Sri Adhikari Brothers Television Network Limited Uptick - Stock Split Announcements & Rapid Wealth Accumulation -... - January 2nd, 2026 [January 2nd, 2026]
- Army to ring in new year with new AI and machine learning career path for officers - Stars and Stripes - December 31st, 2025 [December 31st, 2025]
- Army launches AI and machine-learning career path for officers - Federal News Network - December 31st, 2025 [December 31st, 2025]
- AI and Machine Learning Transforming Business Operations, Strategy, and Growth AI - openPR.com - December 31st, 2025 [December 31st, 2025]
- New at Mouser: Infineon Technologies PSOC Edge Machine Learning MCUs for Robotics, Industrial, and Smart Home Applications - Business Wire - December 31st, 2025 [December 31st, 2025]
- Machine Learning Models Forecast The Federal Bank Limited Uptick - Double Top/Bottom Patterns & Affordable Growth Trading - bollywoodhelpline.com - December 31st, 2025 [December 31st, 2025]
- Machine Learning Models Forecast Future Consumer Limited Uptick - Stock Valuation Metrics & Free Stock Market Beginner Guides - earlytimes.in - December 31st, 2025 [December 31st, 2025]
- Machine learning identifies statin and phenothiazine combo for neuroblastoma treatment - Medical Xpress - December 29th, 2025 [December 29th, 2025]
- Machine Learning Framework Developed to Align Educational Curricula with Workforce Needs - geneonline.com - December 29th, 2025 [December 29th, 2025]
- Study Develops Multimodal Machine Learning System to Evaluate Physical Education Effectiveness - geneonline.com - December 29th, 2025 [December 29th, 2025]
- AI Indicators Detect Buy Opportunity in Everest Organics Limited - Healthcare Stock Analysis & Smarter Trades Backed by Machine Learning -... - December 29th, 2025 [December 29th, 2025]
- Automated Fractal Analysis of Right and Left Condyles on Digital Panoramic Images Among Patients With Temporomandibular Disorder (TMD) and Use of... - December 29th, 2025 [December 29th, 2025]
- Machine Learning Models Forecast Gayatri Highways Limited Uptick - Inflation Impact on Stocks & Fast Profit Trading Ideas - bollywoodhelpline.com - December 29th, 2025 [December 29th, 2025]
- Machine Learning Models Forecast Punjab Chemicals and Crop Protection Limited Uptick - Blue Chip Stock Analysis & Double Or Triple Investment -... - December 29th, 2025 [December 29th, 2025]
- Machine Learning Models Forecast Walchand PeopleFirst Limited Uptick - Risk Adjusted Returns & Investment Recommendations You Can Trust -... - December 27th, 2025 [December 27th, 2025]
- Machine learning helps robots see clearly in total darkness using infrared - Tech Xplore - December 27th, 2025 [December 27th, 2025]
- Momentum Traders Eye Manas Properties Limited for Quick Bounce - Market Sentiment Report & Smarter Trades Backed by Machine Learning -... - December 27th, 2025 [December 27th, 2025]
- Machine Learning Models Forecast Bigbloc Construction Limited Uptick - MACD Trading Signals & Minimal Risk High Reward - bollywoodhelpline.com - December 27th, 2025 [December 27th, 2025]
- Avoid These 10 Machine Learning Project Mistakes - Analytics Insight - December 27th, 2025 [December 27th, 2025]