Missing data

Introduction

Medical AI can sometimes work well even for patients who were barely represented in the data used to train it. But there is no guarantee. In healthcare, missing data is not random. It reflects who had access to doctors, who received diagnoses, which hospitals digitised records early, which populations were heavily studied, and which bodies medicine historically treated as the default. When those gaps enter AI systems, they can become clinical risks rather than mere statistical imperfections.

This matters because the optimistic case for an AI-enabled medical future depends on broad human benefit: earlier diagnosis, longer healthy lives, lower healthcare costs, and eventually more personalised medicine at population scale. If AI systems consistently work best for already well-studied groups, the result could be a world where medical progress accelerates while health inequality deepens. The central question is therefore not simply whether medical AI becomes powerful, but whether it becomes reliable across the full diversity of human patients.

Which patients are often missing from datasets?

Medical datasets rarely represent humanity evenly. They reflect the structure of existing healthcare systems, including who gets seen, tested, scanned, insured, and researched.

Several groups are repeatedly under-represented. [University of Birmingham]

People with darker skin tones in dermatology and imaging datasets
Patients from low-income countries or rural health systems
People with rare diseases [healthcare-in-europe.com]
Minority ethnic populations in genomic databases
Disabled patients and people with complex co-existing conditions
Older patients excluded from clinical trials
Women in areas historically studied mainly in men
People whose records are fragmented across different healthcare providers
Patients who speak minority languages or use interpreters
Individuals who avoid healthcare because of cost, discrimination, or distrust

The problem begins long before AI training. If a condition is historically underdiagnosed in a population, the dataset may wrongly label many people as healthy. If hospitals serving poorer communities have lower-quality imaging equipment or patchier records, the resulting data may contain systematic blind spots.

Researchers involved in the STANDING Together initiative, which focuses on transparency in medical AI datasets, argue that many systems still fail to report clearly who is represented in their training and evaluation data. Minority groups are especially likely to be under-represented, making uneven performance difficult to detect until systems are deployed in real clinical settings. [University of Birmingham]

This is one reason medical AI fairness is harder than simply “adding more data”. The missing groups are often missing because of deeper structural inequalities in medicine itself.

Why missing data becomes a clinical risk

AI systems do not merely absorb medical facts. They absorb patterns in the data they are shown. If those patterns are incomplete or distorted, the system may learn misleading shortcuts.

In medicine, those shortcuts can affect diagnosis, triage, and treatment decisions.

Skin tone and dermatology failures

Dermatology has become one of the clearest examples of dataset imbalance. Many image datasets used to train skin cancer and rash detection systems have historically contained disproportionately light skin tones.

Researchers evaluating dermatology AI on more diverse image sets found substantial performance drops when systems encountered darker skin tones or uncommon diseases. One widely discussed study reported major reductions in diagnostic accuracy compared with original benchmark results once models were tested on diverse clinical images rather than narrow curated datasets. [arXiv]

More recent reviews continue to find serious representation problems. Analyses of dermatology datasets report that darker skin tones remain heavily under-represented, even as the total size of image collections grows rapidly. [arXiv]

This matters because skin disease often appears differently across skin tones. A system trained mainly on pale skin may miss inflammation, infection, or malignancy in darker skin. The result is not abstract “bias” but delayed diagnosis.

The issue extends beyond AI. Medical education itself has historically under-represented darker skin in textbooks and training materials, meaning human clinicians and AI systems can inherit related blind spots simultaneously. [Verywell Health]

Pulse oximeters show how hidden bias enters AI systems

Pulse oximeters became a major warning sign during the COVID-19 pandemic. Researchers found that these devices were more likely to miss dangerously low blood oxygen levels in Black patients.

A highly cited study in The New England Journal of Medicine found substantially higher rates of occult hypoxaemia (hidden low oxygen saturation despite apparently normal readings) in Black patients compared with White patients. [New England Journal of Medicine]

The issue likely emerged partly because devices had been calibrated disproportionately on lighter skin tones. This illustrates an important point about medical AI: the bias may not begin inside the algorithm itself. It can enter through the measurement tools, datasets, labels, or clinical workflows feeding the system.

An AI model trained on flawed pulse oximeter readings may therefore inherit and amplify earlier errors.

Genomics and the European ancestry problem

Some of the most ambitious visions of AI-enabled medicine involve genomics: predicting disease risk, tailoring treatments, and eventually enabling highly personalised preventive medicine.

But genomic databases remain heavily skewed toward people of European ancestry.

Large reviews of polygenic risk scores, which estimate disease risk by combining many genetic variants, have repeatedly shown that predictive performance often falls sharply outside well-represented European populations. One influential review found that the overwhelming majority of studies focused on European ancestry groups, with very limited representation from African, Hispanic, or Indigenous populations. [Nature]

More recent analyses continue to warn that more than 70% of the data underlying many polygenic risk scores still comes from European ancestry populations. [ScienceDirect]

This creates a paradox inside the optimistic AI medicine narrative. The more healthcare becomes personalised through genetics, the more damaging unequal representation can become. A future of AI-driven precision medicine could therefore become highly accurate for some populations while remaining unreliable for others.

Can medical AI still generalise beyond its training data?

The answer is sometimes yes, but only under certain conditions.

Modern AI systems can learn patterns that transfer beyond the exact data they saw during training. In some cases, systems trained in one hospital or population still perform reasonably well elsewhere. Large foundation models trained on diverse multimodal medical data may become more adaptable than earlier narrow systems.

But medicine is unusually vulnerable to failures of generalisation because human populations are biologically, socially, and environmentally heterogeneous.

A system may appear highly accurate in internal testing while failing in real-world deployment because:

disease prevalence differs between regions
imaging hardware changes
healthcare workflows differ
populations have different genetic backgrounds
symptoms present differently across groups
training labels were inconsistent or biased
the system never encountered enough examples of rare conditions

This is why external validation matters so much. A model tested only inside the institution that built it can produce misleadingly optimistic results.
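
The first item in the list above, differing disease prevalence, can be made concrete with a little arithmetic. The sketch below is illustrative only: the sensitivity, specificity, and prevalence figures are hypothetical and are not taken from any of the cited studies. It applies Bayes' rule to show how a model whose sensitivity and specificity are unchanged can still produce many more false alarms per true case at a site where the disease is rarer.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Share of positive predictions that are true positives (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical figures: the same model evaluated at two sites.
SENSITIVITY = 0.90
SPECIFICITY = 0.90

# Assumed prevalence at the site that built the model (e.g. a referral clinic).
ppv_internal = positive_predictive_value(SENSITIVITY, SPECIFICITY, prevalence=0.20)
# Assumed prevalence at an external site (e.g. general screening).
ppv_external = positive_predictive_value(SENSITIVITY, SPECIFICITY, prevalence=0.02)

print(f"PPV at development site: {ppv_internal:.2f}")
print(f"PPV at external site:    {ppv_external:.2f}")
```

Under these assumed numbers, roughly 69% of positive flags are correct at the development site but only about 16% at the external one, even though the model itself has not changed. Internal testing alone would not reveal that difference.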

Some researchers argue that genuinely robust medical AI may require something closer to continuous global evaluation rather than one-off approval. Instead of assuming a model works universally, systems may need ongoing subgroup testing across ethnicity, age, sex, disability, geography, and disease severity.

That approach is slower and more expensive than simply scaling models quickly. But it may be essential if medical AI is to support broad human flourishing rather than selective optimisation.

What transparent datasets and subgroup testing can reveal

One of the strongest lessons from recent medical AI debates is that transparency changes what can be detected.

When developers publish detailed information about who appears in datasets, researchers can identify blind spots earlier. Without that transparency, uneven performance may remain hidden behind headline accuracy numbers.

Several practices are becoming increasingly important.

Reporting who is in the data

Researchers now increasingly argue that medical AI papers should disclose:

ethnic composition
sex and age distributions
geographic origin
imaging hardware differences
socioeconomic representation
missing or excluded populations
performance differences across subgroups

The STANDING Together recommendations were developed partly because many datasets historically failed to report this information consistently. [University of Birmingham]

A model claiming 95% accuracy means little if readers cannot tell which patients were tested.
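
As a rough illustration of what such reporting could look like in practice, the sketch below tallies one attribute across a set of records. Everything in it is hypothetical: the `composition_report` helper, the `skin_tone` field, the category labels, and the counts are invented for illustration and do not come from the STANDING Together recommendations or any real dataset. The one design choice worth noting is that groups with zero or missing entries are kept in the output rather than silently dropped.

```python
from collections import Counter


def composition_report(records, attribute, expected_groups):
    """Return {group: (count, share)}; expected groups with no records stay visible."""
    counts = Counter(r.get(attribute, "not recorded") for r in records)
    total = len(records)
    # List expected groups first, then any other values that actually occurred.
    extras = sorted(g for g in counts if g not in expected_groups)
    groups = list(expected_groups) + extras
    return {g: (counts[g], (counts[g] / total) if total else 0.0) for g in groups}


# Hypothetical dataset: 100 image records with an assumed "skin_tone" field.
records = (
    [{"skin_tone": "lighter"}] * 70
    + [{"skin_tone": "medium"}] * 25
    + [{"skin_tone": "darker"}] * 3
    + [{}] * 2  # records where the attribute was never captured
)

report = composition_report(records, "skin_tone", ["lighter", "medium", "darker"])
for group, (n, share) in report.items():
    print(f"{group:>12}: n={n:3d} ({share:.0%})")
```

A table like this does not make a dataset representative, but it lets a reader see at a glance that a headline figure rests on very few examples from some groups.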

Testing subgroup performance explicitly

Overall averages can hide dangerous failures.

If an AI system works extremely well for most patients but poorly for a smaller under-represented group, the average may still appear impressive. Subgroup testing attempts to expose those hidden disparities directly.
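
A small worked example shows how this happens. The counts below are made up for illustration and the group names are placeholders; the point is only the arithmetic, in which a large well-represented group dominates the average.

```python
def accuracy(pairs):
    """Fraction of (prediction, truth) pairs that agree."""
    return sum(pred == truth for pred, truth in pairs) / len(pairs)


# Hypothetical evaluation results as (group, prediction, truth) rows.
results = (
    [("well-represented", 1, 1)] * 920     # correct predictions
    + [("well-represented", 0, 1)] * 30    # missed cases
    + [("under-represented", 1, 1)] * 30   # correct predictions
    + [("under-represented", 0, 1)] * 20   # missed cases
)

overall = accuracy([(pred, truth) for _, pred, truth in results])

# Split the same rows by group and score each group separately.
by_group = {}
for group, pred, truth in results:
    by_group.setdefault(group, []).append((pred, truth))
subgroup = {group: accuracy(pairs) for group, pairs in by_group.items()}

print(f"overall: {overall:.1%}")
for group, value in subgroup.items():
    print(f"{group}: {value:.1%} (n={len(by_group[group])})")
```

With these assumed counts the headline figure is 95%, while the smaller group sees 60%. Nothing in the overall number hints at the gap.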

This is becoming increasingly important in dermatology, radiology, genomics, and clinical risk prediction.

Some newer research also explores techniques such as transfer learning and fairness-oriented model training to improve performance in under-represented groups. [Nature]

But technical fixes alone are unlikely to solve the underlying issue if the healthcare system continues generating unequal data in the first place.

Monitoring systems after deployment

A model approved in one context may drift over time.

Hospitals change equipment. Disease patterns evolve. Populations shift. New medications alter outcomes. AI systems deployed across countries may encounter entirely different clinical realities.

For that reason, some experts increasingly argue that medical AI should be treated less like a static product and more like ongoing infrastructure requiring surveillance and recalibration.

The core challenge is not merely building a model once. It is maintaining reliability across changing populations over years or decades.
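
One very simple form such surveillance could take is a periodic comparison of each group's recent performance against the level recorded at approval. The sketch below is an assumption-laden illustration, not a description of any regulator's or vendor's process: the `flag_degraded_groups` helper, the 0.05 tolerance, the site names, and the metric values are all hypothetical.

```python
def flag_degraded_groups(baseline, recent, tolerance=0.05):
    """Flag groups whose recent metric fell more than `tolerance` below
    baseline, and groups for which no recent data was collected."""
    flagged = {}
    for group, base_value in baseline.items():
        current = recent.get(group)
        if current is None:
            # Silence is itself a warning: the group is no longer being measured.
            flagged[group] = "no recent data"
        elif base_value - current > tolerance:
            flagged[group] = f"dropped from {base_value:.2f} to {current:.2f}"
    return flagged


# Hypothetical sensitivity per site at approval and in the latest review period.
baseline = {"site A": 0.91, "site B": 0.90, "site C": 0.89}
recent = {"site A": 0.90, "site B": 0.81}

alerts = flag_degraded_groups(baseline, recent)
for group, reason in alerts.items():
    print(f"{group}: {reason}")
```

Here site B is flagged for a drop and site C because it has stopped reporting at all. A real monitoring scheme would also need to account for sample size and statistical noise, which this sketch ignores.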

Why this matters for the larger AI bloom vision

The long-term optimistic vision around AI often imagines medicine becoming radically more capable: accelerated drug discovery, earlier diagnosis, personalised prevention, and eventually much longer healthy lives.

But these futures depend on trust and broad applicability.

If major populations repeatedly experience worse outcomes from AI-assisted healthcare, public confidence may weaken. Regulators may become more restrictive. Health systems may hesitate to deploy beneficial technologies. Most importantly, the gains from medical acceleration may concentrate among already advantaged populations.

The issue is therefore not separate from the broader “AI bloom” debate. It sits near the centre of it.

A flourishing future would require medical intelligence that works across humanity’s diversity rather than only across its best-documented subsets. That means investment not only in larger models, but in broader participation in medicine itself: more representative research, better global data infrastructure, stronger public health systems, multilingual healthcare access, and more equitable clinical testing.

In that sense, dataset inclusion is not just a technical detail. It is part of deciding who the future of medicine is being built for.

Endnotes

Source: arxiv.org
Title: Disparities in Dermatology AI: Assessments Using Diverse Clinical Images
Link: https://arxiv.org/abs/2111.08006
Source snippet
arXivDisparities in Dermatology AI: Assessments Using Diverse Clinical ImagesNovember 15, 2021...

Published: November 15, 2021
Source: arxiv.org
Link: https://arxiv.org/abs/2203.08807
Source: arxiv.org
Title: A Global Atlas of Digital Dermatology to Map Innovation and Disparities
Link: https://arxiv.org/abs/2601.00840
Source snippet
arXivA Global Atlas of Digital Dermatology to Map Innovation and DisparitiesDecember 27, 2025...

Published: December 27, 2025
Source: arxiv.org
Link: https://arxiv.org/pdf/2602.14356
Source snippet
A Generative AI Approach for Reducing Skin Tone Bias in...by AM Shabu · 2026 — Current research reveals four critical gaps in dermatolog...
Source: nature.com
Link: https://www.nature.com/articles/s41467-019-11112-0
Source snippet
NatureAnalysis of polygenic risk score usage and performance in...by L Duncan · 2019 · Cited by 1232 — We analyze the first decade of po...
Source: sciencedirect.com
Link: https://www.sciencedirect.com/science/article/pii/S221385872500405X
Source snippet
ScienceDirectMulti-ancestry polygenic risk scores for the prediction of...by A Huerta-Chagoya · 2026 · Cited by 6 — However, more than 7...
Source: nature.com
Link: https://www.nature.com/articles/s41431-025-01931-9
Source snippet
Clinical implementation of polygenic risk scoresby E Roberts · 2025 · Cited by 3 — Diverse representation in GWAS is vital to identify di...
Source: nature.com
Link: https://www.nature.com/articles/s41467-026-68696-7
Source snippet
However, most existing PGSs were derived...
Source: arxiv.org
Title: Lesion TABE: Equitable AI for Skin Lesion Detection
Link: https://arxiv.org/abs/2601.03090
Source: nature.com
Link: https://www.nature.com/articles/s41598-025-18852-8
Source snippet
The overrepresentation of certain...Read more...
Source: nature.com
Link: https://www.nature.com/articles/s41598-025-02903-1
Source snippet
(2024), increases the representation of non-European ancestries for PRS training, testing and...Read more...
Source: nature.com
Link: https://www.nature.com/articles/s42003-023-05352-6
Source snippet
Improving genetic risk prediction across diverse population...by PK Gyawali · 2023 · Cited by 25 — We propose a deep-learning framework...
Source: nature.com
Link: https://www.nature.com/articles/s41431-023-01517-3
Source snippet
Validity of European-centric cardiometabolic polygenic...by CC Topriceanu · 2024 · Cited by 12 — PGSs derived mostly from European popul...
Source: nature.com
Title: Lancet Digit. Health 7
Link: https://www.nature.com/articles/s41746-025-01667-2
Source snippet
A scoping review and evidence gap analysis of clinical AI...by M Liu · 2025 · Cited by 37 — Tackling algorithmic bias and promoting tran...
Source: sciencedirect.com
Title: The Lancet
Link: https://www.sciencedirect.com/journal/the-lancet
Source snippet
The Lancet | Journal | ScienceDirect.com by ElsevierThe Lancet is an independent, international general medical journal founded in 1823 b...
Source: sciencedirect.com
Link: https://www.sciencedirect.com/science/article/pii/S2589-7500%2823%2900130-9
Source snippet
gnosis of suspicious pigmented skin cancer in patients presenting to a...Read more...
Source: sciencedirect.com
Link: https://www.sciencedirect.com/science/article/pii/S0022202X23029640
Source snippet
Artificial Intelligence in Skin Cancer Diagnosis: A Reality...by G Brancaccio · 2024 · Cited by 149 — They found that AI was superior to...
Source: sciencedirect.com
Link: https://www.sciencedirect.com/science/article/pii/S2589004225026367
Source snippet
Polygenic risk scores: Navigating the future of precision...by NHK Han · 2025 · Cited by 1 — Although individuals of European descent re...
Source: birmingham.ac.uk
Link: https://www.birmingham.ac.uk/news/2024/new-recommendations-to-increase-transparency-and-tackle-potential-bias-in-medical-ai-technologies
Source snippet
University of BirminghamNew Recommendations to Increase Transparency and...Dec 18, 2024 — People who are in minority groups are particul...
Source: verywellhealth.com
Title: Verywell Health Dark Skin Is Underrepresented In Medicine
Link: https://www.verywellhealth.com/skin-of-color-medical-textbooks-5072690
Source snippet
Here's How a Student Is Changing ThatJuly 29, 2020 — Malone Mukwende, a second-year medical student at St. George's University in London...

Published: July 29, 2020
Source: nejm.org
Link: https://www.nejm.org/doi/full/10.1056/NEJMc2029240
Source snippet
New England Journal of MedicineRacial Bias in Pulse Oximetry Measurementby MW Sjoding · 2020 · Cited by 1217 — An arterial oxygen saturat...
Source: nejm.org
Link: https://www.nejm.org/
Source snippet
that publishes new medical research and review articles, and editorial...

Additional References

Source: researchgate.net
Link: https://www.researchgate.net/publication/400855608_A_Generative_AI_Approach_for_Reducing_Skin_Tone_Bias_in_Skin_Cancer_Classification
Source snippet
A Generative AI Approach for Reducing Skin Tone Bias in...16 Feb 2026 — However, current AI diagnostic tools are often trained on datase...
Source: healthcare-in-europe.com
Link: https://healthcare-in-europe.com/en/news/ai-in-skin-cancer-detection-darker-skin-inferior-results.html
Source snippet
AI in skin cancer detection: darker skin, inferior results?Research has shown that programs trained on images taken from people with ligh...
Source: hdruk.ac.uk
Link: https://www.hdruk.ac.uk/projects/uk-canadian-ai-initiative-to-create-equitable-multi-ethnic-polygenic-risk-scores-that-improve-clinical-care/
Source snippet
UK-Canadian AI initiative to create equitable multi-ethnic...This bias means that polygenic risk scores – which aim to quantify the cumu...
Source: epicresearch.org
Link: https://epicresearch.org/articles/black-patients-32-more-likely-than-white-patients-to-experience-occult-hypoxemia-which-may-result-in-delayed-care
Source snippet
Black Patients 32% More Likely Than White...20 Mar 2024 — Non-Hispanic Black patients are 32% more likely to have occult hypoxemia, whic...
Source: probiologists.com
Link: https://www.probiologists.com/article/racial-underrepresentation-in-dermatological-datasets-leads-to-biased-machine-learning-models-and-inequitable-healthcare
Source snippet
Racial underrepresentation in dermatological datasets...by G Kleinberg · 2022 · Cited by 62 — This review explores the extent, causes, p...
Source: clinician.nejm.org
Title: pulse oximetry less accurate patients darker skin pigmentation nejm jw.NA55581
Link: https://clinician.nejm.org/pulse-oximetry-less-accurate-patients-darker-skin-pigmentation-nejm-jw.NA55581
Source snippet
Oximetry Is Less Accurate in Patients with Darker Skin...After adjusting for potential confounders, self-identified Black patients were...
Source: news.vumc.org
Title: skin tone may affect accuracy of blood oxygen measurement in children study
Link: https://news.vumc.org/2025/03/04/skin-tone-may-affect-accuracy-of-blood-oxygen-measurement-in-children-study/
Source snippet
tone may affect accuracy of blood oxygen measurement...4 Mar 2025 — Pulse oximetry in pediatric patients with darker skin tones may over...
Source: clinician.nejm.org
Title: pulse oximeters less accurate consistent black inpatients nejm jw.NA55129
Link: https://clinician.nejm.org/pulse-oximeters-less-accurate-consistent-black-inpatients-nejm-jw.NA55129
Source snippet
Oximeters Are Less Accurate and Consistent for Black...In critical care units, pulse oximeters are less likely to detect hypoxemia in Bl...
Source: youtube.com
Link: https://www.youtube.com/watch?v=gLdJxRSMXeA
Source snippet
AI Bias in Medical Images: Ensuring Skin Tone Diversity in...There is a notable bias in AI generated medical images predominantly underr...
Source: theguardian.com
Title: ai skin cancer diagnoses risk being less accurate for dark skin study
Link: https://www.theguardian.com/society/2021/nov/09/ai-skin-cancer-diagnoses-risk-being-less-accurate-for-dark-skin-study
Source snippet
AI skin cancer diagnoses risk being less accurate for dark...Nov 10, 2021 — AI systems being developed to diagnose skin cancer run the r...