Live monitoring

Introduction

A medical AI system can perform impressively in a controlled study and still widen health gaps once it reaches real hospitals. Patients change, diseases evolve, scanners are upgraded, clinical workflows shift, and the patients a system serves may not resemble the population it was originally tested on. The central fairness problem in medical AI is therefore not only how models are trained, but also how they are monitored after launch.

This matters because unequal failure is often hard to see. A triage system may still appear accurate overall while quietly becoming worse for poorer neighbourhoods, minority ethnic groups, older patients, or hospitals with older equipment. If health systems rely on AI at scale without watching for these shifts, automation can reinforce existing inequalities while appearing scientifically neutral. Real-world monitoring is therefore not a bureaucratic extra. It is one of the main mechanisms that determines whether medical AI becomes a broad public-health gain or another layer of uneven healthcare infrastructure. [PMC] [The Lancet]

Why benchmark scores are not enough

Most medical AI systems are approved or adopted based on benchmark testing: how well they classify scans, predict risk, or assist diagnosis on a fixed dataset. But healthcare is not fixed.

A chest X-ray model trained during one phase of respiratory disease may encounter very different patient populations a few years later. A sepsis prediction system trained in a large urban hospital may behave differently in a rural clinic. A model tested mostly on newer imaging devices may struggle when deployed in underfunded hospitals using older hardware.

Researchers increasingly describe this as “drift”: the gradual divergence between the conditions a model learned from and the conditions it actually encounters in practice. [PMC] [JAMA Network]

Several kinds of drift matter for health equity: [spin.atomicobject.com]

Population drift: the patient mix changes over time or differs between hospitals.
Clinical drift: treatment practices, coding systems, or referral patterns evolve.
Data drift: scanners, sensors, or electronic record systems change the structure of the inputs.
Behavioural drift: clinicians alter their own decisions because they are interacting with AI recommendations.

The danger is that average performance can remain stable while subgroup performance deteriorates. A model that falls from 95% accuracy to 92% overall may hide a much larger drop for specific populations.
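
To make the arithmetic concrete, the sketch below works backwards from an overall figure to the subgroup figure it could be hiding. The scenario is an assumption chosen for illustration: a subgroup that makes up 10% of patients, with accuracy for everyone else holding at 95%. The function name is hypothetical.

```python
def implied_subgroup_accuracy(overall, majority_accuracy, subgroup_share):
    """Solve overall = (1 - s) * majority + s * subgroup for the subgroup term."""
    majority_share = 1.0 - subgroup_share
    return (overall - majority_share * majority_accuracy) / subgroup_share


# Hypothetical scenario: the subgroup is 10% of patients, and accuracy for
# everyone else stays at 95% while the overall figure slips to 92%.
subgroup_acc = implied_subgroup_accuracy(
    overall=0.92, majority_accuracy=0.95, subgroup_share=0.10
)
print(f"Implied subgroup accuracy: {subgroup_acc:.0%}")
```

Under these assumptions, a three-point fall in the headline number corresponds to subgroup accuracy of about 65%, a thirty-point fall for the affected group.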

This is already a recognised regulatory concern. The US Food and Drug Administration has repeatedly warned that AI-enabled medical devices may change or degrade over time due to shifting patient populations and changing data environments. Its recent guidance and consultation documents emphasise ongoing real-world performance monitoring across the full lifecycle of AI systems. [U.S. Food and Drug Administration]

The broader AI bloom argument often assumes that advanced AI could help extend healthy human life on a massive scale. But that optimistic future depends less on one-off benchmark victories than on whether systems remain reliable across decades, institutions, and populations. Monitoring is what turns laboratory performance into durable public infrastructure.

How drift can hide unequal performance

One reason unequal performance is difficult to detect is that healthcare systems often collect outcome data slowly and unevenly.

A screening AI may identify cancer risk today, but confirmation may only arrive months later through biopsy results. By the time hospitals realise accuracy has fallen for one group, thousands of patients may already have been assessed.

This creates a dangerous lag between deployment and accountability.

Researchers studying healthcare AI increasingly argue that hospitals need continuous surveillance systems similar to those used for infectious disease outbreaks or drug safety monitoring. [PMC]

The problem with aggregate averages

Suppose an AI system for detecting heart disease works extremely well in affluent hospitals with complete electronic records, but less well in clinics where patient histories are fragmented. The national average may still look excellent.

This masking effect is especially dangerous because disadvantaged groups are often statistically smaller inside healthcare datasets. If hospitals only monitor “overall accuracy”, unequal failures can remain invisible.

The same issue appears in medical imaging. Researchers working on drift detection for radiology AI have shown that changes in scanner metadata, image quality, and clinical context can alter model behaviour without obvious warning signs. [arXiv]

In practice, this means a hospital cannot safely assume that because an AI tool once passed validation, it remains fair across all patient groups years later.

Real-world healthcare keeps changing

COVID-19 demonstrated how quickly medical conditions can shift. Hospitals altered triage rules, patient populations changed, and disease prevalence moved rapidly across regions. Models trained before the pandemic often performed unpredictably afterwards.

But slower changes matter too:

ageing populations
changing obesity rates
migration patterns
new treatment protocols
updated coding systems
different referral behaviour
new diagnostic equipment

Each change can subtly alter the meaning of the data entering the model.

Health inequalities make this even more complex. Wealthier hospitals often upgrade equipment first, maintain cleaner data pipelines, and employ more specialised staff. Poorer systems may experience more missing data, inconsistent records, or infrastructure interruptions. If monitoring systems only exist in elite hospitals, then the populations most vulnerable to AI failure may remain the least visible. [The Lancet]

What fair post-deployment oversight looks like

Real-world oversight is not a single technology. It is a set of organisational practices designed to detect when AI systems stop serving people equally well.

The strongest proposals increasingly treat medical AI less like static software and more like an ongoing clinical intervention that requires surveillance throughout its life.

Monitoring by subgroup, not just overall

The most basic requirement is subgroup auditing.

Hospitals and regulators can track model performance separately across variables such as:

age
sex
ethnicity
disability status
geography
language background
hospital type
socioeconomic deprivation

Without this breakdown, systems can appear safe while harming specific populations.
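
A minimal sketch of such a breakdown follows, assuming each record carries a confirmed outcome, the model's prediction, and a grouping variable. The field names (`outcome`, `prediction`, `hospital_type`) and the data are made up for illustration; a real audit would use whatever the local record system provides. Sensitivity here means the share of confirmed-positive patients the model flagged.

```python
from collections import defaultdict


def sensitivity_by_group(records, group_key):
    """Return {group: (sensitivity, n_positive_cases)} over confirmed-positive cases."""
    hits = defaultdict(int)
    positives = defaultdict(int)
    for r in records:
        if r["outcome"] == 1:  # only confirmed-positive patients count
            positives[r[group_key]] += 1
            hits[r[group_key]] += int(r["prediction"] == 1)
    return {g: (hits[g] / positives[g], positives[g]) for g in positives}


# Made-up records for illustration only (all are confirmed-positive cases).
records = (
    [{"hospital_type": "urban", "outcome": 1, "prediction": 1}] * 90
    + [{"hospital_type": "urban", "outcome": 1, "prediction": 0}] * 10
    + [{"hospital_type": "rural", "outcome": 1, "prediction": 1}] * 6
    + [{"hospital_type": "rural", "outcome": 1, "prediction": 0}] * 4
)

overall = sum(r["prediction"] for r in records) / len(records)
by_group = sensitivity_by_group(records, "hospital_type")

print(f"overall sensitivity: {overall:.0%}")
for group, (sens, n) in sorted(by_group.items()):
    print(f"{group}: {sens:.0%} on {n} positive cases")
```

In this invented data the overall figure is about 87%, while the rural figure is 60% on only 10 cases. Reporting the case count next to each rate is a deliberate choice: it shows when a subgroup estimate rests on too few patients to be trusted on its own.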

The Lancet Digital Health and related fairness frameworks increasingly argue that transparency about who is represented in datasets and evaluations should continue after deployment, not stop at publication time. [The Lancet] [EurekAlert!]

Continuous drift detection

Modern monitoring systems increasingly attempt to identify changes before harm becomes obvious.

Researchers have proposed systems that track shifts in incoming data distributions, unusual prediction patterns, and changing relationships between variables in real time. Some methods can flag instability even before labelled clinical outcomes become available. [JAMA Network]

This matters because waiting for obvious failure may disproportionately harm populations already underserved by healthcare systems.

For example, if a diagnostic AI begins underperforming in a remote clinic due to different imaging conditions, automated drift detection may identify the problem weeks or months earlier than traditional retrospective audits.
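
One simple label-free check is to compare the distribution of an input feature at deployment against a reference sample from validation. The sketch below uses the Population Stability Index (PSI), a generic technique swapped in here for illustration and not the method of the studies cited above. The feature values are synthetic, and the 0.25 alert level is a common rule of thumb rather than a clinical standard.

```python
import math


def population_stability_index(reference, current, n_bins=10):
    """PSI between two samples of one numeric feature, using equal-width
    bins that span the reference sample's range."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 if all values are equal

    def bin_shares(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), n_bins - 1)] += 1  # clamp out-of-range values
        # A small floor avoids log(0) for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    ref, cur = bin_shares(reference), bin_shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


# Synthetic feature values, e.g. a normalised image-brightness score.
reference = [i / 100 for i in range(100)]      # spread evenly over 0.00 to 0.99
shifted = [0.5 + i / 200 for i in range(100)]  # concentrated in the upper half

psi_same = population_stability_index(reference, reference)
psi_shifted = population_stability_index(reference, shifted)
print(f"PSI, unchanged data: {psi_same:.3f}")
print(f"PSI, shifted data:   {psi_shifted:.3f}")
```

Because the check needs no outcome labels, it can run on every batch of incoming data, and running it separately per site or per subgroup is what lets it surface a problem confined to one clinic.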

Human escalation pathways

Monitoring only matters if somebody is responsible for acting on the results.

Researchers at Stanford and elsewhere increasingly emphasise governance structures where hospitals define in advance:

what metrics trigger concern
who reviews failures
when models are paused [dannoyes.com]
how retraining occurs
how clinicians are informed
when regulators are notified

Without predefined escalation procedures, monitoring dashboards can become passive reporting systems that detect problems but do not prevent harm. [arXiv]
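
One way to make escalation concrete is to write the rules down as data before deployment, so that a breached threshold maps to a named action and a named owner. The sketch below is purely illustrative: the metric name, thresholds, actions, and roles are all hypothetical placeholders, not recommendations from the sources cited here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EscalationRule:
    metric: str       # name of the monitored metric
    threshold: float  # rule fires when the metric falls below this value
    action: str       # what happens next
    owner: str        # who is responsible for acting


# Hypothetical policy; every number and role name is a placeholder.
POLICY = [
    EscalationRule("subgroup_sensitivity", 0.85,
                   "clinical safety review", "AI oversight committee"),
    EscalationRule("subgroup_sensitivity", 0.75,
                   "pause model and notify clinicians", "chief medical officer"),
]


def triggered_actions(metric, value, policy=POLICY):
    """Return (action, owner) pairs for every rule the observed value breaches."""
    return [(r.action, r.owner)
            for r in policy
            if r.metric == metric and value < r.threshold]


for action, owner in triggered_actions("subgroup_sensitivity", 0.70):
    print(f"{action} -> {owner}")
```

Keeping the policy as explicit, versioned data rather than as informal practice means it can be reviewed by clinicians and auditors before any alert ever fires.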

Independent audits and external scrutiny

Hospitals and vendors may have incentives to emphasise successful metrics while overlooking edge-case failures. Independent auditing therefore matters for public trust.

The idea of a “medical algorithmic audit” has gained traction partly because external reviewers may be better placed to identify hidden inequities or inappropriate deployment contexts. [The Lancet]

This becomes increasingly important if future AI systems influence insurance decisions, specialist referrals, or access to expensive therapies. In those cases, monitoring is not merely technical quality control. It becomes part of the governance structure determining who receives care.

Why poorer health systems face the hardest monitoring challenge

Ironically, the healthcare systems most likely to benefit from scalable AI assistance may also be least equipped to monitor it properly.

Continuous oversight requires:

reliable digital infrastructure
high-quality data collection [bmjdigitalhealth.bmj.com]
technical staff
statistical expertise
secure reporting pipelines
long-term maintenance budgets

Large academic hospitals may build sophisticated AI surveillance programmes. Smaller hospitals or lower-income countries may struggle to maintain even basic electronic record systems.

This creates a second-order inequality risk: wealthy institutions not only receive advanced AI first, but also gain the tools needed to detect failures early.

If the optimistic vision of AI-assisted medicine is to support broad human flourishing rather than deepen global disparities, monitoring systems themselves may need to become cheaper, more automated, and more widely shared.

Some researchers therefore argue for collaborative monitoring networks where hospitals pool anonymised performance information across regions and demographic groups. Shared reporting could make it easier to detect subgroup failures that would remain invisible inside a single institution. [LinkedIn]
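
The statistical reason pooling helps can be sketched with a confidence interval. Assume, hypothetically, that one hospital has 20 confirmed cases from a given subgroup, of which the model caught 12, and that ten similar hospitals pooling their counts reach 120 of 200. The example uses the standard Wilson score interval; the 0.75 benchmark is an invented target.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (about 95% when z = 1.96)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


BENCHMARK = 0.75  # hypothetical target sensitivity for the subgroup

single_site = wilson_interval(12, 20)    # one hospital's subgroup cases
pooled = wilson_interval(120, 200)       # ten hospitals pooling counts

for label, (low, high) in [("single site", single_site), ("pooled", pooled)]:
    verdict = "shortfall detectable" if high < BENCHMARK else "inconclusive"
    print(f"{label}: {low:.2f} to {high:.2f} ({verdict})")
```

With the same observed rate of 60%, the single-site interval is too wide to rule out the benchmark, while the pooled interval falls clearly below it. Only aggregate counts need to be shared for this calculation, not patient-level records.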

The regulatory shift from approval to lifecycle oversight

Traditional medical devices are usually expected to remain stable after approval. AI systems are different because their environments constantly change and some systems may continue adapting after deployment.

Regulators are gradually responding by moving toward “total product lifecycle” oversight rather than one-time approval. [U.S. Food and Drug Administration]

This shift matters because static approval models are poorly suited to dynamic systems.

Recent FDA consultations and guidance increasingly focus on:

post-market surveillance [dannoyes.com]
predefined update procedures
ongoing validation
real-world performance evidence [fda.gov]
bias monitoring
reassessment triggers

The deeper implication is important. Medical AI cannot realistically be governed as if fairness were solved at launch. Equity becomes an ongoing operational responsibility.

Yet the monitoring gap remains substantial. One recent review noted that only a small minority of FDA-registered healthcare AI tools included detailed post-deployment surveillance plans. [arXiv]

That gap between deployment speed and monitoring capacity may become one of the defining governance problems of large-scale medical AI.

Monitoring as a condition for AI-enabled health abundance

The optimistic case for advanced AI in medicine is enormous: earlier disease detection, faster drug discovery, personalised treatment, reduced diagnostic shortages, and potentially dramatic gains in healthy lifespan.

But these gains only support broad human flourishing if they remain reliable across real populations and over long periods of time.

Monitoring is therefore not a peripheral safety feature. It is part of the social infrastructure that determines whether AI-assisted medicine scales fairly.

A future in which AI dramatically improves health for wealthy populations while poorer groups receive degraded, weakly supervised systems would not represent the kind of “AI bloom” envisioned by technological optimists. It would be a more automated version of existing inequality.

The stronger version of the optimistic future depends on something more demanding: systems that improve while remaining accountable, transparent, and continuously tested against the diversity of human lives they are supposed to serve. [Ovid] [PMC] [The Lancet]

Endnotes

Source: pmc.ncbi.nlm.nih.gov
Title: Keeping Medical AI Healthy and Trustworthy: A Review
Link: https://pmc.ncbi.nlm.nih.gov/articles/PMC13050583/
Source snippet
PMCby H Guan · 2025 · Cited by 25 — This work aims to guide the development of reliable, robust medical AI systems capable of sustaining...
Source: ovid.com
Link: https://www.ovid.com/journals/bmjd/fulltext/10.1136/bmj.q2194~artificial-intelligence-and-global-health-equity
Source snippet
OvidArtificial intelligence and global health equity: BMJby RG Dychiao · 2024 · Cited by 18 — Continuous post-deployment monitoring and...
Source: fda.gov
Link: https://www.fda.gov/medical-devices/digital-health-center-excellence/request-public-comment-measuring-and-evaluating-artificial-intelligence-enabled-medical-device
Source snippet
Food and Drug AdministrationEvaluating AI-enabled Medical Device Performance in...30 Sept 2025 — FDA is seeking information on best prac...
Source: fda.gov
Title: artificial intelligence software medical device
Link: https://www.fda.gov/medical-devices/software-medical-device-samd/artificial-intelligence-software-medical-device
Source snippet
Food and Drug AdministrationArtificial Intelligence in Software as a Medical DeviceMar 25, 2025 — AI/ML technologies have the potential t...
Source: fda.gov
Link: https://www.fda.gov/regulatory-information/search-fda-guidance-documents/artificial-intelligence-enabled-device-software-functions-lifecycle-management-and-marketing
Source snippet
Food and Drug AdministrationArtificial Intelligence-Enabled Device Software FunctionsJan 7, 2025 — This draft guidance document provides...
Source: arxiv.org
Title: Monitoring Deployed AI Systems in Health Care
Link: https://arxiv.org/abs/2512.09048
Source: arxiv.org
Link: https://arxiv.org/abs/2506.05701
Source: arxiv.org
Link: https://arxiv.org/abs/2202.02833
Source snippet
arXivCheXstray: Real-time Multi-Modal Data Concordance for Drift Detection in Medical Imaging AIFebruary 6, 2022...
Published: February 6, 2022
Source: eurekalert.org
Title: news releases
Link: https://www.eurekalert.org/news-releases/1068414
Source snippet
New recommendations to increase transparency and...18 Dec 2024 — A new set of recommendations published in The Lancet Digital Health and...
Source: arxiv.org
Link: https://arxiv.org/abs/2506.17442
Source: linkedin.com
Link: https://www.linkedin.com/posts/the-lancet_experts-have-developed-recommendations-activity-7282804003283861506-yT1d
Source snippet
LinkedInThe Lancet Group's PostAddressing algorithmic bias in healthcare AI is essential to ensure that these technologies are inclusive...
Source: fda.gov
Link: https://www.fda.gov/media/151482/download
Source snippet
Artificial Intelligence (AI) and Machine Learning (ML)22 Oct 2020 — Adaptive AI/ML technologies, which have the potential to adapt and op...
Source: arxiv.org
Link: https://arxiv.org/html/2506.05701v1
Source snippet
Statistically Valid Post-Deployment Monitoring Should Be...6 Jun 2025 — This position paper argues that post-deployment monitoring in cl...
Source: linkedin.com
Link: https://www.linkedin.com/pulse/fda-requests-public-comment-measuring-evaluating-device-colangelo-mzzkf
Source snippet
ies, and approaches for measuring and evaluating real world...Read more...
Source: thelancet.com
Link: https://www.thelancet.com/journals/landig/article/PIIS2589-7500%2824%2900224-3/fulltext
Source snippet
The LancetTackling algorithmic bias and promoting transparency in...by JE Alderman · 2025 · Cited by 137 — Biases in AI health technolog...
Source: jamanetwork.com
Link: https://jamanetwork.com/journals/jamanetworkopen/fullarticle/2834882
Source snippet
JAMA NetworkDetecting and Remediating Harmful Data Shifts for...by V Subasri · 2025 · Cited by 45 — A proactive, label-agnostic monitori...
Source: thelancet.com
Link: https://www.thelancet.com/journals/landig/article/PIIS2589-7500%2825%2900139-6/fulltext?rss=yes
Source snippet
The Lancettackling bias, inequity, and implementation challengesby ML Welch — Biases in health-care AI-based solutions arise from social...
Source: thelancet.com
Link: https://www.thelancet.com/journals/landig/article/PIIS2589-7500%2822%2900003-6/fulltext
Source snippet
The LancetThe medical algorithmic auditby X Liu · 2022 · Cited by 270 — The medical algorithmic audit is a tool that can be used to bette...
Source: thelancet.com
Link: https://www.thelancet.com/journals/landig/article/PIIS2589-7500%2824%2900155-9/fulltext
Source snippet
Mitigating the risk of artificial intelligence bias in...by A Mihan · 2024 · Cited by 51 — We propose strategies that can be applied dur...
Source: thelancet.com
Link: https://www.thelancet.com/cms/10.1016/j.landig.2025.100957/attachment/fafc79b2-1c64-4c6a-aad5-7dbd79e1e727/mmc1.pdf
Source snippet
Supplementary appendixA key component of responsible AI development and deployment is mitigating bias, fairness and inequity. In order to...
Source: spin.atomicobject.com
Title: drift monitoring health ai
Link: https://spin.atomicobject.com/drift-monitoring-health-ai/
Source snippet
atomicobject.comReal‑World Evaluation & Drift Monitoring for Health AI22 Jan 2026 — Healthcare is adopting AI quickly, but expectations c...
Source: facebook.com
Title: experts have developed recommendations to increase transparency and tackle pote
Link: https://www.facebook.com/TheLancetMedicalJournal/posts/-experts-have-developed-recommendations-to-increase-transparency-and-tackle-pote/1031677509003845/
Source snippet
The Lancet8 Jan 2025 — The ethics of artificial intelligence in healthcare involves navigating issues like patient privacy, algorithm bia...

Additional References

Source: publichealthaihandbook.com
Link: https://publichealthaihandbook.com/implementation/deployment.html
Source snippet
AI Deployment in Healthcare: Why Most Prototypes FailOnly 53% of AI projects reach production, and healthcare rates are even lower. Pract...
Source: bmjdigitalhealth.bmj.com
Link: https://bmjdigitalhealth.bmj.com/content/1/1/e000046.full.pdf
Source snippet
BMJ Digital Healthcase study on the relevance of data drift detectionConclusion By tracking both model performance and data drift, this s...
Source: fenwick.com
Link: https://www.fenwick.com/insights/publications/fda-requests-public-comment-on-how-to-measure-and-manage-performance-of-ai-enabled-medical-devices
Source snippet
FDA Requests Public Comment on How to Measure and...9 Oct 2025 — The FDA has requested comments on current, practical ways to measure an...
Source: crowell.com
Link: https://www.crowell.com/en/insights/client-alerts/fda-seeks-input-on-real-world-performance-of-ai-enabled-medical-devices-what-biotech-and-medtech-innovators-need-to-know
Source snippet
FDA Seeks Input on Real-World Performance of AI...Oct 3, 2025 — FDA is soliciting public comment on practical strategies for measuring a...
Source: learn.hms.harvard.edu
Title: ai implications health equity shaping future health care quality and safety
Link: https://learn.hms.harvard.edu/insights/all-insights/ai-implications-health-equity-shaping-future-health-care-quality-and-safety
Source snippet
Implications for Health Equity: Shaping the Future of...7 Apr 2025 — AI is rapidly transforming health care, with its potential to impro...
Source: youtube.com
Link: https://www.youtube.com/watch?v=bcqofACB-Sk
Source snippet
The goal of the meeting is to discuss how biases arise in healthcare artificial intelligence the tools and strategies that can hel...
Source: dannoyes.com
Title: healthcare ai validation the critical gap in post market monitoring
Link: https://dannoyes.com/healthcare-ai-validation-the-critical-gap-in-post-market-monitoring/
Source snippet
Healthcare AI Validation: The Critical Gap in Post-Market...14 Jul 2025 — Despite rapid adoption, a significant majority of AI models (8...
Source: bipartisanpolicy.org
Title: fda oversight understanding the regulation of health ai tools
Link: https://bipartisanpolicy.org/issue-brief/fda-oversight-understanding-the-regulation-of-health-ai-tools/
Source snippet
FDA Oversight: Understanding the Regulation of Health AI...10 Nov 2025 — The FDA regulates AI-enabled medical devices through a risk-bas...
Source: jyi.org
Link: https://www.jyi.org/2026-january-1/2026/1/8/bias-in-medical-ai-algorithmic-fairness-and-ethics-challenges
Source snippet
ate treatment, disproportionately affecting marginalized populations (Hanna...Read more...
Source: globalforum.diaglobal.org
Title: risk based monitoring for ai enabled medical devices
Link: https://globalforum.diaglobal.org/issue/september-2025/risk-based-monitoring-for-ai-enabled-medical-devices/
Source snippet
diaglobal.orgRisk-Based Monitoring for AI-Enabled Medical DevicesNotably, FDA guidance insists that sponsors must proactively identify cr...