Clean data

Introduction

Public-sector AI usually fails for a mundane reason: governments do not know which records refer to the same person, address, company, payment, or event. Agencies may each hold useful data, but if formats differ, identifiers conflict, records are incomplete, or systems cannot securely exchange information, even advanced AI produces unreliable answers. The limiting factor is often not model intelligence but administrative coherence.

That is why clean public data increasingly matters as a form of state capacity. Accurate, standardised, shareable records allow institutions to coordinate around the same facts. AI systems can then help officials detect fraud, identify vulnerable households during disasters, connect patients across health services, or route cases through overstretched bureaucracies. Without those foundations, governments remain trapped in fragmented workflows where each department sees only part of reality.

The broader significance reaches beyond administrative efficiency. If advanced AI is eventually to help societies manage complex healthcare systems, accelerate science, adapt infrastructure, or coordinate responses to climate and demographic pressures, states will need trustworthy information systems capable of turning machine intelligence into collective action rather than isolated pilot projects.

Why fragmented records break AI systems

Many public-sector AI projects fail long before the model itself becomes the problem. Governments often operate across dozens or hundreds of incompatible databases created by different departments, contractors, or legal regimes over decades. Addresses are entered differently across systems. Citizens appear under multiple identifiers. Medical terminology varies between hospitals. Procurement records may not match tax databases. Agencies cannot reliably exchange information in real time.

AI systems trained on this environment inherit the fragmentation. A fraud model cannot detect suspicious patterns if duplicate suppliers appear under slightly different names. A healthcare triage system cannot build a coherent patient history if records from clinics, hospitals, and social care providers cannot be linked safely and consistently. A disaster-response system cannot identify vulnerable residents quickly if housing, disability, and emergency-contact records remain isolated.

The UK government’s National Data Strategy identified non-standardisation and weak coordination as major obstacles to effective public services, warning that data collected by one organisation often cannot be used easily by another. [Data Parliament] The European Commission’s Joint Research Centre similarly argued that interoperability is a prerequisite for effective AI in the public sector because machine systems depend on compatible formats, shared definitions, and reliable exchange mechanisms. [JRC Publications]

This sounds technical, but the practical consequences are human. Citizens repeat the same information to multiple agencies. Benefits are delayed because departments cannot verify circumstances. Fraud investigations stall because suspicious transactions cannot be linked across systems. Health interventions arrive too late because risk indicators remain scattered between institutions.

In this sense, fragmented data is not merely an IT problem. It is a coordination failure embedded inside the state itself.

Address matching shows the hidden mechanics of state capacity

One of the clearest examples is address matching: the seemingly simple task of determining whether two differently written addresses refer to the same place.

Humans solve this easily. Bureaucracies often cannot.

A single property may appear in government systems with abbreviations, spelling variations, missing postcodes, old street names, or formatting inconsistencies. AI systems trained on raw data can misclassify these records as separate locations. That weakens everything built on top of them: census operations, tax enforcement, emergency response, electoral administration, benefits delivery, infrastructure planning, and fraud detection.
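The failure mode can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the method of any particular agency: the abbreviation table, the `same_place` helper, and the 0.85 threshold are all hypothetical choices, and the standard library's `difflib.SequenceMatcher` stands in for a properly trained matching model.

```python
import re
from difflib import SequenceMatcher

# Hypothetical abbreviation table for illustration only. A production service
# would need a curated list and would have to handle ambiguity
# ("St" can also mean "Saint").
ABBREVIATIONS = {"st": "street", "rd": "road", "ave": "avenue", "ln": "lane"}


def normalise(address: str) -> str:
    """Lowercase, replace punctuation with spaces, expand abbreviations."""
    text = re.sub(r"[^\w\s]", " ", address.lower())
    return " ".join(ABBREVIATIONS.get(token, token) for token in text.split())


def numeric_tokens(normalised: str) -> set:
    """Purely numeric tokens such as house numbers; these must agree exactly."""
    return {token for token in normalised.split() if token.isdigit()}


def same_place(a: str, b: str, threshold: float = 0.85) -> bool:
    """Guess whether two address strings refer to the same property."""
    na, nb = normalise(a), normalise(b)
    if numeric_tokens(na) != numeric_tokens(nb):
        # "12 High Street" and "14 High Street" look alike but are neighbours.
        return False
    return SequenceMatcher(None, na, nb).ratio() >= threshold


if __name__ == "__main__":
    first, second = "12 High St., Leeds", "12 high street leeds"
    print("exact match:", first == second)
    print("normalised fuzzy match:", same_place(first, second))
```

The house-number guard is there because string similarity alone would treat adjacent properties as near-duplicates. A threshold this simple will still make mistakes on missing postcodes or renamed streets, which is why matched pairs near the cut-off usually need human review.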

The UK Office for National Statistics developed machine-learning methods specifically to improve address matching because conventional exact-match systems struggled with messy real-world records. [Office for National Statistics] The importance of this work lies less in the algorithm itself than in the institutional consequence. Once agencies can reliably identify the same household across datasets, they can coordinate services and detect inconsistencies automatically.

This becomes especially important in crises. During floods, heatwaves, or pandemics, governments need to know quickly which people live where, which households contain medically vulnerable residents, and which infrastructure assets are affected. AI-assisted coordination becomes far more useful when agencies share standardised location data rather than isolated spreadsheets.

Address systems also illustrate a broader point about AI abundance and long-term flourishing. Advanced intelligence is only transformative when civilisation can connect knowledge to action. Clean foundational data acts as infrastructure for collective intelligence in much the same way roads or electricity grids support economic activity.

Health records reveal both the promise and the bottleneck

Healthcare provides perhaps the strongest argument for why data quality matters more than model sophistication.

Modern medicine produces enormous amounts of information: scans, prescriptions, laboratory results, genomic data, referrals, social-care notes, wearable-device outputs, and insurance records. AI systems can potentially identify disease risks earlier, improve diagnosis, personalise treatment, and accelerate research. But those capabilities depend heavily on whether records can be linked accurately and safely.

The OECD has argued that public-sector health systems need stronger data availability, quality, linkage capability, and privacy protections to realise the benefits of AI-driven healthcare and research. [one.oecd.org] The proposed European Health Data Space emerged partly because fragmented national systems make it difficult to build interoperable health AI tools across institutions and borders. [OECD]

Even basic patient matching remains difficult. NHS guidance on data linkage stresses that reliable matching often depends on consistent identifiers such as NHS numbers combined with high-quality structured clinical data. [healtheconomicsunit.nhs.uk] If those foundations are weak, AI systems can confuse patients, miss risk signals, or produce misleading predictions.
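A minimal sketch of identifier-first linkage, assuming a simplified record layout, might look like the following. The `PatientRecord` fields, the `link` function, and its three outcome labels are hypothetical, and all values used with it should be made up. The idea is only to show the order of operations: try the shared identifier first, and fall back to demographics with a weaker, review-only result.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PatientRecord:
    source: str                  # which system the record came from
    nhs_number: Optional[str]    # may be missing or formatted inconsistently
    surname: str
    date_of_birth: str           # ISO format, e.g. "1984-03-07"
    postcode: str


def clean_identifier(value: Optional[str]) -> Optional[str]:
    """Keep digits only so "000 000 0001" and "0000000001" compare equal."""
    if not value:
        return None
    digits = "".join(ch for ch in value if ch.isdigit())
    return digits or None


def demographic_key(record: PatientRecord) -> tuple:
    """Fallback key built from normalised surname, birth date, and postcode."""
    return (
        record.surname.strip().casefold(),
        record.date_of_birth,
        record.postcode.replace(" ", "").upper(),
    )


def link(a: PatientRecord, b: PatientRecord) -> str:
    """Return "match", "review", or "no_match" for a pair of records."""
    id_a, id_b = clean_identifier(a.nhs_number), clean_identifier(b.nhs_number)
    if id_a and id_b:
        # Both records carry an identifier: trust it over demographics.
        return "match" if id_a == id_b else "no_match"
    if demographic_key(a) == demographic_key(b):
        # Demographics agree but no shared identifier: flag for human review.
        return "review"
    return "no_match"
```

Returning "review" rather than "match" on the demographic fallback is a deliberate choice in this sketch: two people can share a surname, birth date, and postcode, so a weak match should not silently merge clinical histories.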

The consequences are larger than administrative inconvenience. Preventive medicine increasingly depends on longitudinal understanding: seeing how conditions evolve across years and institutions. An AI system helping detect cancer risk or medication complications may need to integrate primary care, hospital, pharmacy, and social-care information. Fragmented systems prevent that holistic view.

This matters for the wider AI bloom question because healthcare is one of the main domains where optimistic visions of AI become concrete. Radical longevity gains, accelerated drug discovery, and personalised medicine all depend not only on scientific breakthroughs but also on societies capable of organising and sharing trusted health information at scale.

Yet healthcare also exposes the tensions. Linking records increases privacy risks. Centralisation can create surveillance concerns. Data-sharing rules differ across jurisdictions. Citizens may lose trust if systems appear opaque or insecure. The challenge is therefore not simply collecting more data, but creating governance systems that allow trustworthy coordination without sacrificing civil liberties.

Fraud detection is often the first real test

Fraud analytics has become one of the most practical demonstrations of how linked public data increases state capability.

Governments lose vast sums through duplicate payments, identity fraud, procurement manipulation, false claims, and organised abuse of public systems. Traditional investigations are labour-intensive and reactive. AI-assisted analytics allows agencies to detect unusual patterns across large transaction networks far earlier.

But this only works when records can be connected.

The UK National Audit Office describes modern fraud analytics as heavily dependent on data matching, network analysis, and verification across datasets. [National Audit Office (NAO)] Systems may compare supplier databases, payment histories, company registrations, benefit claims, or banking details to identify suspicious relationships invisible to isolated departments.
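To make the idea of data matching plus network analysis concrete, here is a small sketch that clusters supplier records sharing either a normalised name or a bank account. Everything in it is an assumption for illustration: the record fields (`id`, `name`, `bank_account`), the suffix list, and the `find_clusters` helper are hypothetical and do not describe any audit body's actual tooling.

```python
import re
from collections import defaultdict

# Hypothetical list of legal suffixes to ignore when comparing company names.
SUFFIXES = {"ltd", "limited", "plc", "llp"}


def name_key(name: str) -> str:
    """Lowercase, drop punctuation and legal suffixes."""
    tokens = re.sub(r"[^\w\s]", " ", name.lower()).split()
    return " ".join(token for token in tokens if token not in SUFFIXES)


def find_clusters(suppliers: list) -> list:
    """Group supplier ids linked by a shared name key or bank account.

    Uses a small union-find structure so that indirect links are followed:
    if A shares a name with B and B shares an account with C, all three
    end up in one cluster.
    """
    parent = {s["id"]: s["id"] for s in suppliers}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    first_seen = {}
    for s in suppliers:
        keys = [("name", name_key(s["name"]))]
        if s.get("bank_account"):
            keys.append(("bank", s["bank_account"]))
        for key in keys:
            if key in first_seen:
                parent[find(s["id"])] = find(first_seen[key])
            else:
                first_seen[key] = s["id"]

    clusters = defaultdict(set)
    for s in suppliers:
        clusters[find(s["id"])].add(s["id"])
    return [ids for ids in clusters.values() if len(ids) > 1]
```

A cluster found this way is a lead for an investigator, not evidence of fraud: group companies legitimately share bank accounts, and unrelated firms can share a trading name.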

The National Fraud Initiative already uses computerised data matching across public bodies to identify inconsistencies requiring investigation. [Leeds] Increasingly, AI techniques are being layered on top of these linked datasets to prioritise risky cases and uncover hidden patterns.

Importantly, fraud work demonstrates an intermediate stage of AI-enabled governance. Governments do not need artificial general intelligence to benefit. Even relatively narrow systems can produce substantial gains if the underlying data environment is coherent enough.

The political implications are significant. States that can reliably detect leakage and corruption may recover resources for healthcare, infrastructure, or education. Citizens may experience more competent institutions. Public trust can increase if governments appear capable of administering systems fairly and effectively.

Yet there are risks as well. Poorly linked records can falsely flag innocent people. Biased historical data can amplify unequal scrutiny. Aggressive analytics may expand state surveillance without adequate safeguards. Fraud systems therefore illustrate both the capability gains and the governance dilemmas of AI-enabled administration.

Shared standards matter more than giant central databases

A common misconception is that effective public-sector AI requires building one enormous national database. In practice, many successful systems rely instead on interoperability: allowing different institutions to exchange trusted information securely while retaining separate control of their own records.

Estonia’s X-Road system is one of the best-known examples. Rather than centralising all government data into a single repository, X-Road creates a secure exchange layer that allows agencies to communicate using shared protocols and authentication systems. [e-Estonia] [Observatory of Public Sector Innovation]

This architecture matters because modern states are inherently distributed. Health systems, tax authorities, municipalities, courts, schools, and regulators all maintain specialised records. The challenge is not forcing every institution into one database, but enabling coordination without chaos.

X-Road’s significance is therefore institutional rather than merely technical. It demonstrates how standardised interfaces, identity systems, logging mechanisms, and governance rules can allow AI tools to operate across administrative boundaries. Estonia reports billions of annual transactions through the system and substantial reductions in duplicated administrative work. [e-Estonia]
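The general pattern of an exchange layer, as opposed to a central database, can be sketched as follows. This does not reproduce X-Road's actual protocol or security model. The `ExchangeLayer` class, the agency and service names, and the use of a shared-secret HMAC as a stand-in for real authentication are all assumptions made for the sake of a short, runnable example. What it illustrates is the combination the text describes: a standard interface, authenticated callers, and a log of every request, with each provider keeping its own records.

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone


def canonical_message(caller: str, provider: str, service: str, query: dict) -> bytes:
    """Serialise a request deterministically so both sides sign the same bytes."""
    body = {"caller": caller, "provider": provider, "service": service, "query": query}
    return json.dumps(body, sort_keys=True).encode("utf-8")


def sign(secret: bytes, message: bytes) -> str:
    """Shared-secret signature; a stand-in for real authentication in this sketch."""
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


class ExchangeLayer:
    """Routes authenticated, logged requests; it stores no agency data itself."""

    def __init__(self):
        self._secrets = {}    # agency name -> shared secret
        self._services = {}   # (provider, service) -> handler owned by the provider
        self.audit_log = []   # one entry per request, allowed or not

    def register_agency(self, agency: str, secret: bytes) -> None:
        self._secrets[agency] = secret

    def publish(self, provider: str, service: str, handler) -> None:
        self._services[(provider, service)] = handler

    def request(self, caller: str, signature: str, provider: str, service: str, query: dict):
        message = canonical_message(caller, provider, service, query)
        secret = self._secrets.get(caller)
        allowed = secret is not None and hmac.compare_digest(sign(secret, message), signature)
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "caller": caller,
            "provider": provider,
            "service": service,
            "allowed": allowed,
        })
        if not allowed:
            raise PermissionError(f"request from {caller!r} could not be authenticated")
        return self._services[(provider, service)](query)
```

In this arrangement a health agency could ask a population register for one person's address without either side copying the other's database, and the audit log records that the question was asked.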

The broader lesson is that standards create scalability. A useful pilot in one ministry becomes reusable across government when data structures, metadata rules, identifiers, and exchange protocols are shared. Without standards, every agency builds bespoke tools that cannot communicate with each other.

Recent UK guidance on “AI-ready datasets” reflects this shift in thinking, emphasising interoperability, governance, contextual completeness, and ongoing stewardship rather than merely collecting large quantities of information. [GOV.UK Assets]

This may sound disappointingly bureaucratic compared with futuristic AI narratives. But civilisation-scale intelligence depends on exactly these coordination layers. Scientific abundance, resilient infrastructure, and responsive public services all require institutions capable of sharing trusted information across complex systems.

Clean data changes what governments can coordinate

When public records become accurate, interoperable, and machine-readable, governments gain new forms of operational awareness.

That can mean:

Faster disaster response because infrastructure, demographic, and geographic records connect in real time.
Earlier disease intervention because healthcare systems can detect patterns across institutions.
Better urban planning because transport, housing, energy, and environmental datasets become combinable.
More effective welfare administration because agencies can verify changing circumstances without repeatedly burdening citizens.
More reliable scientific policy because governments can integrate data from laboratories, hospitals, universities, and regulators.

The key shift is from isolated administration to coordinated systems management.

This matters for long-term human flourishing because many future challenges are coordination problems at civilisational scale. Climate adaptation, ageing populations, pandemic response, energy transitions, biosecurity, and scientific acceleration all require institutions capable of integrating vast amounts of information into coherent action.

AI may eventually supply extraordinary analytical power. But analysis alone is insufficient if states cannot trust, share, or operationalise the underlying data.

In that sense, clean public data functions as a form of societal memory and shared perception. It allows institutions to “see” more clearly across bureaucratic boundaries. AI then becomes not merely a productivity tool, but a mechanism for increasing collective intelligence.

The danger of building smarter systems on broken foundations

There is also a strong cautionary lesson here. AI can magnify institutional weaknesses as easily as institutional strengths.

If records are biased, incomplete, or outdated, AI systems may automate confusion at scale. Incorrect addresses can deny services. Faulty health linkage can misclassify patients. Inconsistent policing data can reinforce discriminatory patterns. Poorly governed interoperability can expose citizens to surveillance or cyber risks.

This is why many public-sector AI failures ultimately become data-governance stories rather than model-performance stories.

The temptation for governments is to pursue visible AI deployments before fixing underlying infrastructure. Large language models make this temptation stronger because they produce impressive demonstrations quickly. But impressive interfaces can conceal fragile administrative foundations underneath.

The more ambitious the vision of AI-enabled governance, the more those foundations matter. Advanced AI may eventually help societies coordinate scientific research, optimise energy systems, or manage complex public-health interventions. Yet those capabilities will depend heavily on whether institutions possess trustworthy data environments and governance systems robust enough to support them.

The future of state capacity may therefore look less like a single superintelligent machine governing society and more like thousands of interoperable systems helping institutions coordinate around reality more effectively.

That is a less dramatic vision than science fiction often promises. But it may be much closer to how an AI-enabled civilisation actually becomes more capable.

Endnotes

Source: data.parliament.uk
Link: https://data.parliament.uk/DepositedPapers/Files/DEP2020-0521/UK_National_Data_Strategy.pdf
Source snippet: Data Parliament, GOV.UK - National Data Strategy, September 9, 2020 - 9 Sept 2020 - Non-standardisation and a lack of coordination on data me...
Published: September 9, 2020

Source: GOV.UK
Title: national data strategy
Link: https://www.gov.uk/government/publications/uk-national-data-strategy/national-data-strategy
Source snippet: Data Strategy, 9 Dec 2020 - It seeks to harness the power of data to boost productivity, create new businesses and jobs, improve public ser...

Source: ons.gov.uk
Link: https://www.ons.gov.uk/methodology/methodologicalpublications/generalmethodology/onsworkingpaperseries/onsworkingpaperseriesno17usingdatasciencefortheaddressmatchingservice
Source snippet: Office for National Statistics, Using data science for the address matching service - This paper describes the methodology which underpins the...

Source: one.oecd.org
Link: https://one.oecd.org/document/C%282022%2925/en/pdf
Source snippet: C(2022)25 - Login - OECD, February 23, 2022 - 2 Feb 2022 - Adherents are asked to review the capacity of public sector health data systems...
Published: February 23, 2022

Source: oecd.org
Title: overview 6ef2d8f8
Link: https://www.oecd.org/en/publications/progress-in-implementing-the-european-union-coordinated-plan-on-artificial-intelligence-volume-2_3ac96d41-en/full-report/overview_6ef2d8f8.html
Source snippet: OECD, Overview: Progress in Implementing the European Union..., 18 Feb 2026 - The proposed European Health Data Space (EHDS) is an important...

Source: healtheconomicsunit.nhs.uk
Link: https://healtheconomicsunit.nhs.uk/wp-content/uploads/2022/02/Data-linkage-guide_January-2022.pdf
Source snippet: A guide to data linkage, January 14, 2022 - For health records, it is recommended to start first by matching through the NHS number if po...
Published: January 14, 2022

Source: committees.parliament.uk
Link: https://committees.parliament.uk/publications/52403/documents/290832/default/
Source snippet: UK Parliament Committees, Government use of data analytics on error and fraud, 27 Mar 2026 - The Public Sector Fraud Authority should review...

Source: leeds.gov.uk
Link: https://www.leeds.gov.uk/performance-and-spending/national-fraud-data-matching-initiative
Source snippet: Leeds, National fraud initiative | Leeds.gov.uk - Computerised data matching allows potentially fraudulent claims and payments to be identifie...

Source: e-estonia.com
Title: X-road – Interoperability services
Link: https://e-estonia.com/solutions/interoperability-services/x-road/
Source snippet: e-Estonia, X-road – Interoperability services, June 10, 2024 - X-Road is a secure data exchange layer for sending and receiving data between...
Published: June 10, 2024

Source: assets.publishing.service.gov.uk
Title: UK Assets Guidelines and best practices for making government
Link: https://assets.publishing.service.gov.uk/media/696e43965a37ab534a9e23ac/Building_AI-Ready_Datasets_for_the_UK.pdf
Source snippet: Assets, Guidelines and best practices for making government..., January 19, 2026 - 1 Jan 2026 - An AI-ready dataset is not defined solely by...
Published: January 19, 2026

Source: GOV.UK
Title: guidelines and best practices for making government datasets ready for ai
Link: https://www.gov.uk/government/publications/making-government-datasets-ready-for-ai/guidelines-and-best-practices-for-making-government-datasets-ready-for-ai
Source snippet: and best practices for making government..., 19 Jan 2026 - This document provides the initial guidelines for releasing government datasets...

Source: assets.publishing.service.gov.uk
Link: https://assets.publishing.service.gov.uk/media/5d1a294240f0b609e0f06b0e/Tackling_fraud_in_government_with_data_analytics.pdf
Source snippet: fraud in Government with data analytics - The UK Government has taken a proactive approach to addressing fraud, focusing on building capabil...

Source: oecd.org
Title: ai in public service design and delivery 09704c1a
Link: https://www.oecd.org/en/publications/2025/06/governing-with-artificial-intelligence_398fa287/full-report/ai-in-public-service-design-and-delivery_09704c1a.html
Source snippet: AI in public service design and delivery: Governing with..., 18 Sept 2025 - The development and use of AI has permeated public service des...

Source: digitaltrade.blog.gov.uk
Title: data governance and security not optional anymore
Link: https://digitaltrade.blog.gov.uk/2025/10/30/data-governance-and-security-not-optional-anymore/
Source snippet: governance and security: Not optional anymore, 30 Oct 2025 - This article highlights key themes, challenges and opportunities around data g...

Source: x-road.global
Link: https://x-road.global/xroad-case-studies-library
Source snippet: X-Road Case Studies Library - Known locally as X-Via, the secure data exchange platform is enhancing interoperability and transforming publi...

Source: cfa.nhs.uk
Title: engineering clean reliable data to help detect NHS fraud
Link: https://cfa.nhs.uk/about-nhscfa/corporate-projects/project-athena/project-athena-news/engineering-clean-reliable-data-to-help-detect-NHS-fraud
Source snippet: Engineering clean, reliable data to help detect NHS fraud, 28 May 2025 - You can report NHS fraud securely and confidentiality by calling 0...
Published: May 2025

Source: publications.jrc.ec.europa.eu
Link: https://publications.jrc.ec.europa.eu/repository/bitstream/JRC134713/JRC134713_01.pdf
Source snippet: JRC Publications, Artificial Intelligence for Interoperability in the European Public..., October 4, 2023 - by L TANGI · Cited by 9 - This p...
Published: October 4, 2023

Source: nao.org.uk
Title: National Audit Office (NAO), Using data analytics to tackle fraud and error
Link: https://www.nao.org.uk/wp-content/uploads/2025/07/using-data-analytics-to-tackle-fraud-and-error.pdf
Source snippet: Using data matching to find links between companies or individuals, to identify potential patterns and indicators of fraud –...

Source: oecd-opsi.org
Title: x road trust federation for cross border data exchange
Link: https://oecd-opsi.org/innovations/x-road-trust-federation-for-cross-border-data-exchange/
Source snippet: Observatory of Public Sector Innovation, X-Road Trust Federation for Cross-border Data Exchange, 31 Aug 2021 - Exploring how governments can...
