Confidence gaps

Introduction

AlphaFold 3 can generate molecular interaction models that look remarkably precise. Proteins appear neatly docked to drugs, antibodies seem to fit their targets cleanly, and confidence maps often suggest that the AI “knows” the answer. That visual authority is part of why the system has generated such excitement in drug discovery and structural biology. But one of the most important lessons from early benchmarking is that confidence scores are not the same thing as biological truth.

In some cases, AlphaFold 3 assigns high confidence to structures that later turn out to contain major errors, especially in flexible proteins, novel interaction types, poorly represented chemistries, or systems that can adopt multiple conformations. Researchers increasingly treat the model not as an oracle, but as a fast hypothesis generator whose outputs still require experimental testing. That distinction matters well beyond one AI tool. If advanced AI is to accelerate medicine and scientific discovery as part of a broader “AI bloom”, scientists need systems that not only produce persuasive answers, but also communicate uncertainty honestly.

What confidence scores appear to promise

AlphaFold 3 inherited and extended several confidence metrics from earlier AlphaFold systems. The best known is pLDDT, a score estimating confidence in local atomic positions on a 0–100 scale. High values are intended to indicate that the local structure is likely reliable. The system also reports predicted aligned error (PAE), which estimates the error in the relative placement of different parts of a model, and interface confidence estimates for molecular interactions. EMBL-EBI guidance describes pLDDT values above 90 as highly confident, while values below 50 often indicate unreliable regions. [ebi.ac.uk]
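The banding described above can be sketched in a few lines of Python. This is a minimal illustration, not part of any AlphaFold tooling: the function names are hypothetical, the 70 cut-off and the band labels follow the commonly used AlphaFold database convention, and the handling of values that fall exactly on a boundary is an assumption.

```python
def plddt_band(score: float) -> str:
    """Map a pLDDT value (0-100) to a qualitative confidence band.

    Cut-offs at 90, 70 and 50; treating a value exactly on a cut-off as
    belonging to the lower band is an assumption of this sketch.
    """
    if not 0.0 <= score <= 100.0:
        raise ValueError(f"pLDDT must be within 0-100, got {score}")
    if score > 90:
        return "very high"
    if score > 70:
        return "confident"
    if score > 50:
        return "low"
    return "very low"


def band_fractions(scores: list[float]) -> dict[str, float]:
    """Return the fraction of scores that fall in each band."""
    counts = {"very high": 0, "confident": 0, "low": 0, "very low": 0}
    for s in scores:
        counts[plddt_band(s)] += 1
    total = len(scores)
    return {band: (n / total if total else 0.0) for band, n in counts.items()}


# Made-up per-residue values, for illustration only.
example = [95.2, 91.0, 84.5, 66.0, 48.3, 32.1]
print(band_fractions(example))
```

A summary like this is only a first filter. It says where the model reports confidence, not whether that confidence is deserved.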

These metrics are genuinely useful. AlphaFold’s success partly came from the fact that its confidence estimates often correlate reasonably well with experimental accuracy. Researchers can rapidly identify likely stable regions and deprioritise obviously weak predictions. That ability dramatically reduces wasted effort compared with blind modelling.

The problem is subtler: confidence scores can appear more trustworthy than they really are.

AlphaFold 3 produces detailed atomic models with smooth geometries, chemically plausible interactions and colour-coded certainty maps. For non-specialists, and sometimes even specialists, this creates a strong psychological impression that the model has “solved” the interaction. In reality, a confidence metric is the model’s own learned estimate of how closely its prediction is likely to match an experimental structure, and that estimate is shaped by the patterns seen during training. It does not independently verify that the biology is correct.

That distinction becomes especially important in exactly the kinds of frontier problems that matter most for future drug discovery: novel targets, flexible binding states, intrinsically disordered proteins, allosteric regulation and unusual chemistries.

Where benchmarking finds hidden structural errors

Independent benchmarking studies have repeatedly found cases where AlphaFold 3 looks confident while still producing materially incorrect structures.

A 2025 benchmarking study across multiple biomolecular datasets found that although AlphaFold 3 improved local structural accuracy over AlphaFold 2, gains in global accuracy were often limited and some interaction categories remained difficult. Protein multimers, antibody-antigen systems and nucleic-acid-related interactions still showed significant weaknesses. [OUP Academic] [PMC]

Another evaluation using the SKEMPI protein interaction database found that some AlphaFold 3 complex structures contained “large errors” not captured by the model’s interface confidence metrics. The same study concluded that predictions involving intrinsically flexible regions or domains were not reliably assessed by the confidence system. [ACS Publications]

This matters because flexibility is not a minor edge case in biology. Many proteins only become functional when changing shape. Others fluctuate between multiple biologically relevant conformations. Drug molecules may stabilise one state while suppressing another. Immune recognition often depends on transient geometries rather than rigid lock-and-key fits.

AlphaFold 3, however, still tends to output a single dominant structure. EMBL-EBI training materials explicitly note that the system predicts static structures and does not fully capture the dynamic behaviour of molecules in solution. [ebi.ac.uk]

In practice, this can produce a dangerous combination:

a visually convincing structure
a high confidence score [x.com]
but an incorrect biological interpretation

Researchers studying autoinhibited proteins (proteins that switch between inactive and active forms) found that AlphaFold systems struggled with conformational diversity because training data over-represent stable states captured in crystallographic databases. [Nature]

Similarly, studies on proteins with alternative folds have argued that AlphaFold can produce high-confidence predictions that contradict experimental evidence when proteins adopt unusual or underrepresented conformations. [arXiv]

Flexible and disordered biology remains especially hard

One recurring failure mode involves intrinsically disordered regions: protein segments that do not maintain one stable structure.

These regions are central to cell signalling, gene regulation and disease biology. They are also unusually difficult for AI structure predictors because they violate the assumption that one sequence maps cleanly to one stable shape.

AlphaFold often marks disordered regions with low confidence, which can be useful. But newer studies suggest the opposite problem can also occur: the system may generate highly confident structures for regions that are experimentally known to remain disordered. [arXiv]
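One simple way to look for this second failure mode is to cross-check per-residue confidence against experimental disorder annotations. The sketch below assumes the pLDDT values and the disordered intervals have already been loaded by some other means. The function name, the 1-based inclusive interval format and the default threshold of 70 are all illustrative assumptions, not a published protocol.

```python
def overconfident_disordered(plddt, disordered_regions, threshold=70.0):
    """Return 1-based residue numbers that are annotated as disordered
    but predicted with pLDDT at or above `threshold`.

    plddt: per-residue scores, where index 0 is residue 1.
    disordered_regions: (start, end) pairs, 1-based and inclusive
        (a hypothetical format for experimentally annotated disorder).
    """
    flagged = set()
    for start, end in disordered_regions:
        for resnum in range(start, end + 1):
            # Skip annotations that extend beyond the predicted chain.
            if 1 <= resnum <= len(plddt) and plddt[resnum - 1] >= threshold:
                flagged.add(resnum)
    return sorted(flagged)


# Made-up values: residues 3-6 and 8-10 are annotated as disordered.
scores = [92, 88, 45, 40, 81, 85, 90, 38, 35, 77]
print(overconfident_disordered(scores, [(3, 6), (8, 10)]))
```

Residues returned by a check like this are candidates for closer inspection, not confirmed errors: some disordered regions do fold on binding a partner.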

That creates a serious interpretive risk. Scientists may mistake AI-generated order for genuine biological structure simply because the output looks coherent and precise.

This is not merely a technical inconvenience. Some of the most medically important biological systems involve disorder, flexibility and transient interactions:

cancer signalling pathways
transcription factors
immune recognition
viral-host interactions
allosteric drug regulation

These are precisely the systems where an AI-driven acceleration of biology could have enormous long-term value for human health and longevity. But they are also systems where overconfidence can misdirect experiments, waste resources or encourage premature conclusions.

Novel chemistry exposes another weakness

AlphaFold 3 performs best when new problems resemble patterns already present in training data.

That is true for most machine learning systems, but biology makes the issue especially consequential because the highest-value discoveries often involve genuinely novel chemistry.

Several studies have warned that AlphaFold-style systems may partially rely on memorised statistical regularities rather than deeper physical understanding of molecular energetics. Accuracy can fall sharply when evaluating interactions unlike those seen during training. [Nexco] [Wikipedia]

This becomes especially important in drug discovery.

A pharmaceutical researcher does not mainly care whether AlphaFold can reproduce familiar interactions already well represented in structural databases. They care whether it can predict new binding modes, unusual chemistries or previously unknown targets.

Benchmarking work on protein-ligand systems has shown that AlphaFold 3 still struggles with allosteric systems and certain binding-pocket configurations. [Nature]

In some cases, the model generates chemically plausible but incorrect docking arrangements that receive relatively strong confidence scores. Researchers have described these as “hallucinations”: outputs that look realistic but are not physically correct. [Drug Discovery Trends]

The danger is not that scientists blindly trust every prediction. Structural biologists are generally cautious. The larger risk is subtler:

visually persuasive AI outputs can shape research priorities
confidence scores can narrow perceived uncertainty too early
institutions may overestimate how automated molecular discovery has become

That distinction matters for public narratives around AI-enabled scientific acceleration.

Why this matters for the broader AI bloom argument

AlphaFold 3 is often presented as evidence that AI could dramatically accelerate medicine, biotechnology and eventually human flourishing itself. In many ways, it genuinely supports that case. Predicting molecular interactions faster and more cheaply could reduce years of experimental work, expand access to structural biology and help researchers explore diseases that previously lacked detailed molecular maps.

But the confidence-score problem reveals something important about the current stage of AI progress.

Scientific acceleration is not simply about generating more answers. It is about generating reliable knowledge under uncertainty.

AlphaFold 3 demonstrates that AI systems can compress enormous amounts of biological pattern recognition into highly useful predictions. At the same time, it shows that scientific reasoning still depends heavily on experimental validation, physical interpretation and careful treatment of uncertainty.

This tempers some of the more exaggerated narratives around “solving biology”. Even extraordinarily capable AI systems may remain uneven across domains where:

training data are sparse
physical dynamics matter
multiple states coexist
or novel chemistry exceeds historical examples

That does not weaken the broader possibility of AI-enabled abundance and scientific flourishing. In some ways it strengthens it by clarifying where future advances are still needed. The path toward radically accelerated science is likely to involve combinations of:

generative AI models
laboratory robotics
molecular simulation
high-throughput experimentation
and improved uncertainty estimation

rather than one model simply replacing experimental science.

How researchers increasingly treat AlphaFold predictions

The emerging norm in structural biology is to treat AlphaFold 3 outputs as powerful hypotheses rather than final answers.

Researchers increasingly combine AI predictions with:

cryo-electron microscopy
X-ray crystallography
molecular dynamics simulations
mutational experiments
biochemical assays
and orthogonal computational methods

Confidence scores are useful guides, but they are now interpreted alongside broader biological context.

For example:

a high-confidence prediction in a rigid conserved protein family may deserve substantial trust
the same score in a flexible signalling complex or novel ligand system may deserve far more scepticism
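The idea that the same score deserves different treatment in different contexts can be made concrete with a small triage rule. The sketch below is purely illustrative: the class, field names, thresholds and wording of the outcomes are hypothetical, and real groups will weigh context in far richer ways than three yes/no flags.

```python
from dataclasses import dataclass


@dataclass
class PredictionContext:
    """Hypothetical summary of one prediction and its biological context."""
    mean_plddt: float                      # 0-100 summary score
    flexible_or_disordered: bool = False   # flexible or disordered regions involved
    novel_chemistry: bool = False          # ligand or chemistry unlike known examples
    multiple_states: bool = False          # system known to adopt several conformations


def review_level(ctx: PredictionContext) -> str:
    """Combine the confidence score with context flags (illustrative rule only)."""
    if ctx.mean_plddt < 70:
        return "speculative"
    # Each flag marks a setting where confidence is reported to be less reliable.
    flags = sum([ctx.flexible_or_disordered, ctx.novel_chemistry, ctx.multiple_states])
    if flags == 0:
        return "working hypothesis"
    if flags == 1:
        return "validate first"
    return "high scepticism"


rigid_family = PredictionContext(mean_plddt=93.0)
signalling_complex = PredictionContext(
    mean_plddt=93.0, flexible_or_disordered=True, multiple_states=True
)
print(review_level(rigid_family))
print(review_level(signalling_complex))
```

Both example predictions carry the same score. Only the context flags move the second one to a more sceptical category.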

Some groups are also developing refined confidence metrics that better account for flexible interfaces and partially ordered interactions. [arXiv]

This reflects a broader lesson likely to matter across future AI-enabled science. As AI systems become more capable, the central challenge may shift from producing plausible outputs to calibrating confidence correctly.

A system that occasionally says “I do not know” may ultimately accelerate science more safely and effectively than one that always produces a polished answer.
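Calibration can be checked in a simple way: group predictions by stated confidence and compare each group's average confidence with how often the predictions in it turned out to be right. The sketch below computes such a table and the expected calibration error, a standard summary of the gap. It assumes confidences rescaled to 0–1 (a pLDDT-style score would be divided by 100) and a list of right/wrong labels supplied by whatever criterion a benchmark uses. The function names and example data are hypothetical.

```python
def calibration_table(confidences, correct, n_bins=5):
    """Bin predictions by stated confidence and summarise each non-empty bin.

    Returns a list of (count, mean_confidence, observed_accuracy) tuples,
    ordered from the lowest to the highest confidence bin.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        if not 0.0 <= conf <= 1.0:
            raise ValueError(f"confidence must be within 0-1, got {conf}")
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the top bin
        bins[idx].append((conf, bool(ok)))
    rows = []
    for items in bins:
        if not items:
            continue
        n = len(items)
        mean_conf = sum(c for c, _ in items) / n
        accuracy = sum(ok for _, ok in items) / n
        rows.append((n, mean_conf, accuracy))
    return rows


def expected_calibration_error(confidences, correct, n_bins=5):
    """Average |confidence - accuracy| across bins, weighted by bin size."""
    rows = calibration_table(confidences, correct, n_bins)
    total = sum(n for n, _, _ in rows)
    if total == 0:
        return 0.0
    return sum(n / total * abs(mean_conf - acc) for n, mean_conf, acc in rows)


# Made-up data: three high-confidence predictions, one of which was wrong.
conf = [0.95, 0.92, 0.90, 0.55, 0.50, 0.15]
right = [True, False, True, True, False, False]
for n, mean_conf, acc in calibration_table(conf, right):
    print(f"n={n} mean confidence={mean_conf:.2f} accuracy={acc:.2f}")
print(f"ECE = {expected_calibration_error(conf, right):.3f}")
```

A well-calibrated system would show accuracy close to mean confidence in every bin. A top bin where accuracy lags well behind confidence is the numerical form of the confidence gap this article describes.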

The deeper lesson: persuasive AI is not the same as solved science

AlphaFold 3 remains one of the most important AI systems ever built for biology. It has already changed how many scientists approach molecular structure prediction, and it may contribute to major advances in medicine over time. [Nature] [blog.google]

But its confidence gaps are equally important to understand.

The system’s outputs often look more definitive than the underlying evidence justifies. High-confidence predictions can still conceal incorrect interfaces, missed conformational states or biologically unrealistic interactions. The risk grows in exactly the frontier domains where future medical breakthroughs are most needed.

That tension captures a larger truth about advanced AI and scientific progress. AI can massively expand humanity’s ability to search possibility space, generate hypotheses and compress scientific labour. Yet scientific understanding still depends on reality pushing back through experiment, replication and physical constraints.

For advocates of an AI-enabled human bloom, AlphaFold 3 is therefore both an encouraging signal and a cautionary one. It shows that AI can already amplify scientific capability dramatically. But it also shows that genuine knowledge, especially in complex living systems, remains harder than generating convincing predictions.

Endnotes

Source: ebi.ac.uk
Link: https://www.ebi.ac.uk/training/online/courses/alphafold/alphafold-3-and-alphafold-server/how-to-assess-the-quality-of-alphafold-3-predictions/
Source snippet
It uses a 0-100 scale, where higher values indicate higher confidence.
Source: academic.oup.com
Link: https://academic.oup.com/bib/article/26/6/bbaf616/8351050
Source snippet
A comprehensive benchmarking of the AlphaFold3 for predicting... by C Peng, 2025. In this work, we benchmark Alp...
Source: nature.com
Link: https://www.nature.com/articles/s41467-025-67127-3
Source snippet
Benchmarking all-atom biomolecular structure prediction... by S Xu, 2025. We find that AlphaFold 3 leads overall, y...
Source: pubs.acs.org
Link: https://pubs.acs.org/doi/10.1021/acs.jcim.4c00976
Source snippet
Evaluation of AlphaFold 3's Protein–Protein Complexes for... by JJ Wee, 2024. In this work, we evaluate A...
Source: arxiv.org
Link: https://arxiv.org/abs/2406.03979
Source snippet
Benchmarking AlphaFold3's protein-protein complex accuracy and machine learning prediction reliability for binding free energy chang...
Source: ebi.ac.uk
Link: https://www.ebi.ac.uk/training/online/courses/alphafold/alphafold-3-and-alphafold-server/introducing-alphafold-3/what-alphafold-3-struggles-with/
Source snippet
What AlphaFold 3 struggles with: A key limitation of protein structure prediction models is that they typically predict static structures a...
Source: nature.com
Link: https://www.nature.com/articles/s42004-025-01763-0
Source snippet
Challenging AlphaFold in predicting proteins with large-... by BH Perkins-Jechow, 2025. Here, we benchmarked AlphaFo...
Source: arxiv.org
Link: https://arxiv.org/abs/2410.14898
Source snippet
Proteins with alternative folds reveal blind spots in AlphaFold-based protein structure prediction
Published: October 18, 2024
Source: arxiv.org
Link: https://arxiv.org/html/2510.15939v2
Source snippet
(a) DisProt shows order in the residue, but AF3 predicts the residue with low confidence...
Source: nexco.ch
Title: The Limitations of Protein-Ligand Co-folding with AlphaFold 3, Unveiled
Link: https://nexco.ch/blog/The-Limitations-of-Protein-Ligand-Co-folding-with-AlphaFold-3%2C-Unveiled
Source snippet
The Limitations of Protein-Ligand Co-folding with... Nov 17, 2025. In brief, these analyses suggest that while AlphaFold 3 is defin...
Source: Wikipedia
Title: AlphaFold
Link: https://en.wikipedia.org/wiki/AlphaFold
Source snippet
AlphaFold is an artificial intelligence (AI) program developed by DeepMind, a subsidiary of Alphabet, which performs predicti...
Source: nature.com
Link: https://www.nature.com/articles/s41467-025-63947-5
Source snippet
Nature, 630, 493–500 (2024). Krishna, R. et al. Generalized...
Source: nature.com
Link: https://www.nature.com/articles/s41467-024-48837-6
Source snippet
Structure prediction of protein-ligand complexes from... by P Bryant, 2024. Here we develop an AI system that can predict...
Source: arxiv.org
Link: https://arxiv.org/abs/2412.15970
Source snippet
actifpTM: a refined confidence metric of AlphaFold2 predictions involving flexible regions
Published: December 20, 2024
Source: nature.com
Link: https://www.nature.com/articles/s41586-024-07487-w
Source snippet
Accurate structure prediction of biomolecular interactions... by J Abramson, 2024. Here we describe our AlphaFol...
Source: blog.google
Title: AlphaFold 3 predicts the structure and interactions of all...
Link: https://blog.google/innovation-and-ai/products/google-deepmind-isomorphic-alphafold-3-ai-model/
Source snippet
AlphaFold 3 predicts the structure and interactions of all... May 8, 2024. Our new AI model AlphaFold 3 can predict the structure and in...
Published: May 8, 2024
Source: ebi.ac.uk
Link: https://www.ebi.ac.uk/training/online/courses/alphafold/an-introductory-guide-to-its-strengths-and-limitations/what-is-alphafold/
Source snippet
What is AlphaFold? AlphaFold2 is a multicomponent artificial intelligence (AI) system that uses machine learning to predict a protein's 3D...
Source: pubs.acs.org
Link: https://pubs.acs.org/doi/10.1021/acs.jcim.5c00906
Source snippet
Prediction of Alternate Frame Folding Systems with... Jul 27, 2025. In this work, we use a family of green fluorescent proteins engineer...
Source: pubs.acs.org
Link: https://pubs.acs.org/doi/10.1021/acs.jcim.5c01084
Source snippet
is a Comprehensive Benchmarking Framework... 12 Aug 2025. Metrics for protein–peptide complex structure prediction.... Benchmarking Alp...
Source: nature.com
Link: https://www.nature.com/articles/s41586-021-03819-2
Source snippet
Highly accurate protein structure prediction with AlphaFold, by J Jumper, 2021. The AlphaFold network directly predicts...
Source: nature.com
Link: https://www.nature.com/articles/s41586-024-07487-w_reference.pdf
Source snippet
We note model limitations of AlphaFold 3 with respect to stereochemistry, hallucinations, dynamics, and accuracy...
Source: arxiv.org
Link: https://arxiv.org/html/2508.18446v1
Source snippet
AlphaFold 3 as a Differentiable Framework for Structural... 25 Aug 2025. Indeed, even AlphaFold's impressive performance falters for pro...
Source: academic.oup.com
Link: https://academic.oup.com/bib/article/26/4/bbaf324/8190210
Source snippet
A key challenge in protein engineering is understanding how mutations affect protein fitness and stability...
Source: academic.oup.com
Link: https://academic.oup.com/pcm/article/8/3/pbaf015/8180385
Source snippet
3: an unprecedent opportunity for fundamental... by Z Fang, 2025. This limitation restricts the application of AF3 in the...
Source: drugdiscoverytrends.com
Link: https://www.drugdiscoverytrends.com/meet-alphafold-3-which-can-accurately-model-more-than-99-of-molecular-types-in-the-protein-data-bank/
Source snippet
AlphaFold 3 offers even more accurate protein structure... 8 May 2024. One of the key challenges in computational s...
Published: May 2024
Source: deepmind.google
Link: https://deepmind.google/science/alphafold/
Source snippet
AlphaFold, Google DeepMind. AlphaFold has revealed millions of intricate 3D protein structures, and is helping scientists u...
Source: alphafold.ebi.ac.uk
Link: https://alphafold.ebi.ac.uk/
Source snippet
Protein Structure Database. AlphaFold is an AI system developed by Google DeepMind that predicts a protein's 3D structure from its amino ac...
Source: alphafold.ebi.ac.uk
Title: FAQs
Link: https://alphafold.ebi.ac.uk/faq
Source snippet
AlphaFold Protein Structure Database. Regions with pLDDT between 50 and 70 are low confidence and should be treated with caution.... For p...
Source: deepmind.google
Title: AlphaFold: Five Years of Impact
Link: https://deepmind.google/blog/alphafold-five-years-of-impact/
Source snippet
AlphaFold: Five Years of Impact. Nov 25, 2025. Explore five years of AlphaFold's impact on biology. Learn how this Nobel Prize-winning AI...

Additional References

Source: researchgate.net
Link: https://www.researchgate.net/publication/398103542_A_comprehensive_benchmarking_of_the_AlphaFold3_for_predicting_biomacromolecules_and_their_interactions
Source snippet
A comprehensive benchmarking of the AlphaFold3 for... 5 Dec 2025. In this work, we benchmark AlphaFold3's performance across nine datase...
Source: medium.com
Link: https://medium.com/%40cognidownunder/alphafold-changed-biology-forever-when-it-solved-protein-folding-78bb8768483a
Source snippet
AlphaFold 3 Predicts Everything Now, Not Just Proteins... AlphaFold changed biology forever when it solved protein folding. Now AlphaFold...
Source: alphafoldserver.com
Link: https://alphafoldserver.com/
Source snippet
AlphaFold Server. AlphaFold Server is a web-service that can generate highly accurate biomolecular structure predictions containing protein...
Source: creative-biostructure.com
Link: https://www.creative-biostructure.com/alphafold3-accurate-molecular-interaction-prediction.htm?srsltid=AfmBOoqAlFPWyvCVKU6_vG1gPbTJV9t-x6BFgA9_M10jDgNaJB5t4aiZ
Source snippet
AlphaFold3: Accurate Structure Prediction of Molecular... A fundamental limitation of AF3 is its focus on predicting static structures, w...
Source: x.com
Link: https://x.com/BiologyAIDaily/status/1941486774834037247
Source snippet
An Evaluation of Biomolecular Energetics Learned by... Importantly, AlphaFold's confidence scores (pLDDT) were high even for residues wit...
Source: semanticscholar.org
Link: https://www.semanticscholar.org/paper/Assessing-scoring-metrics-for-AlphaFold2-and-Genz-Nair/8e81365097d4eea07a8a6fe5d3df35615271cc7c
Source snippet
Assessing scoring metrics for AlphaFold2 and AlphaFold3... The new C2Qscore developed in this study improves the reliability of AlphaFold...
Source: github.com
Link: https://github.com/google-deepmind/alphafold
Source snippet
Open source code for AlphaFold 2. This package provides an implementation of the inference pipeline of AlphaFold v2. For simplicity, we re...
Source: frontiersin.org
Link: https://www.frontiersin.org/journals/artificial-intelligence/articles/10.3389/frai.2026.1739303/full
Source snippet
The transformative impact of AI-enabled AlphaFold 3, by C Chakraborty. The model achieved approximately 76% accuracy in predicting protein...
Source: medium.com
Link: https://medium.com/data-science/sparks-of-chemical-intuition-and-gross-limitations-in-alphafold-3-8487ba4dfb53
Source snippet
“Sparks of Chemical Intuition”—and Gross Limitations!“Sparks of Chemical Intuition”—and Gross Limitations!—in AlphaFold 3. Observations a...
Source: 3decision.discngine.com
Link: https://3decision.discngine.com/blog/2024/8/8/evaluating-protein-protein-interactions-in-af3-predicted-complexes-a-pd-1-case-study
Source snippet
protein-protein interactions in AF3 predicted... by E Martino. The latest release of AlphaFold (AF3) has addressed some limitations of t...