Review bottleneck

Introduction

AI coding tools can generate working software far faster than most humans can type it. But in mature software systems, writing code is often not the real bottleneck. The harder task is deciding whether new code is safe, maintainable, secure, compatible with the rest of the system, and likely to behave correctly months or years later.

That distinction matters for the broader idea of AI abundance. If advanced AI systems eventually make technical labour dramatically cheaper, software engineering should be one of the earliest places where this becomes visible. Yet current evidence suggests a more complicated pattern: AI can greatly increase the volume of code produced, while shifting scarce human effort into review, debugging, verification, and long-term maintenance. In practice, teams may move from a world where programmers spend most of their time writing code to one where they spend increasing amounts of time inspecting machine-generated output.

The result is a “review bottleneck”: a situation where code generation scales faster than trustworthy verification. That bottleneck helps explain why some expert developers have been measured working more slowly despite impressive-looking AI assistance. [metr.org] [arXiv]

Why verification becomes the scarce resource

Software engineering contains at least two different activities that are easy to blur together:

Producing code text [linkedin.com]
Judging whether the code should exist inside a real system

Large language models are increasingly good at the first task. The second remains difficult because software systems contain hidden assumptions, historical compromises, undocumented dependencies, and operational risks that rarely appear in the prompt window.

In a mature codebase, even a small change can have consequences across deployment systems, data models, security permissions, performance behaviour, or compliance requirements. Human reviewers therefore spend much of their time asking questions that are only indirectly about syntax:

Does this match the architecture of the system?
Will future engineers understand it?
Does it create maintenance debt?
Does it subtly duplicate existing logic?
Could it fail under unusual conditions?
Does it introduce a security or reliability risk?
Is the generated solution solving the right problem at all?

AI systems often produce plausible-looking answers without possessing the operational understanding needed to reliably answer those questions. This creates an asymmetry: generating candidate code is cheap, but validating it remains expensive.

The METR study on experienced open-source developers illustrated this clearly. Developers using frontier AI tools completed tasks more slowly overall, despite expecting substantial speed gains. A significant portion of the additional time went into prompting, reviewing, correcting, and cleaning up AI output rather than writing original code. [IT Pro]

This is one reason the productivity debate around AI coding tools often becomes confused. Typing fewer lines of code does not necessarily mean less engineering work. In some environments, the work simply migrates upstream into supervision and downstream into maintenance.

How fluent code creates post-editing work

AI-generated code creates a distinctive kind of labour: post-editing. The generated output frequently looks coherent enough to pass an initial glance, but still requires extensive human checking.

That changes the cognitive structure of programming.

Reading becomes harder than writing

Experienced engineers often report that understanding unfamiliar code is slower than producing it themselves. Human-written code usually reflects the author’s mental model and local conventions. AI-generated code may instead optimise for surface plausibility.

As a result, reviewers must reconstruct intent after the fact.

This becomes especially costly when the AI produces:

Correct-looking but subtly wrong logic
Redundant abstractions
Over-engineered structures
Security vulnerabilities
Inefficient database or API calls
Inconsistent naming and style
Hidden edge-case failures

Because the code appears fluent, reviewers cannot safely skim it. They may need to inspect it line by line.

Research into developer behaviour around LLM-generated code found that programmers validating AI output showed increased cognitive workload, frequent context switching, and repeated cycles of deletion and rewriting. [arXiv]

The burden is amplified by the speed of generation. A developer can now produce several large pull requests in the time previously required for one carefully written change. Review capacity often does not scale at the same rate.
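The scaling mismatch can be made concrete with a toy queue model. The sketch below is illustrative only: the function name review_backlog and every number in it are hypothetical, not measurements from any study cited here. It assumes a fixed number of pull requests opened per day and a fixed daily review capacity.

```python
def review_backlog(days, prs_opened_per_day, prs_reviewed_per_day):
    """Return the number of unreviewed pull requests at the end of each day."""
    backlog = 0
    history = []
    for _ in range(days):
        backlog += prs_opened_per_day
        # Reviewers clear at most their daily capacity; the queue never goes negative.
        backlog -= min(backlog, prs_reviewed_per_day)
        history.append(backlog)
    return history


# Hypothetical team: review capacity stays at 10 PRs per day in both scenarios.
before = review_backlog(days=5, prs_opened_per_day=8, prs_reviewed_per_day=10)
after = review_backlog(days=5, prs_opened_per_day=15, prs_reviewed_per_day=10)

print(before)
print(after)
```

With capacity held constant, raising the opening rate from 8 to 15 leaves the first queue empty while the second grows by five pull requests every day. Real review queues are messier, but the direction of the effect is the same whenever generation outpaces review.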

The “cheap code” problem

Historically, writing code carried a natural cost. Engineers avoided unnecessary complexity partly because producing software required significant time and attention.

AI changes that incentive structure.

When generating another abstraction, helper function, or architectural layer becomes almost free, systems can accumulate excess code more rapidly. But every additional line still imposes future maintenance costs:

More surface area for bugs
More dependencies to understand
More interactions to test
More upgrade paths to maintain
More review overhead for future changes [docs.github.com]

This resembles a broader economic pattern seen in other domains of abundance. When production becomes extremely cheap, filtering and quality control often become the scarce resource.

Software engineering may therefore become less constrained by code production and more constrained by institutional trust: determining which generated outputs deserve integration into critical systems.

Fluent mistakes are expensive

Traditional programming errors are often obvious because the code simply fails to compile or crashes immediately.

AI-generated mistakes can be more dangerous because they are persuasive. GitHub’s own documentation warns that Copilot code review can generate “hallucinations”, including comments based on misunderstandings of the codebase. [GitHub Docs]

This creates a paradoxical workflow:

AI produces convincing code quickly
Humans become less certain which parts deserve trust
Verification requirements increase
Review time expands

The danger is not only incorrect code. It is misplaced confidence.

The METR findings showed a striking mismatch between perception and reality: developers believed AI had sped them up even when measured completion times showed the opposite. [metr.org] [Reuters] That gap matters because organisations may overestimate productivity gains while undercounting hidden review labour.

The invisible work created by AI assistance

Many organisations still measure software productivity using metrics tied to visible output:

Pull requests merged
Tickets closed
Lines of code written [github.blog]
Features shipped

AI tools can improve these numbers while simultaneously increasing hidden maintenance work.

A recent industry report found most developers now spend more time reviewing and fixing AI-assisted code, with many describing this effort as “invisible work” that is poorly captured by existing productivity systems. [IT Pro]

This hidden labour includes:

Verifying generated code [arxiv.org]
Re-running tests
Cleaning architectural inconsistencies
Rewriting brittle implementations
Checking security implications
Simplifying overcomplicated outputs
Explaining AI-generated code to teammates
Maintaining code the original author barely understands

In highly regulated or safety-critical environments, verification costs can become even larger. Medical, financial, infrastructure, and aerospace systems often require traceability, auditing, and compliance review that AI-generated code does not automatically satisfy.

As code generation accelerates, senior engineers may spend less time building systems directly and more time acting as validators, editors, and risk managers.

Open source maintainers are already seeing the overload

The review bottleneck is not limited to corporate software teams. Open-source maintainers increasingly report floods of AI-assisted submissions, bug reports, and patches that require human triage.

In 2026, Linus Torvalds criticised a surge of AI-generated Linux vulnerability reports that overwhelmed maintainers with duplicate or low-value findings. He described the private security mailing list as becoming “almost entirely unmanageable”. [Tom's Hardware]

The issue was not that the reports were always false. Many identified real bugs. The problem was that maintainers still had to:

Confirm reproducibility
Assess severity
Check whether issues were already known
Determine whether fixes were safe
Integrate patches into the wider system

AI had scaled discovery faster than human verification capacity.
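One of those steps, checking whether an issue is already known, can be partly mechanised. The sketch below groups incoming reports by a normalised fingerprint so that a maintainer sees each distinct finding once. The report schema (id, file, function, kind), the function names, and the sample data are all invented for illustration; no real project's triage process is being described, and the remaining steps still need human judgement.

```python
from collections import defaultdict


def fingerprint(report):
    """Normalise the fields that identify a bug (hypothetical schema)."""
    return (
        report["file"].strip().lower(),
        report["function"].strip().lower(),
        report["kind"].strip().lower(),
    )


def group_reports(reports):
    """Map each fingerprint to the list of report ids that share it."""
    groups = defaultdict(list)
    for report in reports:
        groups[fingerprint(report)].append(report["id"])
    return dict(groups)


# Invented sample data: reports 1, 2 and 4 describe the same finding.
reports = [
    {"id": 1, "file": "drivers/example/foo.c", "function": "foo_probe", "kind": "null-deref"},
    {"id": 2, "file": "Drivers/Example/foo.c ", "function": "foo_probe", "kind": "Null-Deref"},
    {"id": 3, "file": "drivers/example/bar.c", "function": "bar_read", "kind": "overflow"},
    {"id": 4, "file": "drivers/example/foo.c", "function": "foo_probe", "kind": "null-deref"},
]

groups = group_reports(reports)
duplicates = {key: ids for key, ids in groups.items() if len(ids) > 1}

print(len(reports), "reports,", len(groups), "distinct findings")
print(duplicates)
```

Exact-match fingerprints only catch the easy cases. Two tools describing the same flaw in different words would still land in separate groups, which is part of why triage remains expert work.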

This pattern may become increasingly common across software ecosystems. If AI systems can generate thousands of candidate fixes, pull requests, or vulnerability findings, then expert attention becomes the limiting resource.

That is a preview of a broader AI-era governance problem: abundance of outputs does not eliminate the need for trusted judgement.

What teams can measure before trusting speed gains

The review bottleneck does not mean AI coding tools are useless. Many developers genuinely benefit from them, especially for prototyping, repetitive work, onboarding, or unfamiliar frameworks. [The GitHub Blog]

But organisations that evaluate AI solely through code generation speed can misread what is happening.

More useful measures often focus on verification and maintenance costs rather than raw output volume.

Review-to-write ratio

One useful indicator is how much time teams spend reviewing compared with authoring code.

If AI dramatically increases review time per change, apparent productivity gains may be illusory.
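A minimal sketch of the calculation follows, assuming a team can record authoring and review minutes per change. The ChangeRecord schema, the function names, and all figures are hypothetical and chosen only to show how the ratio and the total can move in different directions.

```python
from dataclasses import dataclass


@dataclass
class ChangeRecord:
    """Time spent on one merged change (hypothetical schema)."""
    authoring_minutes: float
    review_minutes: float


def review_to_write_ratio(changes):
    """Total review time divided by total authoring time."""
    authoring = sum(c.authoring_minutes for c in changes)
    review = sum(c.review_minutes for c in changes)
    if authoring == 0:
        raise ValueError("no authoring time recorded")
    return review / authoring


def total_minutes(changes):
    """Authoring plus review time across all changes."""
    return sum(c.authoring_minutes + c.review_minutes for c in changes)


# Invented figures for the same two tasks, before and after adopting an AI tool.
before = [ChangeRecord(60, 20), ChangeRecord(90, 30)]
after = [ChangeRecord(20, 70), ChangeRecord(30, 90)]

print(round(review_to_write_ratio(before), 2), total_minutes(before))
print(round(review_to_write_ratio(after), 2), total_minutes(after))
```

In this invented example authoring time falls from 150 to 50 minutes, yet total time rises from 200 to 210 minutes because review grows from 50 to 160. Tracking the ratio alongside the total makes that shift visible.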

Acceptance rate of generated code

Another important measure is how much generated code survives intact.

Low acceptance rates can indicate that teams are spending large amounts of effort cleaning, rewriting, or discarding AI output. The METR study found that less than half of generated suggestions were accepted directly. [Business Insider]
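Definitions of acceptance vary between tools and studies. The sketch below uses one possible definition, the share of generated lines that survive unchanged to merge, which is an assumption made here rather than the measure used in any cited study. The function name and the sample figures are hypothetical.

```python
def acceptance_rate(suggestions):
    """Share of generated lines that reach the merged change unmodified.

    suggestions: iterable of (generated_lines, surviving_lines) pairs.
    """
    pairs = list(suggestions)
    generated = sum(g for g, _ in pairs)
    surviving = sum(s for _, s in pairs)
    if generated == 0:
        return 0.0
    return surviving / generated


# Invented sample: one suggestion mostly rewritten, one kept whole, one discarded.
sample = [(100, 40), (50, 50), (50, 0)]
rate = acceptance_rate(sample)

print(f"{rate:.0%} of generated lines survived")
```

Here 90 of 200 generated lines survive, an acceptance rate of 45%. A line-based measure rewards small, targeted suggestions; a team could equally count accepted suggestions rather than lines, so the definition should be fixed before comparing periods.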

Rework and rollback frequency

Teams can also track: [IT Pro]

Bug-fix rates
Rollbacks
Hotfix frequency
Security incidents
Architectural refactors after AI-heavy periods

Fast generation followed by expensive repair is not genuine productivity.
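These signals can be summarised as simple rates per period. The sketch below assumes each shipped change has been labelled with one outcome; the labels (clean, hotfix, rollback), the function name, and the sample counts are hypothetical.

```python
from collections import Counter

OUTCOMES = ("clean", "hotfix", "rollback")


def rework_rates(changes):
    """Fraction of shipped changes per outcome label (hypothetical labels)."""
    total = len(changes)
    if total == 0:
        return {outcome: 0.0 for outcome in OUTCOMES}
    counts = Counter(change["outcome"] for change in changes)
    return {outcome: counts[outcome] / total for outcome in OUTCOMES}


# Invented sample: ten shipped changes from one sprint.
sprint = (
    [{"outcome": "clean"}] * 7
    + [{"outcome": "hotfix"}] * 2
    + [{"outcome": "rollback"}] * 1
)
rates = rework_rates(sprint)

print(rates)
```

Comparing these rates between an AI-heavy period and an earlier baseline gives a rough view of repair cost. The comparison is only meaningful if the two periods involve similar kinds of work.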

Long-term maintainability

The hardest metric is often delayed maintenance burden.

Code that appears efficient today may create years of future complexity if it is difficult to understand, poorly integrated, or overly dependent on generated abstractions.

This is one reason experienced developers sometimes remain sceptical even when demonstrations look impressive. The real cost of software often appears long after the initial code generation step.

Why this matters beyond software engineering

Software development is unusually important because it is one of the first knowledge industries where AI systems can generate large amounts of economically useful output directly.

That makes coding a test case for a larger question inside the AI bloom debate: if intelligence becomes abundant, what remains scarce?

Current evidence suggests that judgement, verification, trust, and institutional understanding may remain bottlenecks even as raw generation accelerates.

That does not invalidate the possibility of large-scale AI-driven abundance. Future systems may become much better at testing, formal verification, architectural reasoning, and long-horizon maintenance. Automated verification tools may eventually reduce today’s review burden substantially.

But the present transition reveals an important lesson. Human flourishing depends not only on producing more outputs, but on integrating them safely into complex social and technical systems.

In software engineering, AI has already shown that generating answers is easier than knowing which answers deserve confidence. That distinction may matter far beyond code.

Endnotes

Source: metr.org
Title: 2025 07 10 early 2025 ai experienced os dev study
Link: https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/
Source snippet
When developers are allowed to use AI tools, they take 19% longer to complete issues—a significant slowdown that goes against...
Source: arxiv.org
Link: https://arxiv.org/abs/2507.09089
Source snippet
arXivMeasuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity...
Source: arxiv.org
Link: https://arxiv.org/abs/2405.16081
Source snippet
arXivA Study on Developer Behaviors for Validating and Repairing LLM-Generated Code Using Eye Tracking and IDE ActionsMay 25, 2024...

Published: May 25, 2024
Source: docs.github.com
Link: https://docs.github.com/en/copilot/responsible-use/code-review
Source snippet
GitHub DocsResponsible use of GitHub Copilot code reviewCopilot code review has a risk of "hallucination" - that is, it may highlight pro...
Source: reuters.com
Title: ai slows down some experienced software developers study finds 2025 07 10
Link: https://www.reuters.com/business/ai-slows-down-some-experienced-software-developers-study-finds-2025-07-10/
Source snippet
ReutersAI slows down some experienced software developers...Jul 10, 2025 — But the study found that using AI did the opposite: it increa...
Source: time.com
Title: In the Loop: AI Promised Faster Coding
Link: https://time.com/7302351/ai-software-coding-study/
Source snippet
This Study DisagreesA recent METR study challenges the assumption that AI accelerates software development. In tests with 16 experienced...
Source: github.blog
Link: https://github.blog/news-insights/research/research-quantifying-github-copilots-impact-on-developer-productivity-and-happiness/
Source snippet
The GitHub BlogResearch: quantifying GitHub Copilot's impact on...Sep 7, 2022 — In our research, we saw that GitHub Copilot supports fas...
Source: github.blog
Title: The GitHub Blog: Does GitHub Copilot improve code quality?
Link: https://github.blog/news-insights/research/does-github-copilot-improve-code-quality-heres-what-the-data-says/
Source snippet
Here's what...Nov 18, 2024 — Findings in our latest study show that the quality of code written with GitHub Copilot is significantly mor...
Source: github.com
Title: GitHub · Change is constant
Link: https://github.com/
Source snippet
GitHub keeps you ahead. · GitHubFrom your first line of code to final deployment, GitHub provides AI and automation...
Source: docs.github.com
Title: using copilot code review
Link: https://docs.github.com/copilot/using-github-copilot/code-review/using-copilot-code-review
Source snippet
GitHub Copilot code reviewGitHub Copilot can review your code and provide feedback. Where possible, Copilot's feedback includes suggested...
Source: github.com
Title: copilot code quality
Link: https://github.com/resources/insights/copilot-code-quality
Source snippet
Quality is key: GitHub Copilot and code quality8 May 2025 — The study found that code created with the help of Copilot was more likely to...

Published: May 2025
Source: github.com
Title: Copilot hallucinating / using “own” memory instead actual
Link: https://github.com/orgs/community/discussions/160959
Source snippet
May 29, 2025 — You've highlighted an important issue with Copilot providing incorrect information despite having access to accurate data...

Published: May 29, 2025
Source: github.blog
Link: https://github.blog/security/how-to-scan-for-vulnerabilities-with-github-security-labs-open-source-ai-powered-framework/
Source snippet
How to scan for vulnerabilities with GitHub Security Lab's...Mar 6, 2026 — GitHub Security Lab Taskflow Agent is very effective at findi...
Source: metr.org
Link: https://metr.org/blog/2026-02-24-uplift-update/
Source snippet
We are Changing our Developer Productivity Experiment...Feb 24, 2026 — Our early 2025 study found the use of AI causes tasks to take 19%...
Source: metr.org
Link: https://metr.org/
Source snippet
METRWe found that when developers used AI tools in early 2025, they took 19% longer than without—AI made them slower. Read more. MALT. A...
Source: arxiv.org
Link: https://arxiv.org/html/2204.04741v5
Source snippet
The aim of this study is to determine if...Read more...
Source: arxiv.org
Title: How Readable is Model-generated Code?
Link: https://arxiv.org/pdf/2208.14613
Source snippet
NA Madi · 2022 · Cited by 94 — Objective: In this paper, we focus on GitHub Copilot to address the issues of readability...
Source: augmentcode.com
Title: why ai coding tools make experienced developers 19 slower and how to fix it
Link: https://www.augmentcode.com/guides/why-ai-coding-tools-make-experienced-developers-19-slower-and-how-to-fix-it
Source snippet
Why AI Coding Tools Make Experienced Developers 19%...Oct 3, 2025 — The METR study's 19% slowdown reflects the cognitive overhead of man...
Source: businessinsider.com
Link: https://www.businessinsider.com/ai-coding-tools-may-decrease-productivity-experienced-software-engineers-study-2025-7
Source snippet
Conducted with 16 seasoned developers familiar with open-source projects they had worked on for years, the study randomly assigned partic...
Source: itpro.com
Title: IT Pro Think AI coding tools are speeding up work?
Link: https://www.itpro.com/software/development/think-ai-coding-tools-are-speeding-up-work-think-again-theyre-actually-slowing-developers-down
Source snippet
Think again - they're actually slowing developers downA recent study by Model Evaluation & Threat Research (METR) challenges common assum...
Source: itpro.com
Link: https://www.itpro.com/software/development/ai-might-help-speed-up-software-development-but-81-percent-of-devs-now-spend-more-time-reviewing-code-and-its-creating-an-invisible-work-trend-thats-pushing-teams-to-the-limit
Source snippet
The widespread integration of AI is causing developers to spend significantly more time on manual code reviews and bug fixes, with 81% re...
Source: tomshardware.com
Link: https://www.tomshardware.com/software/linux/linus-torvalds-says-ai-bug-reports-have-made-the-linux-security-mailing-list-almost-entirely-unmanageable
Source snippet
These tools often identify the same bugs, leading to multiple redundant reports, which overwhelm maintainers who must triage and redirect...
Source: Wikipedia
Link: https://en.wikipedia.org/wiki/METR
Source snippet
METRModel Evaluation and Threat Research (METR) (MEE-tər), is a nonprofit research institute, based in Berkeley, California, that eval...

Additional References

Source: letsdatascience.com
Link: https://letsdatascience.com/blog/developers-thought-ai-made-them-faster-the-data-said-otherwise
Source snippet
AI Coding Tools Made Developers 19% Slower: METR StudyA METR randomized controlled trial found AI coding tools made experienced developer...
Source: reddit.com
Link: https://www.reddit.com/r/ArtificialInteligence/comments/1mqxamb/ai_makes_experienced_developers_19_slower_based/
Source snippet
AI makes experienced developers 19% slower based on...The results showed that developers took 19% longer to complete tasks when using AI...
Source: instagram.com
Link: https://www.instagram.com/p/DXNOnUbigua/
Source snippet
AI tools generate code far faster than any human reviewer...AI tools generate code far faster than any human reviewer can meaningfully p...
Source: linkedin.com
Link: https://www.linkedin.com/posts/paulius-kuzmickas-1b35ba12b_code-review-is-the-new-bottleneck-in-software-activity-7452242223208898560-xHM5
Source snippet
Code Review Bottleneck in Software EngineeringCode review is the new bottleneck in software engineering. A year ago, engineers were the s...
Source: reddit.com
Link: https://www.reddit.com/r/coding/comments/1gw1ws2/does_github_copilot_improve_code_quality_heres/
Source snippet
Does GitHub Copilot Improve Code Quality? Here's How...One point I observed is the first graph shown that lists: Using Copilot 37.8% pas...
Source: reddit.com
Link: https://www.reddit.com/r/slatestarcodex/comments/1lwrb09/metr_finds_that_experienced_opensource_developers/
Source snippet
METR finds that experienced open-source developers...METR finds that experienced open-source developers work 19% slower when using Early...
Source: reddit.com
Link: https://www.reddit.com/r/programming/comments/1ac7cb2/new_github_copilot_research_finds_downward/
Source snippet
New GitHub Copilot Research Finds 'Downward Pressure...We have both noticed and hypothesise that the juniors are using ai assistant stuf...
Source: techradar.com
Link: https://www.techradar.com/pro/using-ai-might-actually-slow-down-experienced-devs
Source snippet
Conducted on 16 seasoned developers working on 246 tasks across familiar open-source projects, the study revealed that while developers i...
Source: devclass.com
Link: https://www.devclass.com/development/2024/05/14/github-research-reports-high-copilot-satisfaction-from-enterprise-devs-but-others-doubt-productivity-gains/1623421
Source snippet
GitHub research reports high Copilot satisfaction from...14 May 2024 — Developers in the trial accepted around 30 percent of Copilot sug...

Published: May 2024
Source: linkedin.com
Link: https://www.linkedin.com/posts/rhunterharris_ai-coding-tools-made-experienced-developers-activity-7371559585083535361-qKp_
Source snippet
AI coding tools slow experienced developers, not speed...Back in July 2025, a study from METR made some noise: “Early-2025 AI coding too...

Published: July 2025
