The Review Bottleneck

When AI produces code faster than people can safely verify it, productivity gains can turn into review, debugging, and maintenance bottlenecks.

On this page

  • Why verification becomes the scarce resource
  • How fluent code creates post-editing work
  • What teams can measure before trusting speed gains

Introduction

AI coding tools can generate working software far faster than most humans can type it. But in mature software systems, writing code is often not the real bottleneck. The harder task is deciding whether new code is safe, maintainable, secure, compatible with the rest of the system, and likely to behave correctly months or years later.

That distinction matters for the broader idea of AI abundance. If advanced AI systems eventually make technical labour dramatically cheaper, software engineering should be one of the earliest places where this becomes visible. Yet current evidence suggests a more complicated pattern: AI can greatly increase the volume of code produced, while shifting scarce human effort into review, debugging, verification, and long-term maintenance. In practice, teams may move from a world where programmers spend most of their time writing code to one where they spend increasing amounts of time inspecting machine-generated output.

The result is a “review bottleneck”: a situation where code generation scales faster than trustworthy verification. That bottleneck helps explain why some expert developers report slower work despite impressive-looking AI assistance. [metr.org] [arXiv]

Why verification becomes the scarce resource

Software engineering contains at least two different activities that are easy to blur together:

  • Producing code text [linkedin.com]
  • Judging whether the code should exist inside a real system

Large language models are increasingly good at the first task. The second remains difficult because software systems contain hidden assumptions, historical compromises, undocumented dependencies, and operational risks that rarely appear in the prompt window.

In a mature codebase, even a small change can have consequences across deployment systems, data models, security permissions, performance behaviour, or compliance requirements. Human reviewers therefore spend much of their time asking questions that are only indirectly about syntax:

  • Does this match the architecture of the system?
  • Will future engineers understand it?
  • Does it create maintenance debt?
  • Does it subtly duplicate existing logic?
  • Could it fail under unusual conditions?
  • Does it introduce a security or reliability risk?
  • Is the generated solution solving the right problem at all?

AI systems often produce plausible-looking answers without possessing the operational understanding needed to reliably answer those questions. This creates an asymmetry: generating candidate code is cheap, but validating it remains expensive.

The METR study of experienced open-source developers illustrates this. Developers using frontier AI tools took about 19% longer to complete tasks, despite expecting substantial speed gains. A significant portion of the additional time went into prompting, reviewing, correcting, and cleaning up AI output rather than writing original code. [IT Pro]

This is one reason the productivity debate around AI coding tools often becomes confused. Typing fewer lines of code does not necessarily mean less engineering work. In some environments, the work simply migrates upstream into supervision and downstream into maintenance.

How fluent code creates post-editing work

AI-generated code creates a distinctive kind of labour: post-editing. The generated output frequently looks coherent enough to pass an initial glance, but still requires extensive human checking.

That changes the cognitive structure of programming.

Reading becomes harder than writing

Experienced engineers often report that understanding unfamiliar code is slower than producing it themselves. Human-written code usually reflects the author’s mental model and local conventions. AI-generated code may instead optimise for surface plausibility.

As a result, reviewers must reconstruct intent after the fact.

This becomes especially costly when the AI produces:

  • Correct-looking but subtly wrong logic
  • Redundant abstractions
  • Over-engineered structures
  • Security vulnerabilities
  • Inefficient database or API calls
  • Inconsistent naming and style
  • Hidden edge-case failures

Because the code appears fluent, reviewers cannot safely skim it. They may need to inspect it line by line.

Research into developer behaviour around LLM-generated code found that programmers validating AI output showed increased cognitive workload, frequent context switching, and repeated cycles of deletion and rewriting. [arXiv]

The burden is amplified by the speed of generation. A developer can now produce several large pull requests in the time previously required for one carefully written change. Review capacity often does not scale at the same rate.
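The mismatch between generation speed and review capacity can be made concrete with a toy queue model. This is a minimal sketch, not a measurement: the function name `review_backlog` and the weekly rates are assumptions chosen for illustration, not figures from any study cited here.

```python
def review_backlog(opened_per_week, reviewed_per_week, weeks):
    """Return the number of unreviewed pull requests at the end of each week.

    Hypothetical model: a fixed number of PRs arrive each week and reviewers
    clear at most a fixed number; anything left over carries forward.
    """
    backlog = 0
    history = []
    for _ in range(weeks):
        backlog += opened_per_week
        backlog -= min(backlog, reviewed_per_week)
        history.append(backlog)
    return history


# Assumed rates: generation doubles output while review capacity stays flat.
before = review_backlog(opened_per_week=10, reviewed_per_week=10, weeks=4)
after = review_backlog(opened_per_week=20, reviewed_per_week=10, weeks=4)
print(before)  # [0, 0, 0, 0]
print(after)   # [10, 20, 30, 40]
```

Under these assumptions the queue grows by the difference between the two rates every week, so even a modest gap compounds into a large unreviewed backlog.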

The “cheap code” problem

Historically, writing code carried a natural cost. Engineers avoided unnecessary complexity partly because producing software required significant time and attention.

AI changes that incentive structure.

When generating another abstraction, helper function, or architectural layer becomes almost free, systems can accumulate excess code more rapidly. But every additional line still imposes future maintenance costs:

  • More surface area for bugs
  • More dependencies to understand
  • More interactions to test
  • More upgrade paths to maintain
  • More review overhead for future changes [docs.github.com]

This resembles a broader economic pattern seen in other domains of abundance. When production becomes extremely cheap, filtering and quality control often become the scarce resource.

Software engineering may therefore become less constrained by code production and more constrained by institutional trust: determining which generated outputs deserve integration into critical systems.

Fluent mistakes are expensive

Traditional programming errors are often obvious because the code simply fails to compile or crashes immediately.

AI-generated mistakes can be more dangerous because they are persuasive. GitHub’s own documentation warns that Copilot code review can generate “hallucinations”, including comments based on misunderstandings of the codebase. [GitHub Docs]

This creates a paradoxical workflow:

  1. AI produces convincing code quickly
  2. Humans become less certain which parts deserve trust
  3. Verification requirements increase
  4. Review time expands

The danger is not only incorrect code. It is misplaced confidence.

The METR findings showed a striking mismatch between perception and reality: developers believed AI had sped them up even when measured completion times showed the opposite. [metr.org] [Reuters] That gap matters because organisations may overestimate productivity gains while undercounting hidden review labour.

The invisible work created by AI assistance

Many organisations still measure software productivity using metrics tied to visible output:

  • Pull requests merged
  • Tickets closed
  • Lines of code written [github.blog]
  • Features shipped

AI tools can improve these numbers while simultaneously increasing hidden maintenance work.

A recent industry report found most developers now spend more time reviewing and fixing AI-assisted code, with many describing this effort as “invisible work” that is poorly captured by existing productivity systems. [IT Pro]

This hidden labour includes:

  • Verifying generated code [arxiv.org]
  • Re-running tests
  • Cleaning architectural inconsistencies
  • Rewriting brittle implementations
  • Checking security implications
  • Simplifying overcomplicated outputs
  • Explaining AI-generated code to teammates
  • Maintaining code the original author barely understands

In highly regulated or safety-critical environments, verification costs can become even larger. Medical, financial, infrastructure, and aerospace systems often require traceability, auditing, and compliance review that AI-generated code does not automatically satisfy.

As code generation accelerates, senior engineers may spend less time building systems directly and more time acting as validators, editors, and risk managers.

Open source maintainers are already seeing the overload

The review bottleneck is not limited to corporate software teams. Open-source maintainers increasingly report floods of AI-assisted submissions, bug reports, and patches that require human triage.

In 2026, Linus Torvalds criticised a surge of AI-generated Linux vulnerability reports that overwhelmed maintainers with duplicate or low-value findings. He described the private security mailing list as becoming “almost entirely unmanageable”. [Tom's Hardware]

The issue was not that the reports were always false. Many identified real bugs. The problem was that maintainers still had to:

  • Confirm reproducibility
  • Assess severity
  • Check whether issues were already known
  • Determine whether fixes were safe
  • Integrate patches into the wider system

AI had scaled discovery faster than human verification capacity.
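One of the triage steps above, checking whether an issue is already known, can be partly mechanised. The sketch below is a rough illustration only: the report fields (`id`, `file`, `function`, `category`), the file paths, and the fingerprint rule are all hypothetical, and real duplicate detection would need human confirmation.

```python
from collections import defaultdict


def group_duplicates(reports):
    """Group report ids that share a (file, function, category) fingerprint.

    The fingerprint is an assumed heuristic: fields are trimmed and
    lower-cased so that trivially different reports collapse together.
    """
    groups = defaultdict(list)
    for report in reports:
        key = (
            report["file"].strip().lower(),
            report["function"].strip().lower(),
            report["category"].strip().lower(),
        )
        groups[key].append(report["id"])
    return dict(groups)


# Made-up reports: R1 and R2 describe the same location and bug class.
reports = [
    {"id": "R1", "file": "drivers/foo.c", "function": "foo_read", "category": "use-after-free"},
    {"id": "R2", "file": "Drivers/Foo.c ", "function": "foo_read", "category": "Use-After-Free"},
    {"id": "R3", "file": "drivers/bar.c", "function": "bar_init", "category": "null-deref"},
]
groups = group_duplicates(reports)
print(len(reports), "reports,", len(groups), "distinct fingerprints")
```

Grouping only reduces the number of items a maintainer has to open; reproducibility, severity, and fix safety still need expert judgement for each group.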

This pattern may become increasingly common across software ecosystems. If AI systems can generate thousands of candidate fixes, pull requests, or vulnerability findings, then expert attention becomes the limiting resource.

That is a preview of a broader AI-era governance problem: abundance of outputs does not eliminate the need for trusted judgement.

What teams can measure before trusting speed gains

The review bottleneck does not mean AI coding tools are useless. Many developers genuinely benefit from them, especially for prototyping, repetitive work, onboarding, or unfamiliar frameworks. [The GitHub Blog]

But organisations that evaluate AI solely through code generation speed can misread what is happening.

More useful measures often focus on verification and maintenance costs rather than raw output volume.

Review-to-write ratio

One useful indicator is how much time teams spend reviewing compared with authoring code.

If AI dramatically increases review time per change, apparent productivity gains may be illusory.
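As a minimal sketch of how this ratio might be computed, assume a team logs authoring and review minutes per change. The `ChangeRecord` schema and all numbers below are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class ChangeRecord:
    """Time logged against one change, in minutes (hypothetical schema)."""
    authoring_minutes: float
    review_minutes: float


def review_to_write_ratio(changes):
    """Total review time divided by total authoring time across changes."""
    authoring = sum(c.authoring_minutes for c in changes)
    review = sum(c.review_minutes for c in changes)
    if authoring == 0:
        raise ValueError("no authoring time recorded")
    return review / authoring


# Made-up numbers for illustration only.
baseline = [ChangeRecord(120, 30), ChangeRecord(90, 30)]
ai_assisted = [ChangeRecord(40, 110), ChangeRecord(30, 100)]
print(review_to_write_ratio(baseline))     # 60 / 210
print(review_to_write_ratio(ai_assisted))  # 210 / 70 = 3.0
```

In this invented data, authoring time falls sharply but total time across the two changes is roughly unchanged (270 versus 280 minutes), which is the pattern the ratio is meant to expose.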

Acceptance rate of generated code

Another important measure is how much generated code survives intact.

Low acceptance rates can indicate that teams are spending large amounts of effort cleaning, rewriting, or discarding AI output. The METR study found that less than half of generated suggestions were accepted directly. [Business Insider]
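One rough proxy for how much generated code survives intact is a line-level comparison between what the tool produced and what was eventually merged. The sketch below uses Python's standard `difflib`; the function name `surviving_fraction` and the sample snippets are assumptions, and this line-survival figure is not the same measure as the suggestion acceptance rate reported by METR.

```python
import difflib


def surviving_fraction(generated, merged):
    """Fraction of generated lines that appear unchanged, in order, in the merged code."""
    gen_lines = generated.splitlines()
    merged_lines = merged.splitlines()
    if not gen_lines:
        return 0.0
    matcher = difflib.SequenceMatcher(a=gen_lines, b=merged_lines, autojunk=False)
    # Sum the sizes of all matching runs of identical lines.
    kept = sum(block.size for block in matcher.get_matching_blocks())
    return kept / len(gen_lines)


# Hypothetical example: a reviewer replaces a generated loop with a built-in.
generated = (
    "def total(xs):\n"
    "    t = 0\n"
    "    for x in xs:\n"
    "        t = t + x\n"
    "    return t\n"
)
merged = (
    "def total(xs):\n"
    "    return sum(xs)\n"
)
print(surviving_fraction(generated, merged))  # 0.2 (1 of 5 generated lines kept)
```

Exact line matching undercounts code that was only reformatted and overcounts trivial lines such as braces or blank lines, so the number is best read as a trend over many changes rather than a verdict on any single one.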

Rework and rollback frequency

Teams can also track: [itpro.com]

  • Bug-fix rates
  • Rollbacks
  • Hotfix frequency
  • Security incidents
  • Architectural refactors after AI-heavy periods

Fast generation followed by expensive repair is not genuine productivity.
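These signals can be tracked as a simple per-period rate. The following is a sketch under stated assumptions: the change labels (`bugfix`, `rollback`, `hotfix`), the period names, and the sample log are hypothetical, and a real team would map them from its own issue tracker or commit conventions.

```python
from collections import Counter

# Assumed labels that count as rework; adjust to local conventions.
REWORK_KINDS = {"bugfix", "rollback", "hotfix"}


def rework_rate(changes):
    """Share of merged changes per period that are labelled as rework.

    `changes` is an iterable of (period, kind) pairs, e.g. ("2025-Q1", "feature").
    """
    total = Counter()
    rework = Counter()
    for period, kind in changes:
        total[period] += 1
        if kind in REWORK_KINDS:
            rework[period] += 1
    return {period: rework[period] / total[period] for period in sorted(total)}


# Made-up change log for two periods.
log = [
    ("2025-Q1", "feature"), ("2025-Q1", "feature"),
    ("2025-Q1", "bugfix"), ("2025-Q1", "feature"),
    ("2025-Q2", "feature"), ("2025-Q2", "hotfix"),
    ("2025-Q2", "rollback"), ("2025-Q2", "bugfix"),
]
print(rework_rate(log))  # {'2025-Q1': 0.25, '2025-Q2': 0.75}
```

A rising rework share after a period of heavy AI use does not prove causation, but it is the kind of lagging signal that raw output counts hide.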

Long-term maintainability

The hardest metric is often delayed maintenance burden.

Code that appears efficient today may create years of future complexity if it is difficult to understand, poorly integrated, or overly dependent on generated abstractions.

This is one reason experienced developers sometimes remain sceptical even when demonstrations look impressive. The real cost of software often appears long after the initial code generation step.

Why this matters beyond software engineering

Software development is unusually important because it is one of the first knowledge industries where AI systems can generate large amounts of economically useful output directly.

That makes coding a test case for a larger question inside the AI bloom debate: if intelligence becomes abundant, what remains scarce?

Current evidence suggests that judgement, verification, trust, and institutional understanding may remain bottlenecks even as raw generation accelerates.

That does not invalidate the possibility of large-scale AI-driven abundance. Future systems may become much better at testing, formal verification, architectural reasoning, and long-horizon maintenance. Automated verification tools may eventually reduce today’s review burden substantially.

But the present transition reveals an important lesson. Human flourishing depends not only on producing more outputs, but on integrating them safely into complex social and technical systems.

In software engineering, AI has already shown that generating answers is easier than knowing which answers deserve confidence. That distinction may matter far beyond code.

Endnotes

  1. Source: metr.org
    Title: Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity
    Link: https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/
    Source snippet: When developers are allowed to use AI tools, they take 19% longer to complete issues...

  2. Source: arxiv.org
    Title: Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity
    Link: https://arxiv.org/abs/2507.09089

  3. Source: arxiv.org
    Title: A Study on Developer Behaviors for Validating and Repairing LLM-Generated Code Using Eye Tracking and IDE Actions
    Link: https://arxiv.org/abs/2405.16081
    Published: May 25, 2024

  4. Source: docs.github.com
    Title: Responsible use of GitHub Copilot code review
    Link: https://docs.github.com/en/copilot/responsible-use/code-review
    Source snippet: Copilot code review has a risk of "hallucination" - that is, it may highlight pro...

  5. Source: reuters.com
    Title: AI slows down some experienced software developers...
    Link: https://www.reuters.com/business/ai-slows-down-some-experienced-software-developers-study-finds-2025-07-10/
    Published: Jul 10, 2025
    Source snippet: But the study found that using AI did the opposite: it increa...

  6. Source: time.com
    Title: In the Loop: AI Promised Faster Coding. This Study Disagrees
    Link: https://time.com/7302351/ai-software-coding-study/
    Source snippet: A recent METR study challenges the assumption that AI accelerates software development. In tests with 16 experienced...

  7. Source: github.blog
    Title: Research: quantifying GitHub Copilot's impact on...
    Link: https://github.blog/news-insights/research/research-quantifying-github-copilots-impact-on-developer-productivity-and-happiness/
    Published: Sep 7, 2022
    Source snippet: In our research, we saw that GitHub Copilot supports fas...

  8. Source: github.blog
    Title: Does GitHub Copilot improve code quality? Here's what...
    Link: https://github.blog/news-insights/research/does-github-copilot-improve-code-quality-heres-what-the-data-says/
    Published: Nov 18, 2024
    Source snippet: Findings in our latest study show that the quality of code written with GitHub Copilot is significantly mor...

  9. Source: github.com
    Title: GitHub · Change is constant
    Link: https://github.com/
    Source snippet: GitHub keeps you ahead. From your first line of code to final deployment, GitHub provides AI and automation...

  10. Source: docs.github.com
    Title: GitHub Copilot code review
    Link: https://docs.github.com/copilot/using-github-copilot/code-review/using-copilot-code-review
    Source snippet: GitHub Copilot can review your code and provide feedback. Where possible, Copilot's feedback includes suggested...

  11. Source: github.com
    Title: Quality is key: GitHub Copilot and code quality
    Link: https://github.com/resources/insights/copilot-code-quality
    Published: 8 May 2025
    Source snippet: The study found that code created with the help of Copilot was more likely to...

  12. Source: github.com
    Title: Copilot hallucinating / using “own” memory instead actual
    Link: https://github.com/orgs/community/discussions/160959
    Published: May 29, 2025
    Source snippet: You've highlighted an important issue with Copilot providing incorrect information despite having access to accurate data...

  13. Source: github.blog
    Title: How to scan for vulnerabilities with GitHub Security Lab's...
    Link: https://github.blog/security/how-to-scan-for-vulnerabilities-with-github-security-labs-open-source-ai-powered-framework/
    Published: Mar 6, 2026
    Source snippet: GitHub Security Lab Taskflow Agent is very effective at findi...

  14. Source: metr.org
    Title: We are Changing our Developer Productivity Experiment...
    Link: https://metr.org/blog/2026-02-24-uplift-update/
    Published: Feb 24, 2026
    Source snippet: Our early 2025 study found the use of AI causes tasks to take 19%...

  15. Source: metr.org
    Title: METR
    Link: https://metr.org/
    Source snippet: We found that when developers used AI tools in early 2025, they took 19% longer than without...

  16. Source: arxiv.org
    Link: https://arxiv.org/html/2204.04741v5
    Source snippet: The aim of this study is to determine if...

  17. Source: arxiv.org
    Title: How Readable is Model-generated Code?
    Link: https://arxiv.org/pdf/2208.14613
    Source snippet: NA Madi, 2022. Objective: In this paper, we focus on GitHub Copilot to address the issues of readability...

  18. Source: augmentcode.com
    Title: Why AI Coding Tools Make Experienced Developers 19%...
    Link: https://www.augmentcode.com/guides/why-ai-coding-tools-make-experienced-developers-19-slower-and-how-to-fix-it
    Published: Oct 3, 2025
    Source snippet: The METR study's 19% slowdown reflects the cognitive overhead of man...

  19. Source: businessinsider.com
    Link: https://www.businessinsider.com/ai-coding-tools-may-decrease-productivity-experienced-software-engineers-study-2025-7
    Source snippet: Conducted with 16 seasoned developers familiar with open-source projects they had worked on for years, the study randomly assigned partic...

  20. Source: itpro.com
    Title: Think AI coding tools are speeding up work? Think again - they're actually slowing developers down
    Link: https://www.itpro.com/software/development/think-ai-coding-tools-are-speeding-up-work-think-again-theyre-actually-slowing-developers-down
    Source snippet: A recent study by Model Evaluation & Threat Research (METR) challenges common assum...

  21. Source: itpro.com
    Link: https://www.itpro.com/software/development/ai-might-help-speed-up-software-development-but-81-percent-of-devs-now-spend-more-time-reviewing-code-and-its-creating-an-invisible-work-trend-thats-pushing-teams-to-the-limit
    Source snippet: The widespread integration of AI is causing developers to spend significantly more time on manual code reviews and bug fixes, with 81% re...

  22. Source: tomshardware.com
    Link: https://www.tomshardware.com/software/linux/linus-torvalds-says-ai-bug-reports-have-made-the-linux-security-mailing-list-almost-entirely-unmanageable
    Source snippet: These tools often identify the same bugs, leading to multiple redundant reports, which overwhelm maintainers who must triage and redirect...

  23. Source: Wikipedia
    Title: METR
    Link: https://en.wikipedia.org/wiki/METR
    Source snippet: Model Evaluation and Threat Research (METR) (MEE-tər), is a nonprofit research institute, based in Berkeley, California, that eval...

Additional References

  1. Source: letsdatascience.com
    Title: AI Coding Tools Made Developers 19% Slower: METR Study
    Link: https://letsdatascience.com/blog/developers-thought-ai-made-them-faster-the-data-said-otherwise
    Source snippet: A METR randomized controlled trial found AI coding tools made experienced developer...

  2. Source: reddit.com
    Title: AI makes experienced developers 19% slower based on...
    Link: https://www.reddit.com/r/ArtificialInteligence/comments/1mqxamb/ai_makes_experienced_developers_19_slower_based/
    Source snippet: The results showed that developers took 19% longer to complete tasks when using AI...

  3. Source: instagram.com
    Link: https://www.instagram.com/p/DXNOnUbigua/
    Source snippet: AI tools generate code far faster than any human reviewer can meaningfully p...

  4. Source: linkedin.com
    Title: Code Review Bottleneck in Software Engineering
    Link: https://www.linkedin.com/posts/paulius-kuzmickas-1b35ba12b_code-review-is-the-new-bottleneck-in-software-activity-7452242223208898560-xHM5
    Source snippet: Code review is the new bottleneck in software engineering. A year ago, engineers were the s...

  5. Source: reddit.com
    Title: Does GitHub Copilot Improve Code Quality? Here's How...
    Link: https://www.reddit.com/r/coding/comments/1gw1ws2/does_github_copilot_improve_code_quality_heres/
    Source snippet: One point I observed is the first graph shown that lists: Using Copilot 37.8% pas...

  6. Source: reddit.com
    Title: METR finds that experienced open-source developers...
    Link: https://www.reddit.com/r/slatestarcodex/comments/1lwrb09/metr_finds_that_experienced_opensource_developers/
    Source snippet: METR finds that experienced open-source developers work 19% slower when using Early...

  7. Source: reddit.com
    Title: New GitHub Copilot Research Finds 'Downward Pressure...
    Link: https://www.reddit.com/r/programming/comments/1ac7cb2/new_github_copilot_research_finds_downward/
    Source snippet: We have both noticed and hypothesise that the juniors are using ai assistant stuf...

  8. Source: techradar.com
    Link: https://www.techradar.com/pro/using-ai-might-actually-slow-down-experienced-devs
    Source snippet: Conducted on 16 seasoned developers working on 246 tasks across familiar open-source projects, the study revealed that while developers i...

  9. Source: devclass.com
    Title: GitHub research reports high Copilot satisfaction from...
    Link: https://www.devclass.com/development/2024/05/14/github-research-reports-high-copilot-satisfaction-from-enterprise-devs-but-others-doubt-productivity-gains/1623421
    Published: 14 May 2024
    Source snippet: Developers in the trial accepted around 30 percent of Copilot sug...

  10. Source: linkedin.com
    Title: AI coding tools slow experienced developers, not speed...
    Link: https://www.linkedin.com/posts/rhunterharris_ai-coding-tools-made-experienced-developers-activity-7371559585083535361-qKp_
    Published: July 2025
    Source snippet: Back in July 2025, a study from METR made some noise: “Early-2025 AI coding too...
