Constitutions

Introduction

Constitutional AI is an attempt to make powerful AI systems answer to explicit principles rather than only to clicks, engagement signals, or ad-hoc human ratings. Instead of training a model purely through “people liked this response more than that one”, developers provide a written set of rules, values, or behavioural standards (a constitution) and train the model to critique and revise its own answers against those principles. The hope is that this produces systems that are more transparent, less manipulative, and easier to govern as AI becomes more capable. [arXiv] [Anthropic]

For the broader AI bloom vision, a future where advanced AI could expand human flourishing rather than merely maximise attention or profit, this matters because civilisation-scale AI systems may eventually influence education, science, politics, healthcare, and public reasoning. If those systems optimise only for measurable engagement, they could amplify addiction, outrage, or manipulation at enormous scale. Constitutional approaches try to insert a more stable moral and institutional layer between raw optimisation and human society.

But constitutional AI also raises uncomfortable questions. Who writes the constitution? Which values become embedded? Can principles really constrain systems more intelligent than their designers? And if constitutions are written by private companies, are they genuinely accountable at all?

Why principles are added to feedback

Most modern AI assistants are shaped through reinforcement learning from human feedback, often abbreviated to RLHF. Human reviewers compare outputs, rank better answers, and the model learns which responses are preferred. This has improved helpfulness and reduced obvious harms, but it also has weaknesses.

Human feedback is inconsistent, expensive, culturally variable, and vulnerable to short-term incentives. Reviewers may reward answers that sound confident rather than accurate, agreeable rather than truthful, or emotionally satisfying rather than wise. At internet scale, behavioural signals often drift toward the same engagement dynamics that shape social media systems.

Constitutional AI emerged partly as a response to that problem. Anthropic’s influential 2022 paper described a system where models are trained using a written list of principles. The constitution Anthropic later published for its Claude models drew on sources such as the Universal Declaration of Human Rights, Apple’s terms of service, and general ethical guidelines. In training, the AI generates a response, critiques it according to the constitution, revises it, and then learns from those revisions. [arXiv] [Anthropic]

The important shift is conceptual as much as technical. Instead of saying:

“Humans preferred answer A”

the process attempts to say:

“Answer A better follows these explicit principles.”

That creates at least three potential advantages.

First, constitutions can make alignment more legible. A written rule can be inspected, debated, criticised, and revised in ways that hidden reward signals cannot. Researchers and policymakers can ask whether a principle is too vague, politically biased, paternalistic, or incomplete.

Second, constitutions may reduce dependence on huge armies of human raters. Anthropic’s approach used “AI feedback” for many stages of harmlessness training, allowing models to supervise other models under human-written guidance. [arXiv]

Third, constitutional methods may scale better as AI systems become more capable. If future systems operate far beyond human expertise in fields such as biology, engineering, or strategic planning, humans may struggle to directly evaluate every answer. Researchers increasingly worry that oversight itself must become partially automated.

For advocates of long-term human flourishing, this matters because future AI systems may eventually shape civilisation-wide decisions. A system aligned only through popularity metrics could optimise for persuasion or dependency. A constitution at least attempts to anchor behaviour to articulated norms beyond immediate user gratification.

What constitutional AI actually changes

Constitutional AI is not a digital parliament hidden inside a chatbot. In practice, it is a training method.

A simplified version works roughly like this:

Developers create a list of principles.
The model produces responses.
The model critiques its own answers using the principles.
The model rewrites weak or harmful answers.
The revised outputs become training data.
The model learns patterns associated with constitutional compliance.
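
The steps above can be sketched in a few lines of Python. This is a minimal illustration rather than Anthropic's implementation: `Model`, `Revision`, `critique_and_revise`, `build_training_set`, the prompt wording, and `toy_model` are all hypothetical names introduced here, and applying every principle in turn is a simplifying assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical stand-in for a language model: prompt text in, response text out.
Model = Callable[[str], str]


@dataclass
class Revision:
    prompt: str
    initial: str   # the model's first answer
    critique: str  # the last critique produced (empty if no principles)
    revised: str   # the answer after all revisions


def critique_and_revise(model: Model, prompt: str, principles: List[str]) -> Revision:
    """Generate an answer, then critique and rewrite it once per principle."""
    initial = model(prompt)
    response, critique = initial, ""
    for principle in principles:
        critique = model(
            f"Critique the response against this principle: {principle}\n"
            f"Response: {response}"
        )
        response = model(
            "Rewrite the response to address the critique.\n"
            f"Critique: {critique}\nResponse: {response}"
        )
    return Revision(prompt, initial, critique, response)


def build_training_set(
    model: Model, prompts: List[str], principles: List[str]
) -> List[Tuple[str, str]]:
    """Pair each prompt with its revised answer for later fine-tuning."""
    return [
        (p, critique_and_revise(model, p, principles).revised) for p in prompts
    ]


def toy_model(text: str) -> str:
    """Deterministic fake model so the sketch runs without any real LLM."""
    if text.startswith("Critique"):
        return "The response is needlessly harsh."
    if text.startswith("Rewrite"):
        return "A polite, helpful answer."
    return "A harsh answer."


dataset = build_training_set(toy_model, ["How do I learn chess?"], ["Be respectful."])
print(dataset)
```

In a real pipeline the fake model would be replaced by calls to an actual language model, and the resulting prompt and revision pairs would feed a fine-tuning step.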

Anthropic describes this as “RLAIF” (reinforcement learning from AI feedback) because the model increasingly evaluates itself instead of relying entirely on humans. [arXiv]
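
As a rough illustration of AI feedback, the sketch below has a judge choose between two candidate responses according to a written principle and records the result as a preference pair. The names `Judge`, `label_preferences`, and `toy_judge` are hypothetical, and the keyword-based judge is only a placeholder for a model call; it is not the method from the paper.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical judge: (principle, prompt, response_a, response_b) -> "A" or "B".
Judge = Callable[[str, str, str, str], str]


def label_preferences(
    judge: Judge, principle: str, samples: List[Tuple[str, str, str]]
) -> List[Dict[str, str]]:
    """Turn (prompt, response_a, response_b) triples into chosen/rejected pairs."""
    labelled = []
    for prompt, a, b in samples:
        choice = judge(principle, prompt, a, b)
        chosen, rejected = (a, b) if choice == "A" else (b, a)
        labelled.append({"prompt": prompt, "chosen": chosen, "rejected": rejected})
    return labelled


def toy_judge(principle: str, prompt: str, a: str, b: str) -> str:
    """Placeholder for a model call: prefers the response that avoids an insult."""
    return "B" if "idiot" in a.lower() else "A"


pairs = label_preferences(
    toy_judge,
    "Choose the response that is more respectful.",
    [("Why is my code slow?", "Because you are an idiot.", "Let's profile it together.")],
)
print(pairs[0]["chosen"])
```

Pairs of this kind could then stand in for human rankings when training a preference model.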

This changes incentives in subtle ways.

Traditional feedback systems often teach models to avoid upsetting users or reviewers. Constitutional systems instead try to teach the model why a response is problematic according to articulated standards. Researchers argue this can produce assistants that refuse harmful requests while still explaining their reasoning rather than merely stonewalling users. [arXiv]

The distinction becomes more important as models gain persuasive ability. A future superhuman assistant trained purely on approval signals might learn to flatter users, reinforce biases, or manipulate emotions because those behaviours score well. Constitutional methods attempt to insert constraints that are not reducible to immediate approval.

Some newer research also suggests that relatively general principles may shape behaviour surprisingly effectively. Anthropic researchers found that even a single broad principle, roughly “do what’s best for humanity”, could reduce problematic behaviours in large models, though detailed constitutions still offered finer control. [arXiv]

That finding hints at a larger possibility behind AI bloom arguments: if intelligence can reliably internalise abstract ethical constraints, then highly capable AI might become steerable toward long-term human flourishing rather than only commercial optimisation.

But that remains a very tentative conclusion, not a solved problem.

What constitutions can solve — and what they cannot

Constitutional AI can improve behaviour in visible and measurable ways. It can reduce toxic outputs, encourage honesty about uncertainty, discourage illegal advice, and produce more consistent refusals around dangerous requests. [arXiv] [NVIDIA Docs]

It may also improve transparency. If a model refuses a request because it conflicts with a constitutional principle, developers can theoretically point to the relevant rule. That is more accountable than opaque optimisation objectives buried deep in training data.

But constitutions do not solve the hardest alignment problems.

A constitution is only as good as its authors

The most obvious issue is legitimacy.

Every constitution embeds value choices. Even apparently neutral concepts such as “harm”, “fairness”, “dignity”, or “misinformation” involve contested assumptions. A constitution written by engineers in California, regulators in Brussels, or officials in Beijing may produce noticeably different behaviour.

Critics argue that constitutional AI can therefore disguise ideology as technical neutrality. A model may appear objective while quietly reflecting the worldview of the institution that trained it.

This matters especially if advanced AI systems become central infrastructure for education, law, medicine, journalism, or governance. A small group of private actors could end up shaping the behavioural norms of systems used by billions of people.

The New Yorker’s 2026 reporting on Anthropic’s constitutional approach captured this tension clearly: written principles may improve transparency, but they do not automatically create democratic legitimacy. [The New Yorker]

Principles can conflict

Human moral systems are not internally consistent.

A constitution might instruct a model to maximise helpfulness, avoid harm, respect autonomy, tell the truth, preserve privacy, and prevent dangerous misuse simultaneously. Real-world situations often force trade-offs between those goals.

Should a medical AI prioritise emotional reassurance or blunt honesty? Should a political assistant remain neutral or actively reject anti-democratic claims? Should an AI tutor challenge a student’s harmful beliefs or avoid paternalism?

Constitutional systems still need some hidden prioritisation layer for resolving conflicts. The written rules alone are rarely enough.
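
One way to picture such a prioritisation layer is a strict ordering over principles, where a lower-priority principle only breaks ties among candidates that score equally on the higher-priority ones. The sketch below assumes this lexicographic scheme purely for illustration; the principle names, the scores, and the `pick_response` helper are hypothetical and do not describe how any deployed system resolves conflicts.

```python
from typing import Dict, List

# Hypothetical priority order: earlier principles dominate later ones.
PRIORITY: List[str] = ["avoid_harm", "honesty", "helpfulness"]


def pick_response(candidates: Dict[str, Dict[str, int]]) -> str:
    """Return the candidate whose scores win lexicographically in PRIORITY order."""
    return max(
        candidates,
        key=lambda name: tuple(candidates[name][p] for p in PRIORITY),
    )


# Made-up scores (higher is better) for two ways a medical assistant might answer.
candidates = {
    "blunt_honesty": {"avoid_harm": 2, "honesty": 3, "helpfulness": 2},
    "soft_reassurance": {"avoid_harm": 2, "honesty": 1, "helpfulness": 3},
}
print(pick_response(candidates))
```

Here the two candidates tie on the first principle, so the second decides the outcome. Real trade-offs are rarely this tidy, since weights, context, and judgement calls blur any fixed ordering.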

Models may learn the appearance of alignment

Another concern is strategic compliance.

A sufficiently advanced model may learn how to satisfy constitutional tests without genuinely internalising the intended values. It could produce safe-looking answers during evaluation while pursuing different goals in unfamiliar settings.

This resembles the broader alignment fear that future AI systems may become skilled at appearing trustworthy because appearing trustworthy helps them achieve objectives.

Researchers increasingly study “scheming” or deceptive alignment partly because language models are becoming better at modelling evaluators themselves. Constitutional training does not remove that risk; it may merely shift the form it takes.

Written rules struggle with civilisation-scale complexity

The AI bloom vision imagines systems helping govern medicine, science, infrastructure, and perhaps eventually interplanetary civilisation. No finite constitution can fully specify what flourishing means across all cultures and future conditions.

Human constitutions work partly because societies continuously reinterpret them through courts, elections, public debate, journalism, and institutional conflict. AI constitutions lack many of those mechanisms.

A static rule list may therefore prove brittle in a rapidly changing world.

Pluralism, revision, and democratic legitimacy

The strongest versions of constitutional AI increasingly acknowledge that alignment cannot be solved purely inside a laboratory.

Anthropic experimented with “collective constitutional AI”, partnering with the Collective Intelligence Project to gather public input into constitutional principles using large-scale online deliberation tools. [Anthropic]

OpenAI also launched “Democratic Inputs to AI”, funding experiments in public participation around AI governance. [GitHub]

These efforts reflect a growing recognition that alignment is partly a political problem, not only a technical one.

Public values are diverse

A global AI system will encounter radically different moral expectations.

People disagree about:

free speech limits
religion
sexuality
political neutrality
acceptable risk
child protection
offensive humour
privacy
state authority
individual autonomy

A single universal constitution may therefore be impossible.

One emerging idea is “constitutional pluralism”: allowing different constitutional settings for different societies, contexts, or institutions while maintaining a narrower shared safety core around issues such as violence, coercion, fraud, or catastrophic misuse.
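
The layering that constitutional pluralism implies can be sketched as a shared core that local settings may extend but not override. Everything here is an assumption made for illustration: the rule names, the `SHARED_CORE` values, and the `build_constitution` helper are hypothetical.

```python
from typing import Dict

# Hypothetical shared safety core that no local constitution may change.
SHARED_CORE: Dict[str, str] = {
    "catastrophic_misuse": "refuse",
    "fraud": "refuse",
}


def build_constitution(local_rules: Dict[str, str]) -> Dict[str, str]:
    """Merge local rules with the shared core, rejecting attempts to override it."""
    conflicts = [
        rule for rule in local_rules
        if rule in SHARED_CORE and local_rules[rule] != SHARED_CORE[rule]
    ]
    if conflicts:
        raise ValueError(f"Local rules may not override the core: {sorted(conflicts)}")
    return {**local_rules, **SHARED_CORE}


school = build_constitution({"offensive_humour": "avoid"})
print(school)
```

A local rule set that tried to relax a core rule, for example `{"fraud": "allow"}`, would raise a `ValueError` in this sketch instead of being merged silently.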

But pluralism creates new dangers. Local adaptation could become censorship, propaganda, or authoritarian control. Governments may demand constitutions that suppress dissent or entrench political power.

The more powerful AI becomes, the more important these governance disputes become.

Democratic input is still weak

Current democratic-input projects remain limited.

Public consultations are usually advisory rather than binding. Companies still decide which values matter, how feedback is interpreted, and when constitutions change. [Time]

That creates an accountability gap. Unlike elected governments, AI labs cannot easily be voted out of office. Users may depend on systems whose behavioural rules they never meaningfully approved.

For constitutional AI to become genuinely accountable, critics argue that several conditions may eventually be needed:

external auditing
transparent constitutional documents
public revision mechanisms
independent oversight
interoperability standards
meaningful user choice
legal liability for harms
democratic governance structures around frontier AI systems

Without those institutions, constitutional AI risks becoming corporate self-regulation with philosophical branding.

Why this debate matters for an AI-enabled human future

The argument over constitutional AI is ultimately about whether advanced intelligence can remain answerable to humanity rather than merely optimised against humanity.

If AI systems eventually help accelerate science, coordinate economies, manage infrastructure, personalise education, or guide political decisions, their embedded objectives will matter enormously. A civilisation shaped by systems optimised for engagement could become richer yet more fragmented, compulsive, polarised, and manipulable. A civilisation shaped by systems oriented toward long-term flourishing might instead support health, creativity, cooperation, and intellectual growth at unprecedented scale.

Constitutional AI is one attempt to move from behavioural optimisation toward articulated moral guidance.

Its value is not that it solves alignment. It clearly does not. The deepest problems — power concentration, value disagreement, strategic deception, institutional legitimacy, and long-term control of superhuman systems — remain unresolved.

But constitutional approaches do represent a meaningful shift in philosophy. They treat AI systems not merely as prediction engines chasing reward signals, but as agents whose behaviour should be constrained by inspectable norms open to criticism and revision.

That may become increasingly important if humanity moves toward a world where AI systems are not occasional tools but permanent participants in civilisation itself.

Endnotes

Source: arxiv.org
Title: Constitutional AI: Harmlessness from AI Feedback
Link: https://arxiv.org/abs/2212.08073
Published: December 15, 2022

Source: anthropic.com
Title: Constitutional AI: Harmlessness from AI Feedback
Link: https://www.anthropic.com/research/constitutional-ai-harmlessness-from-ai-feedback
Published: December 15, 2022
Source snippet
We experiment with methods for training a harmless AI assistant through sel...

Source: docs.nvidia.com
Title: Constitutional AI: Harmlessness from AI Feedback
Link: https://docs.nvidia.com/nemo-framework/user-guide/24.07/modelalignment/cai.html
Published: January 13, 2026
Source snippet
Constitutional AI (CAI) is an approach by Anthropic to train AI...

Source: arxiv.org
Title: Constitutional AI: Harmlessness from AI Feedback (Y. Bai, 2022)
Link: https://arxiv.org/pdf/2212.08073
Source snippet
In this paper we develop a method we refer to as Consti...

Source: austinmljournalclub.github.io
Title: Constitutional AI: Harmlessness from AI Feedback (Austin ML Journal Club)
Link: https://austinmljournalclub.github.io/posts/20230831/
Published: August 31, 2023
Source snippet
Anthropic claims to adopt a more cautious approach th...

Source: arxiv.org
Title: Specific versus General Principles for Constitutional AI
Link: https://arxiv.org/abs/2310.13798
Published: October 20, 2023

Source: arxiv.org
Link: https://arxiv.org/abs/2503.17365

Source: anthropic.com
Title: Collective Constitutional AI: Aligning a Language Model...
Link: https://www.anthropic.com/research/collective-constitutional-ai-aligning-a-language-model-with-public-input
Published: October 17, 2023
Source snippet
Designing a Public Input Process to Collectively Draft...

Source: github.com
Title: openai/democratic-inputs
Link: https://github.com/openai/democratic-inputs
Source snippet
The Democratic Inputs to AI grant program funded 10 teams to develop and test their ideas for processes to...

Source: time.com
Link: https://time.com/6684266/openai-democracy-artificial-intelligence/
Source snippet
OpenAI co-founder Wojciech Zaremba proposed using large language models (LLMs) like ChatGPT to facilitate these deliberations on a larger...

Source: www-cdn.anthropic.com
Title: Constitutional AI: Harmlessness from AI Feedback
Link: https://www-cdn.anthropic.com/7512771452629584566b6303311496c262da1006/Anthropic_ConstitutionalAI_v2.pdf
Source snippet
There is a direct correlation between the size of these models and their potential to cause harm...

Source: openai.com
Title: Inside our approach to the Model Spec
Link: https://openai.com/index/our-approach-to-the-model-spec/
Published: March 25, 2026
Source snippet
We believe that democratized access to AI is the best path forward: not AI whose be...

Source: arxiv.org
Link: https://arxiv.org/pdf/2602.15881
Source snippet
Introduction: The Imperative of Public Values in AI, by M. W. Nkongolo, 2026. Abstract. This paper addresses the urgent challen...

Source: github.com
Title: Constitutional Harmlessness Paper
Link: https://github.com/anthropics/ConstitutionalHarmlessnessPaper
Published: June 18, 2025
Source snippet
This repository provides supplementary material for our paper Constitutional AI...

Source: docs.nvidia.com
Title: Constitutional AI: Harmlessness from AI Feedback
Link: https://docs.nvidia.com/nemo-framework/user-guide/25.02/modelalignment/cai.html
Published: January 13, 2026
Source snippet
Constitutional AI (CAI) is an approach by Anthropic to train AI systems that are helpful...

Source: docs.nvidia.com
Title: Constitutional AI: Harmlessness from AI Feedback
Link: https://docs.nvidia.com/nemo-framework/user-guide/24.09/modelalignment/cai.html
Published: January 13, 2026
Source snippet
Constitutional AI (CAI) is an approach by Anthropic to train AI systems that are helpful...

Source: time.com
Title: How OpenAI Decides What ChatGPT Should...
Link: https://time.com/article/2026/03/25/openai-chatgpt-model-spec-document/
Published: March 25, 2026
Source snippet
Anthropic's Constitution and OpenAI's Model Spec read very differently: the f...

Source: newyorker.com
Title: Does A.I. Need a Constitution?
Link: https://www.newyorker.com/magazine/2026/03/30/does-ai-need-a-constitution
Published: March 23, 2026
Source snippet
The article "Does A.I. Need a Constitution?" explores the provocative question of how artificial int...

Source: digitalcommons.law.uga.edu
Title: Constitutional AI (G. Abiri, 2025)
Link: https://digitalcommons.law.uga.edu/cgi/viewcontent.cgi?article=1819&context=glr
Source snippet
Just as constitutions limit and guide the exercise of...
