The question of whether artificial intelligence can understand moral consequence
has been debated in philosophy of mind, AI safety research, and cognitive
science for as long as the field has existed. In 2025, it has moved from
abstract inquiry to operational urgency. AI systems are making consequential
decisions about credit, employment, medical treatment, criminal sentencing,
and social benefit allocation that involve moral weight, that affect real
people in real ways, and that require something more than pattern-matching on
historical data to get right. The question is no longer purely philosophical.
It is a design and governance question that the systems already deployed have
not adequately answered.
The challenge is specific. Current AI systems can be trained to
classify outputs as ethical or unethical according to frameworks encoded in
their training data. They can be optimised to maximise outcomes that humans
label as good according to measurable proxies. What they cannot do is
understand moral consequence in the sense that matters: the capacity to
reason about the effects of actions on other agents who have interests, to
weigh competing claims, and to recognise when rules produce the very outcomes
they were meant to prevent. The absence of that capacity is not a technical
gap that better training data will close. It reflects something
structural about how current AI systems relate to meaning and consequence.
The Alignment Problem as a Moral Problem
The AI safety field has formalised this challenge as the alignment
problem: the difficulty of specifying what we actually want AI systems to
optimise for in ways that the systems will pursue reliably without producing
harmful side effects. Research from the Alignment Forum documents the ways in which systems
optimised for measurable proxies of good outcomes systematically diverge from
the actual outcomes humans care about, particularly in edge cases that were
underrepresented in the training data. The divergence is not random. It is
the predictable consequence of optimising for a proxy rather than for the
underlying value the proxy was meant to represent.
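A toy sketch can make the proxy divergence concrete. It is an illustration only, not a model of any deployed system: the `Policy` class, the three policies, and every number in it are invented for the example, and the `true_value` column is something no real optimiser can observe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    name: str
    proxy_score: float  # what the optimiser can measure
    true_value: float   # hypothetical: what people care about, unobservable in practice


# Invented numbers, chosen only to make the divergence visible.
POLICIES = [
    Policy("baseline", proxy_score=0.70, true_value=0.70),
    Policy("tuned to common cases", proxy_score=0.80, true_value=0.75),
    Policy("exploits an edge-case blind spot", proxy_score=0.90, true_value=0.40),
]


def pick_by_proxy(policies):
    """Return the policy an optimiser would choose: highest measurable score."""
    return max(policies, key=lambda p: p.proxy_score)


def pick_by_value(policies):
    """Return the policy with the highest underlying value."""
    return max(policies, key=lambda p: p.true_value)


def value_lost(policies):
    """Underlying value given up by optimising the proxy instead of the value."""
    return pick_by_value(policies).true_value - pick_by_proxy(policies).true_value


if __name__ == "__main__":
    print("chosen by proxy:", pick_by_proxy(POLICIES).name)
    print("best by value:  ", pick_by_value(POLICIES).name)
    print("value lost:      %.2f" % value_lost(POLICIES))
```

The optimiser is not malfunctioning when it selects the third policy. It is maximising exactly what it was given, which is the point: the loss comes from the choice of target, not from an error in the search.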
The moral version of this problem is that values themselves cannot be
measured directly, so any measurable target is already a proxy. Fairness, dignity, and harm are not quantities that
can be directly observed in training data. They are relational concepts whose
meaning depends on context, on who is affected, on what alternatives existed,
and on what the people affected actually care about. Systems trained to
classify decisions as fair or unfair based on historical human judgements
inherit all the inconsistencies and biases in those judgements, without the
capacity to reason about why the judgements were made or when they were
wrong.
Where Moral Consequence Actually Lives
The most concrete evidence of how current AI systems fall short
on moral consequence comes from domains where they are already being used
for consequential decisions. Predictive risk assessment tools used in
criminal sentencing have been shown to assign higher risk scores to
defendants from particular demographic groups at rates that cannot be
explained by the legally relevant factors the tools are meant to assess.
Hiring algorithms trained on historical employment data replicate the
patterns of historical discrimination without having been programmed to
discriminate. Credit-scoring systems using alternative data sources penalise
people for living in particular postal codes in ways that proxy for race
without the system having any representation of race as a
concept.
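The postal-code mechanism can be sketched in a few lines. This is a hedged illustration with entirely synthetic data: the postal codes, the group labels, the 80/20 residential split, and the `approve` rule are all assumptions made up for the example, not a description of any real credit-scoring system.

```python
from collections import defaultdict

# Synthetic applicants as (postal_code, group) pairs.
# The group label is kept only for auditing; the decision rule never sees it.
APPLICANTS = (
    [("A1", "group_x")] * 8 + [("A1", "group_y")] * 2
    + [("B2", "group_x")] * 2 + [("B2", "group_y")] * 8
)

# Hypothetical rule of the kind a model might learn from historical data.
PENALISED_POSTAL_CODES = {"B2"}


def approve(postal_code):
    """Decide using postal code only; there is no group input at all."""
    return postal_code not in PENALISED_POSTAL_CODES


def approval_rates(applicants):
    """Approval rate per group, computed after the fact for auditing."""
    totals = defaultdict(int)
    approved = defaultdict(int)
    for postal_code, group in applicants:
        totals[group] += 1
        if approve(postal_code):
            approved[group] += 1
    return {group: approved[group] / totals[group] for group in totals}


def disparate_impact_ratio(rates):
    """Lowest group approval rate divided by the highest (1.0 means parity)."""
    return min(rates.values()) / max(rates.values())


if __name__ == "__main__":
    rates = approval_rates(APPLICANTS)
    print(rates)
    print("ratio:", disparate_impact_ratio(rates))
```

Because residence and group membership are correlated in the synthetic data, a rule that only ever reads the postal code still produces sharply different approval rates by group. Removing the protected attribute from the inputs does not remove it from the outcome.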
In each case, the system is doing what it was designed to do:
identifying patterns in data that predict outcomes. What it is not doing is
reasoning about whether those patterns encode historical injustice that
should not be perpetuated, or whether the outcome being predicted is the
right target to optimise for. As our analysis of AI
governance and accountability in law enforcement found, the
accountability structures for AI decisions in high-stakes domains do not
match the consequentiality of those decisions. The system predicts. The
consequence falls on real people. The gap between those two facts is where
moral understanding should be, and where it is currently
absent.
What Research on AI Ethics Actually Shows
The research programme loosely described as AI ethics has produced
valuable documentation of specific harms produced by AI systems in
deployment, alongside a body of principle-level frameworks that are difficult
to operationalise consistently. The EU
AI Act’s high-risk classification framework represents the most
developed regulatory attempt to impose pre-deployment requirements on AI
systems making consequential decisions, requiring conformity assessments,
transparency obligations, and human oversight mechanisms for systems in
high-risk categories including criminal justice, employment, and credit. Whether
these requirements are sufficient to address the moral understanding gap, as
opposed to the process and transparency gaps they more directly target, is
genuinely contested among researchers in AI ethics and
governance.
The deeper question that current approaches do not resolve is
whether AI systems need to understand moral consequence in order to be
governed well, or whether adequate human oversight can substitute for that
understanding in practice. The answer is likely contextual: in well-defined,
stable decision environments with clear feedback loops, human-governed AI
systems can produce acceptable outcomes without the AI itself having moral
understanding. In complex, dynamic, high-stakes environments where the
relevant moral considerations are contested and evolving, the absence of
genuine moral understanding in the system becomes a governance liability that
human oversight cannot fully compensate for. Most of the domains where AI is
being deployed fastest fall into the second category, not the first. As our
coverage of how AI deployment affects vulnerable populations found, the people most
affected by AI moral failures are consistently those least able to contest
the decisions that affect them.
The Practical Implication
The practical implication of taking moral consequence seriously in
AI governance is not that AI systems should be prevented from making
consequential decisions until they develop genuine moral understanding, which
is not a near-term prospect. It is that the human oversight, contestability,
and accountability mechanisms surrounding AI decisions in high-stakes domains
should be substantially more robust than they currently are, precisely
because the systems making those decisions cannot themselves reason about
whether the decisions are morally justified. Karma, in its original
conceptual sense, is about consequences that track back to choices. AI
systems make choices whose consequences track back to no one, unless
governance frameworks ensure they do. That accountability gap is the moral
problem that current AI ethics approaches are still developing the tools to
address.
The Accountability Gap in Practice
The most concrete illustration of AI’s moral consequence deficit
is found not in philosophical argument but in documented harm. Algorithmic
recidivism risk scores used in US sentencing decisions have been found in
multiple studies to produce racially disparate outcomes at rates inconsistent
with the legally relevant factors those scores are supposed to capture.
Hiring algorithms have been challenged in US and EU courts for producing
discriminatory screening decisions that no individual human reviewer was
instructed to make. Credit systems using alternative data have generated
class action litigation over disparate impact on protected demographic
groups.
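One way auditors can put a number on disparity in risk scores is to compare error rates across groups, for instance the share of people who did not reoffend but were still flagged as high risk. The sketch below assumes that metric and uses synthetic records; the field names, group labels, and counts are invented for illustration and are not drawn from any of the studies or cases mentioned above.

```python
def make_records(group, flagged_non_reoffenders, cleared_non_reoffenders, reoffenders):
    """Build synthetic records for one group (all counts are invented)."""
    records = []
    records += [{"group": group, "high_risk": True, "reoffended": False}] * flagged_non_reoffenders
    records += [{"group": group, "high_risk": False, "reoffended": False}] * cleared_non_reoffenders
    records += [{"group": group, "high_risk": True, "reoffended": True}] * reoffenders
    return records


# Synthetic data: both groups have ten non-reoffenders and five reoffenders.
RECORDS = make_records("group_x", 2, 8, 5) + make_records("group_y", 4, 6, 5)


def false_positive_rate(records, group):
    """Share of non-reoffenders in `group` who were flagged as high risk."""
    negatives = [r for r in records if r["group"] == group and not r["reoffended"]]
    if not negatives:
        raise ValueError("no non-reoffenders recorded for group %r" % group)
    flagged = sum(1 for r in negatives if r["high_risk"])
    return flagged / len(negatives)


if __name__ == "__main__":
    for group in ("group_x", "group_y"):
        print(group, false_positive_rate(RECORDS, group))
```

In this made-up data, people in the second group who never reoffended are flagged twice as often as those in the first. A measurement like this shows that a disparity exists; it does not by itself establish who is accountable for it.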
In each case, the harm is real and attributable to an AI system.
In each case, the accountability frameworks governing those systems have been
found wanting: it is difficult to establish legal liability for algorithmic
discrimination when no individual human made the discriminatory decision, and
the opacity of the systems makes establishing causation procedurally
demanding. The governance development most needed is not better ethics
principles, which most major AI developers now publish in some form, but
enforceable accountability mechanisms that attribute specific harms to
specific design choices and create legal and financial consequences for
organisations that deploy systems producing those harms. As our coverage of
how algorithmic systems in housing markets have been challenged through antitrust
law found, the most effective regulatory responses to AI harm have
been those that identified specific, measurable market failures and applied
existing legal frameworks with sufficient vigour to create deterrent
consequences. That model is applicable to the moral consequence domain, even
if the technical challenges of attribution are greater.
About the Author
Stuart Kerr is the Technology Correspondent for LiveAIWire. He
writes about artificial intelligence, emerging technology, and the forces
reshaping work, business, and society.