By Stuart Kerr, Technology Correspondent, LiveAIWire
A landmark 2019 study in the journal Science examined an algorithm
used by US health insurers to identify patients who would benefit from
additional care management. The algorithm was found to systematically
underestimate the health needs of Black patients, who were assigned lower
risk scores than equally sick white patients and therefore enrolled in care
management at lower rates. The algorithm did not use race as an input
variable. It used healthcare costs as a proxy for health need. Because
systemic inequalities in healthcare access meant that less money had
historically been spent on the care of Black patients at the same level of
illness, the model learned a bias that produced discriminatory outcomes
without any discriminatory intent in its design.
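The mechanism can be illustrated with a minimal Python sketch. Everything in it is a hypothetical assumption rather than data from the study: two groups, A and B, have identical distributions of health need, but group B receives (and is billed for) only 70 percent of the care its need would warrant. Ranking patients by cost, as a stand-in for ranking by need, then selects fewer group B patients even though group membership is never used by the ranking.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    group: str     # hypothetical group label, never used by the ranking
    need: int      # true health need, e.g. number of chronic conditions
    access: float  # assumed share of needed care actually received and billed


def observed_cost(p: Patient, cost_per_unit: float = 1000.0) -> float:
    """Cost is need scaled by access, so equal need can produce unequal cost."""
    return p.need * cost_per_unit * p.access


def select_top(patients, key, k):
    """Return the k patients with the highest value of `key`."""
    return sorted(patients, key=key, reverse=True)[:k]


# Both groups have the same needs (1 to 5); only access to care differs.
patients = [Patient("A", n, 1.0) for n in range(1, 6)] + [
    Patient("B", n, 0.7) for n in range(1, 6)
]

by_cost = select_top(patients, observed_cost, 4)       # proxy label
by_need = select_top(patients, lambda p: p.need, 4)    # intended target

cost_groups = [p.group for p in by_cost]
need_groups = [p.group for p in by_need]
print("selected by cost:", cost_groups)
print("selected by need:", need_groups)
```

In this toy setup, ranking by cost fills three of the four places from group A, while ranking by true need splits them evenly between the groups.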
This is the silent bias problem in its most documented form: AI
systems that produce discriminatory outcomes not through malicious design but
through the ingestion of historical data that reflects historical
discrimination. The bias is structural, not intentional. It is also, in its
consequences, indistinguishable from intentional discrimination for the
people it affects.
Where Algorithmic Bias Shows Up
The evidence base for algorithmic bias across consequential
decision domains has grown substantially over the past decade. In criminal
justice, research by ProPublica documented that the COMPAS recidivism
prediction tool used in sentencing and parole decisions produced false
positive rates for Black defendants that were roughly twice those for white
defendants: Black defendants who did not go on to reoffend were far more
likely to have been labelled high risk. In hiring, studies using audit
methodology — submitting identical CVs with names that signal different
racial backgrounds — have documented lower callback rates from AI hiring
systems for candidates with names associated with minority groups. In
mortgage lending, algorithmic systems have been found to deny applications
from minority borrowers at higher rates than from white borrowers with
equivalent financial profiles.
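The false positive rate comparison at the heart of the COMPAS finding is a simple calculation: among people who did not reoffend, what share in each group was flagged as high risk? The sketch below shows that calculation in Python. The records and the group labels X and Y are invented for illustration and are not ProPublica's data.

```python
from collections import defaultdict


def false_positive_rates(records):
    """records: iterable of (group, predicted_high_risk, reoffended).

    Returns {group: FP / (FP + TN)}, the share of non-reoffenders
    in each group who were nonetheless flagged as high risk.
    """
    flagged = defaultdict(int)
    negatives = defaultdict(int)
    for group, predicted, actual in records:
        if not actual:                 # only people who did not reoffend
            negatives[group] += 1
            if predicted:              # flagged anyway: a false positive
                flagged[group] += 1
    return {g: flagged[g] / negatives[g] for g in negatives}


# Hypothetical toy records, chosen so one rate is double the other.
records = (
    [("X", True, False)] * 4 + [("X", False, False)] * 6 + [("X", True, True)] * 5
    + [("Y", True, False)] * 2 + [("Y", False, False)] * 8 + [("Y", True, True)] * 5
)

rates = false_positive_rates(records)
print(rates)
```

With these invented records, group X has a false positive rate of 0.4 and group Y a rate of 0.2, even though the tool flags every actual reoffender in both groups.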
The pattern is not confined to the United States. Research from
the Equality and Human Rights Commission in the UK has documented algorithmic
bias in employment, credit, and public service allocation contexts, finding
that AI systems are reproducing and in some cases amplifying the patterns of
inequality present in the societies and institutions that generated their
training data.
Why Bias Is Hard to Remove
The technical difficulty of debiasing AI systems reflects the
nature of the problem. Bias in AI arises from multiple sources simultaneously:
biased training data, biased labels assigned to training data, biased feature
selection, and biased evaluation criteria that define what a good model looks
like. Removing bias from one source does not remove bias arising from
others.
The proxy variable problem illustrated by the health insurance
algorithm is particularly intractable. Real-world data is pervasively
correlated with protected characteristics: postcodes correlate with race and
income; educational institutions correlate with socioeconomic background; job
titles correlate with gender. Removing the protected characteristic from the
model does not remove its influence if proxies for it remain in the feature
set. And removing all correlated features often degrades model performance on
the task the model is designed for, creating a direct tension between
fairness and accuracy that does not have a technically neutral
resolution.
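One rough way to see how much a remaining feature "leaks" a protected characteristic is to ask how often the characteristic can be guessed from that feature alone. The Python sketch below does this with a majority-vote guess per feature value. The function name proxy_leakage, the postcodes P1 and P2, and the groups A and B are all hypothetical, and real audits would use more careful statistical measures.

```python
from collections import Counter, defaultdict


def proxy_leakage(rows, feature, protected):
    """Share of rows whose protected value is recovered by guessing the
    most common protected value among rows with the same feature value."""
    by_value = defaultdict(Counter)
    for row in rows:
        by_value[row[feature]][row[protected]] += 1
    correct = sum(counts.most_common(1)[0][1] for counts in by_value.values())
    return correct / len(rows)


# Hypothetical data: postcode P1 is mostly group A, P2 is mostly group B.
rows = (
    [{"postcode": "P1", "group": "A"}] * 9
    + [{"postcode": "P1", "group": "B"}] * 1
    + [{"postcode": "P2", "group": "A"}] * 2
    + [{"postcode": "P2", "group": "B"}] * 8
)

leak = proxy_leakage(rows, "postcode", "group")

# Baseline: always guess the overall most common group, ignoring postcode.
baseline = max(Counter(r["group"] for r in rows).values()) / len(rows)

print(f"guess from postcode: {leak:.2f}, baseline: {baseline:.2f}")
```

Here postcode alone recovers group membership for 85 percent of rows against a 55 percent baseline, so a model given postcode but not group still has most of the information that group would have provided.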
For anyone evaluating AI systems used in consequential decisions, the
implication is that the absence of explicitly discriminatory inputs is
not sufficient evidence that a system is fair. Disparity analysis across
demographic groups, counterfactual testing, and audit by independent
researchers with access to the model and its training data are required to
assess whether a system produces equitable outcomes. Most AI systems in
commercial deployment are not subject to audits of this
rigour.
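Two of these checks, disparity analysis and counterfactual testing, can be sketched in a few lines of Python. The scoring rule toy_model, the applicant records, and the field names are hypothetical assumptions made up for this illustration. The rule never sees the group field, but it penalises a postcode that, in this invented data, only group B applicants live in.

```python
def toy_model(applicant):
    """Hypothetical approval rule: ignores 'group' but penalises postcode P2."""
    penalty = 10 if applicant["postcode"] == "P2" else 0
    return applicant["income"] / 1000 - penalty >= 30


def selection_rates(applicants, model):
    """Disparity analysis: approval rate per demographic group."""
    totals, approved = {}, {}
    for a in applicants:
        g = a["group"]
        totals[g] = totals.get(g, 0) + 1
        approved[g] = approved.get(g, 0) + int(model(a))
    return {g: approved[g] / totals[g] for g in totals}


def counterfactual_flips(applicants, model, field, values):
    """Counterfactual test: count applicants whose decision changes
    when only `field` is varied and everything else is held fixed."""
    flips = 0
    for a in applicants:
        outcomes = {model({**a, field: v}) for v in values}
        flips += int(len(outcomes) > 1)
    return flips


applicants = [
    {"group": "A", "postcode": "P1", "income": 35000},
    {"group": "A", "postcode": "P1", "income": 40000},
    {"group": "A", "postcode": "P1", "income": 20000},
    {"group": "B", "postcode": "P2", "income": 35000},
    {"group": "B", "postcode": "P2", "income": 38000},
    {"group": "B", "postcode": "P2", "income": 50000},
]

rates = selection_rates(applicants, toy_model)
ratio = min(rates.values()) / max(rates.values())  # 1.0 would mean parity
flips = counterfactual_flips(applicants, toy_model, "postcode", ["P1", "P2"])
print(rates, ratio, flips)
```

In this example group B is approved at half the rate of group A, and three of the six decisions change when nothing but the postcode is altered. Neither result could be found by inspecting the input list for an explicit group field.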
Legal Frameworks and Their Gaps
Existing anti-discrimination law in most jurisdictions prohibits
decisions that discriminate on the basis of protected characteristics,
whether or not that discrimination is intentional. In principle, this means
that AI systems producing discriminatory outcomes are unlawful regardless of
whether the discrimination was designed in. In practice, enforcement is
limited by the opacity of AI decision-making processes, the difficulty of
proving disparate impact in individual cases, and the resources required to
mount legal challenges against well-resourced technology companies and
institutions.
The EU AI Act classifies AI systems used in hiring, credit, and
certain other high-stakes contexts as high-risk, requiring transparency,
human oversight, and bias testing. This is the most comprehensive attempt to
address algorithmic bias through regulation at scale, though implementation
and enforcement remain in early stages. In the United States, the Equal
Employment Opportunity Commission and the Consumer Financial Protection
Bureau have both published guidance on AI discrimination, but comprehensive
federal AI anti-discrimination legislation has not been
enacted.
Intersectionality and Compounding Bias
Research on algorithmic bias has increasingly focused on
intersectional discrimination: the compounding effects of multiple protected
characteristics. A Black woman may face algorithmic bias that is not
adequately captured by studies of race-based bias or gender-based bias
separately, because the intersection of those characteristics produces a
distinct pattern of historical discrimination that manifests distinctively in
training data.
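The point can be made concrete with a short Python sketch that computes error rates first along one attribute at a time and then for each combination of attributes. The data is invented, and the attribute values r1, r2, g1 and g2 are placeholders rather than real categories. The aim is only to show that a single-axis audit can understate the error rate experienced by one intersectional subgroup.

```python
from collections import defaultdict


def error_rates(records, keys):
    """Group records by the tuple of `keys` and return {subgroup: error rate}."""
    totals, errors = defaultdict(int), defaultdict(int)
    for r in records:
        subgroup = tuple(r[k] for k in keys)
        totals[subgroup] += 1
        errors[subgroup] += int(r["error"])
    return {s: errors[s] / totals[s] for s in totals}


def make(race, gender, n, n_errors):
    """Build n hypothetical records, the first n_errors of them misclassified."""
    return [{"race": race, "gender": gender, "error": i < n_errors} for i in range(n)]


# Invented data: one subgroup has 4 errors in 10, the others 1 in 10.
records = (
    make("r1", "g1", 10, 4)
    + make("r1", "g2", 10, 1)
    + make("r2", "g1", 10, 1)
    + make("r2", "g2", 10, 1)
)

by_race = error_rates(records, ["race"])
by_gender = error_rates(records, ["gender"])
by_both = error_rates(records, ["race", "gender"])
print(by_race, by_gender, by_both, sep="\n")
```

Audited one attribute at a time, the worst error rate in this toy data is 25 percent. Audited by combination, the (r1, g1) subgroup turns out to have a 40 percent error rate, which neither single-axis figure reveals.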
Algorithmic facial recognition has been among the most extensively
studied domains for intersectional bias. Research by Joy Buolamwini and
Timnit Gebru, published in 2018, documented error rates for commercial facial
recognition systems that varied dramatically by gender and skin tone: the
highest error rates were for dark-skinned women, at levels far exceeding
those for light-skinned men. Subsequent research has confirmed and extended
these findings across multiple commercial systems and facial analysis tasks.
The National Institute of Standards and Technology face recognition vendor
testing has documented demographic disparities across a wide range of
commercial systems.
Accountability and the Path Forward
Addressing silent bias in AI requires both technical measures and
governance measures, applied consistently across the lifecycle of AI systems
from design through deployment. Technical measures include diverse and
representative training data, bias testing across demographic groups during
development, ongoing monitoring of outcome disparities in deployment, and the
development of algorithmic fairness metrics that account for the specific
context and values at stake in each application.
Governance measures include mandatory bias audits conducted by
independent parties with access to system internals, disclosure requirements
that inform affected individuals when AI contributes to consequential
decisions about them, and redress mechanisms that allow individuals to
challenge AI-influenced decisions on grounds of discriminatory impact. The
transparency and accountability gaps in AI decision-making identified in
privacy contexts apply with equal force to discrimination contexts. The
connection to AI systems that underserve minority and marginalised groups
more broadly is direct: the silent bias that produces discriminatory outcomes in
high-stakes decisions is part of the same structural problem that produces
inaccessible AI for disabled users and underperforming AI for non-English
speakers. The common thread is design that defaults to majority-group
experience and treats deviation from that experience as an edge case rather
than a requirement.
The cultural and institutional changes required to address silent
bias in AI go beyond technical fixes and regulatory requirements. They
require organisations to take seriously the possibility that their AI systems
are causing harm to groups whose members are not well represented in the
feedback loops that inform system improvement. Algorithmic bias is often
invisible to the majority of users, who are not affected by it, and therefore
does not generate the user complaints that drive commercial product
improvement. Making it visible — through mandatory outcome reporting,
independent auditing, and the inclusion of affected community representatives
in AI governance processes — is the prerequisite for the accountability that
the evidence of harm demands. For those who have experienced discriminatory
AI decisions, the right to challenge those decisions through regulatory
channels is real even if the exercise of that right is not always
straightforward. The accountability gap in AI decision-making has a fairness
dimension as well as a privacy one: both require the same foundational
commitment to transparency from the organisations deploying these systems.
About the Author
Stuart Kerr is a technology correspondent at LiveAIWire, covering artificial
intelligence, emerging technologies, and their impact on society and
industry.