Google DeepMind’s MoR Could Replace the Transformer Architecture

By Stuart Kerr, Technology Correspondent, LiveAIWire · 9 May 2026

The Transformer architecture has been
the backbone of modern AI since 2017, powering every major language model
from GPT to Claude to Gemini. In July 2025, Google
DeepMind, in collaboration with KAIST AI and the Mila Quebec AI
Institute, published research on a new architecture called Mixture of
Recursions, or MoR, which its developers describe as a viable successor. The
reported results warrant attention: according to the researchers, MoR reduces
memory usage by approximately 50 percent, doubles inference speed compared to
conventional Transformers of equivalent capability, and outperforms the
Transformer baseline on most natural language processing tasks while using
nearly half the parameters. Training duration is reduced by 19 percent. These
are not marginal improvements on the state of the art. They are structural
efficiency gains that, if they hold at scale, would materially change the
economics of deploying AI.

The timing matters as much as
the technical claims. The AI industry is in the middle of a compute cost
crisis: the energy and hardware demands of training and running large Transformer
models have become a significant constraint on deployment, with data centre
electricity demand projected by the IEA to more than double by 2030. Any
architecture that maintains capability while substantially reducing compute
requirements directly addresses that constraint. MoR is the most significant
architectural proposal to emerge from a major AI lab in this direction since
the Transformer itself.

How MoR Is Different From Transformers

The Transformer processes input through fixed
layers, applying self-attention at each layer to capture relationships
between all tokens in the sequence. This approach is powerful but
computationally expensive: attention complexity scales quadratically with
sequence length, which creates significant overhead as context windows grow
larger. MoR takes a fundamentally different approach. Instead of processing
all tokens through the same fixed depth, MoR reuses a shared block of layers
recursively, and a lightweight router decides how many passes each token
receives based on the complexity of the input. Tokens that require more
processing receive more computation; tokens that can be resolved quickly get
fewer passes. This dynamic allocation is what drives the efficiency gains: the
architecture does more work where it matters and less where it does not.
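
To make the idea of per-token depth concrete, here is a minimal toy sketch in Python. It is not DeepMind's implementation: the function names, the `difficulty` scores, the cap of three passes and the arithmetic inside `shared_block` are all assumptions chosen for illustration. It only shows the bookkeeping: one shared block is reused on every pass, and a router decides how many passes each token gets.

```python
from typing import List, Tuple

MAX_RECURSIONS = 3  # hypothetical cap on passes through the shared block


def shared_block(hidden: float) -> float:
    """Stand-in for one pass through a shared stack of layers.

    The same "weights" (0.5 and 1.0) are reused on every pass.
    """
    return 0.5 * hidden + 1.0


def route_depth(difficulty: float, max_recursions: int = MAX_RECURSIONS) -> int:
    """Toy router: map a difficulty score in [0, 1] to 1..max_recursions passes."""
    clipped = min(max(difficulty, 0.0), 1.0)
    return 1 + min(int(clipped * max_recursions), max_recursions - 1)


def adaptive_forward(
    hiddens: List[float], difficulties: List[float]
) -> Tuple[List[float], int]:
    """Apply the shared block a different number of times per token.

    Returns the final hidden values and the total number of block passes.
    """
    outputs: List[float] = []
    total_passes = 0
    for hidden, difficulty in zip(hiddens, difficulties):
        depth = route_depth(difficulty)
        for _ in range(depth):
            hidden = shared_block(hidden)
        total_passes += depth
        outputs.append(hidden)
    return outputs, total_passes


# Four tokens with made-up difficulty scores.
tokens = [0.0, 0.0, 0.0, 0.0]
scores = [0.1, 0.9, 0.2, 0.5]
outputs, adaptive_passes = adaptive_forward(tokens, scores)
fixed_passes = len(tokens) * MAX_RECURSIONS  # every token at full depth
```

In this toy run the adaptive version spends 7 block passes where a fixed-depth version would spend 12. The saving depends entirely on the invented difficulty scores, so it says nothing about the figures reported for MoR.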

The distinction from
Mixture of Experts (MoE) models, which also use selective routing, is
significant. MoE architectures route computation across a large ensemble of
expert sub-networks, requiring vast numbers of parameters to populate those
experts. MoR eliminates this parameter overhead while preserving the
adaptability advantage, using recursive depth instead of ensemble breadth to
achieve selective computation. The result is a model that needs substantially
fewer parameters to achieve comparable or superior performance on standard
benchmarks, which is exactly the property that makes it interesting for
deployment at scale.
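
The parameter argument can be sketched with simple counting. The Python below is an illustrative calculation only: the layer count, expert count and per-layer parameter figure are hypothetical, and it ignores routers, embeddings and everything else a real model carries. It contrasts breadth (many experts per layer) with depth reuse (a small block applied several times).

```python
def dense_params(layers: int, params_per_layer: int) -> int:
    """Unique parameters in a plain stack with one set of weights per layer."""
    return layers * params_per_layer


def moe_params(layers: int, experts: int, params_per_expert: int) -> int:
    """Unique parameters when every layer holds several expert sub-networks."""
    return layers * experts * params_per_expert


def recursive_params(effective_depth: int, recursions: int, params_per_layer: int) -> int:
    """Unique parameters when a shared block is reused `recursions` times.

    The block has effective_depth / recursions layers, so the same effective
    depth is reached with fewer stored weights.
    """
    if effective_depth % recursions != 0:
        raise ValueError("effective_depth must be divisible by recursions")
    return (effective_depth // recursions) * params_per_layer


# Hypothetical sizes, chosen only to make the ratios easy to read.
PER_LAYER = 1_000_000
dense = dense_params(layers=12, params_per_layer=PER_LAYER)
moe = moe_params(layers=12, experts=8, params_per_expert=PER_LAYER)
recursive = recursive_params(effective_depth=12, recursions=3, params_per_layer=PER_LAYER)
```

With these made-up numbers the dense stack stores 12 million parameters, the expert version 96 million and the recursive version 4 million, all at the same effective depth of 12 layers.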

What the Benchmarks Actually Show

DeepMind’s published benchmarks show MoR matching or
surpassing traditional Transformer models on most NLP tasks under equivalent
computational budgets, with the advantage becoming more pronounced as model
size grows. When model size exceeds 360 million parameters, MoR consistently
outperforms the Transformer baseline at low to medium computational budgets,
which is the operating range that matters most for practical deployment. The
few-shot accuracy advantage of 0.8 percentage points over traditional
Transformers is modest in absolute terms but consistent across task
categories. The inference throughput gains, achieved through early token
processing and optimised batch management, are more immediately commercially
significant.

The important caveat is scale. The published
MoR research covers models from 135 million to 1.7 billion parameters, which is
orders of magnitude smaller than the frontier models that define current AI
capability. Whether MoR’s efficiency advantages hold at 70 billion, 200
billion, or larger parameter counts, the scales at which Transformer
architectures have been most thoroughly validated, remains to be
demonstrated. This is the standard challenge for new architectures: promising
results at research scale do not guarantee performance at frontier deployment
scale. Mamba, a state-space model that generated similar excitement as a
potential Transformer successor in 2023, has not displaced Transformers at
the frontier despite strong results at smaller scales. MoR faces the same
validation challenge.

Why the Industry Is Paying Attention

The practical stakes of a viable Transformer
alternative are unusually high given the current moment. Investment in AI
infrastructure is at record levels, and the companies spending the most on
that infrastructure, including the major cloud providers and AI labs, have a
direct financial interest in any architecture that reduces the compute cost
of inference. The MIT
Technology Review’s coverage of post-Transformer architectures has
noted that investment in non-Transformer research grew significantly in 2024
and 2025, reflecting a recognition that the quadratic attention complexity of
Transformers will become an increasingly binding constraint as context
lengths and deployment volumes grow.
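
The quadratic constraint is easy to see with a little arithmetic. The short Python sketch below counts the pairwise attention scores that standard self-attention computes for a sequence, per head and per layer. The function name and the context lengths are illustrative assumptions, and real systems use optimisations that this count ignores.

```python
def attention_score_count(seq_len: int) -> int:
    """Number of token-to-token scores in full self-attention (one head, one layer)."""
    return seq_len * seq_len


# Illustrative context lengths: each doubling quadruples the score count.
score_counts = {n: attention_score_count(n) for n in (1_000, 2_000, 4_000)}
growth_per_doubling = score_counts[2_000] / score_counts[1_000]
```

Doubling the context from 1,000 to 2,000 tokens raises the count from one million to four million scores, which is why longer context windows weigh so heavily on compute budgets.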

For developers and
organisations deploying AI, the MoR research is most relevant as a signal
that the architectural status quo is genuinely contestable rather than as an
immediate deployment recommendation. Models built on MoR are not yet
available at the scale or with the training data coverage of frontier
Transformer models. What MoR establishes is that there are plausible
alternatives to the Transformer that maintain performance while addressing
its efficiency limitations, which changes the long-term investment calculus
for organisations building on AI infrastructure. Understanding how smaller
and more efficient AI models are already challenging large deep learning
systems on specific tasks provides parallel context for the broader
architectural shift MoR represents. And the trillion-scale
valuations being placed on AI companies reflect investment theses
built on Transformer architectures that may need revising if non-Transformer
alternatives continue to demonstrate competitive performance. The
energy crisis driving AI infrastructure decisions is the specific
pressure that makes MoR’s efficiency claims more commercially relevant than a
comparable architecture proposal would have been three years
ago.

The Broader Context: Why Efficiency Matters Now

The MoR announcement arrived at a moment of heightened
urgency around AI compute efficiency. Electricity demand from data centres is
projected by the IEA to more than double to over 1,000 terawatt-hours by 2030,
and the capital expenditure plans of the major cloud providers reflect a
sustained commitment to building out the infrastructure to meet that demand.
Against this backdrop, any architecture that delivers equivalent model
capability at lower compute cost has direct commercial value, which is
different from the situation in 2020 when efficiency was a research interest
rather than a strategic priority. The companies funding AI infrastructure at
scale have significant financial interest in seeing MoR or architectures like
it validated at frontier scale, because the alternative is infrastructure
spend that keeps rising in step with model capability.

The research community’s response to MoR has
been measured rather than effusive, which is itself informative. Previous
announcements of Transformer successors, including Mamba and RWKV, generated
significant excitement before proving difficult to scale to the parameter
counts where Transformers are most comprehensively validated. The honest
assessment of MoR is that it has demonstrated a credible and novel approach
to the efficiency problem, with benchmark results that warrant serious
follow-up research at larger scales. Whether those larger-scale experiments
reproduce the efficiency gains seen at research scale will determine
whether MoR achieves the architectural transition its developers are
proposing. That evidence does not yet exist, and its absence is the principal
reason cautious optimism rather than confident prediction is the appropriate
stance.

About the Author

Stuart Kerr is
Technology Correspondent at LiveAIWire, covering artificial intelligence,
cybersecurity, and the social impact of emerging technology. He publishes daily
at LiveAIWire.com.
