The AGI Consistency Problem: Why Smarter Machines Still Stumble

Artificial General Intelligence (AGI) has long been the holy grail of AI research—a system capable of performing any intellectual task a human can do. But despite headline-grabbing wins in elite competitions, AI still falters on tasks most high school students can solve. The gap, according to experts, is rooted in a fundamental flaw: inconsistency.

By Stuart Kerr, Technology Correspondent

Published: 14 August 2025

Last Updated: 14 August 2025

Contact: liveaiwire@gmail.com | Twitter: @LiveAIWire

Google DeepMind CEO Demis Hassabis recently told Business Insider that even the most advanced AI models can ace Olympiad-level math problems yet fail simpler high school exercises. He frames this as a challenge of “consistency” — the ability to apply reasoning skills reliably across different levels of difficulty.

This mirrors coverage from The Times of India, which highlights how AI’s uneven performance raises concerns about readiness for real-world deployment in sensitive sectors like healthcare and law.

Beyond the Hype

In an interview with India Today, Hassabis noted that current AI benchmarks may be too narrow, rewarding exceptional results in niche domains without penalising failures on everyday problems. This creates a public perception that AGI is closer than it really is.

The academic community has been warning of this illusion for years. In “Why AI is Harder Than We Think”, Melanie Mitchell outlines how intelligence is context-dependent and prone to brittleness when faced with novel situations.

The Policy Perspective

Another camp of researchers argues that the very pursuit of AGI as a single goal is flawed. The paper “Stop Treating ‘AGI’ as the North-Star Goal of AI Research” contends that AI policy and funding should prioritise diverse, specialised systems that address specific human needs over monolithic “human-level” machines.

These arguments echo debates we’ve covered in our own reporting, such as “AI-arms race: Google, Amazon, Meta”, where corporate competition often sidelines nuanced discussions about capability limitations.

Building Reliability Into AI

Some developers are tackling the consistency problem head-on. The emerging MoR (Mixture of Reasoners) model architecture, explored in our feature “New AI model MoR succeeds transformers”, aims to combine specialised reasoning modules under a unified control system, potentially mitigating erratic performance.

But while architectures evolve, so too must evaluation methods. Experts propose a mix of high-stakes testing and low-level diagnostics to ensure AI systems are not just brilliant but dependable.
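To make the idea concrete, here is a minimal sketch of what a consistency-aware score might look like. It is an illustration only, not a method proposed by Hassabis or any benchmark named in this article: the tier names, the pass/fail values, and the functions `tier_accuracy` and `consistency_report` are all hypothetical. The point is simply that reporting the weakest tier alongside the strongest stops a headline result from hiding everyday failures.

```python
from statistics import mean

# Hypothetical per-tier outcomes (1 = solved, 0 = failed).
# These values are invented for illustration; they are not real benchmark data.
results = {
    "olympiad": [1, 1, 1, 0, 1],
    "high_school": [1, 0, 1, 0, 0],
}


def tier_accuracy(results):
    """Return the fraction of items solved in each difficulty tier."""
    return {tier: mean(outcomes) for tier, outcomes in results.items()}


def consistency_report(results):
    """Summarise the best tier, the worst tier, and the gap between them."""
    accuracy = tier_accuracy(results)
    best = max(accuracy.values())
    worst = min(accuracy.values())
    return {"best_tier": best, "worst_tier": worst, "gap": best - worst}


report = consistency_report(results)
print(report)
```

A large gap between the best and worst tiers would flag exactly the kind of unevenness described above, even when the headline score looks impressive.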

A Measured March Toward AGI

It’s tempting to see breakthroughs as proof AGI is within reach, but the reality is more complex. As Hassabis acknowledges, the technology’s unevenness could undermine trust if left unresolved.

For policymakers, the lesson is clear: fund innovation, but pair it with rigorous, broad-spectrum testing standards. For the public, the takeaway may be even simpler—don’t mistake flashes of genius for full human equivalence.

Our earlier analysis of Google’s EU AI code of practice decision reinforces this point: transparency and accountability must evolve alongside capability if AGI is to be both real and reliable.

About the Author

Stuart Kerr is the Technology Correspondent for LiveAIWire. He writes about artificial intelligence, ethics, and how technology is reshaping everyday life.
