ChatGPT vs Claude: Which AI Writes Better Code in 2025?

By Stuart Kerr, Technology Correspondent, LiveAIWire · 26 June 2025

Which AI assistant writes better code has no single correct answer: it
depends on the programming language, the type of coding task, the user’s
level of programming expertise, and how the evaluation is conducted. This
matters because coverage of AI coding assistant comparisons tends toward
definitive rankings that overstate the stability and generalisability of
performance differences that are actually specific to the task, the model
version, and the evaluation method. A nuanced answer is less shareable but
more accurate: GPT-5 and Claude 3.7 Sonnet, the current frontier offerings
from OpenAI and Anthropic respectively, both provide capable coding
assistance, with genuine differences in character that make one or the other
more suitable for a given use case.

The landscape of AI coding assistance in 2025 is more
sophisticated than the chatbot comparison framing suggests. Most professional
developers who rely heavily on AI for coding use dedicated coding assistants
including GitHub Copilot, Cursor, and Windsurf that integrate directly into
development environments rather than operating through a chat interface.
These tools are themselves built on the same underlying models as ChatGPT and
Claude, but the integration into development workflows, the access to
codebase context, and the specific fine-tuning for coding tasks mean that the
relevant comparison for most professional users is between different
integrated development environments with AI assistance rather than between
chat interfaces. That said, the chat interface comparison remains relevant
for users who are learning to code, for tasks that require extended explanation
and iteration, and for the significant portion of coding assistance that
happens outside a dedicated IDE.

Where GPT-5 Performs Strongly

GPT-5’s performance on coding tasks reflects the broader
capability improvements in its reasoning and problem-solving abilities. On
standard coding benchmarks including HumanEval, MBPP, and SWE-bench, GPT-5
shows improvements over GPT-4o that are particularly pronounced on multi-step
problems requiring the model to plan a solution strategy before implementing
it. Complex algorithm design, debugging code that requires understanding the
interaction between multiple system components, and generating code that
handles edge cases correctly benefit from the improved reasoning that GPT-5
brings to extended problem-solving sequences.
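For readers weighing benchmark figures, it helps to see how a typical score is computed. The sketch below shows the pass@k estimator commonly reported for HumanEval-style benchmarks. The article does not say which metric underlies the comparisons above, so this is an illustration of the general method, and the function and parameter names are our own.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem.

    n: number of code samples generated for the problem
    c: number of those samples that passed the unit tests
    k: number of samples the user is assumed to try

    Returns the probability that at least one of k samples, drawn
    without replacement from the n generated, passes the tests.
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) counts the ways to draw k samples that all fail;
    # it is 0 when fewer than k failing samples exist.
    return 1.0 - comb(n - c, k) / comb(n, k)
```

A benchmark score is then the mean of this value across all problems, which is one reason a small change in sampling settings or problem selection can move a headline number.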

GPT-5’s larger context window is particularly relevant
for coding tasks involving large codebases, where the ability to process and
reason about extensive existing code before generating modifications is a
genuine capability differentiator. Developers working on large, complex
projects who need an AI assistant to understand the broader architecture
before suggesting specific implementations benefit from GPT-5’s superior
handling of long contexts. The model’s strong performance on web development
tasks, which require HTML, CSS, JavaScript, and framework-specific knowledge
at the same time, also reflects the large volume of web development content
in its training data.

Where Claude 3.7 Performs Strongly

Claude 3.7 Sonnet has been specifically noted by many developers
for the quality of its code explanations and its ability to adapt its
communication style to the user’s apparent level of expertise. For users who
are learning programming, or for experienced developers working in an
unfamiliar language or framework, Claude’s tendency to explain its reasoning
and flag potential issues proactively is a significant practical advantage.
Code that is generated with accompanying explanation is more educational and
more reviewable than code generated without it, and the ability to evaluate
AI-generated code critically is essential for avoiding the subtle bugs that
AI coding assistants of all varieties can introduce.
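To make the point about critical review concrete, here is a minimal sketch of the kind of edge-case probing a reviewer might run before accepting generated code. The `chunk` helper stands in for a hypothetical assistant-generated function and `review_checks` is our own illustrative name; neither comes from either vendor.

```python
def chunk(items, size):
    """Hypothetical assistant-generated helper: split a list into
    consecutive chunks of at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [items[i:i + size] for i in range(0, len(items), size)]


def review_checks():
    """Probe the edge cases where subtle bugs tend to hide.

    Returns a dict mapping each check's description to True or False.
    """
    results = {
        "empty input": chunk([], 3) == [],
        "exact multiple": chunk([1, 2, 3, 4], 2) == [[1, 2], [3, 4]],
        "remainder kept": chunk([1, 2, 3], 2) == [[1, 2], [3]],
        "size larger than input": chunk([1], 5) == [[1]],
    }
    # An invalid size should be rejected rather than silently accepted.
    try:
        chunk([1], 0)
        results["rejects zero size"] = False
    except ValueError:
        results["rejects zero size"] = True
    return results
```

The checks are deliberately small. The value lies in writing them before trusting the output, whichever assistant produced it.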

Claude’s performance on safety-relevant coding tasks, including
identifying security vulnerabilities in existing code, avoiding common
security anti-patterns in generated code, and explaining the security
implications of different implementation choices, reflects Anthropic’s
Constitutional AI training approach that emphasises safety and accuracy.
Developers building applications where security is a priority, including web
applications handling sensitive data, financial software, and healthcare
applications, have specifically noted Claude’s stronger performance in this
domain. Anthropic’s published research on coding capability and safety
documents its evaluation methodology transparently, at a level of detail
that OpenAI’s equivalent documentation does not yet match.
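As an illustration of what a common security anti-pattern looks like in generated code, the sketch below contrasts a SQL query built by string interpolation with a parameterised one, using Python’s standard `sqlite3` module. The `users` table, the function names, and the demo data are assumptions made for this example; it is not drawn from either model’s output.

```python
import sqlite3


def make_demo_db():
    """Build a throwaway in-memory database with a hypothetical users table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)]
    )
    return conn


def find_user_unsafe(conn, name):
    # Anti-pattern: user input is interpolated straight into the SQL text,
    # so a crafted value can change the meaning of the query.
    query = f"SELECT id, name FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user_safe(conn, name):
    # Parameterised query: the driver passes `name` as data, not as SQL.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (name,)
    ).fetchall()
```

Passing the string `x' OR '1'='1` to the unsafe version returns every row in the table, while the safe version treats it as an ordinary name that matches nothing. This is the kind of difference a security-focused review, by a human or an assistant, should flag.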

Practical Guidance for Developers

The most practically useful approach to choosing between AI coding
assistants is not to select one definitively but to understand the different
strengths and to use them accordingly. GPT-5’s extended context and stronger
performance on complex multi-step problems make it better suited for
architectural planning, complex refactoring, and tasks requiring synthesis
across large codebases. Claude’s stronger explanation quality and security
awareness make it better suited for learning, code review, and
security-sensitive development. Both perform adequately on the routine tasks
(autocompletion, boilerplate generation, and straightforward function
implementation) that constitute the majority of day-to-day coding assistance
use.

The evaluation methodology matters significantly for anyone
reading benchmark comparisons. Benchmarks conducted by AI companies on their
own models should be treated with appropriate scepticism; independent
evaluations by organisations including LMSYS Chatbot Arena and academic
research groups provide more trustworthy
comparative performance data because they use consistent methodologies and
are not subject to the selection effects that affect company-conducted
benchmarks.

What This Means for You

If you are using AI coding assistance professionally, the most
important investment is not in choosing the right model but in developing the
skills to effectively evaluate, test, and integrate AI-generated code into
your workflow. AI coding assistants of all varieties produce code that
requires human review, testing, and often modification before production use,
and the ability to review AI-generated code critically is a skill that
complements rather than substitutes for the ability to write code
independently. Both GPT-5 and Claude 3.7 are capable enough that your
productivity with either depends more on how you use them than on the
inherent capabilities of one versus the other. For related coverage, see our
analysis of LLM capabilities in 2025 and the GPT-5 release.

Professional norms around AI-assisted code development are
developing unevenly across the software industry. Questions about
attribution, disclosure, and review standards for AI-generated code are being
addressed inconsistently across different companies, open-source communities,
and client contexts. The open-source software community has been particularly
active in developing norms, with several major projects establishing explicit
policies on AI-generated contributions. The BCS, the Chartered Institute for
IT, has developed professional guidance on AI-assisted development
that addresses client disclosure obligations and quality standards in the context
of professional accountability. Consistent, transparent norms focused on code
quality rather than tool preference will produce better outcomes for the
profession than the current inconsistency, which creates uncertainty for
individual developers about their professional obligations and for clients
about the standards being applied to their software projects. Developing
these norms is a professional responsibility that industry bodies, employers,
and professional associations all have a role in shaping.

The question of intellectual property in AI-assisted software development
extends beyond the professional
norms debate to concrete legal questions about code ownership that are being
resolved through litigation and legislation in multiple jurisdictions. When
an AI coding assistant generates code that is substantially similar to code
in its training dataset, the ownership of that generated code is legally
unclear in ways that create risk for organisations deploying AI-generated
code in commercial products. Several high-profile cases in the US involving
GitHub Copilot’s reproduction of open-source licensed code have raised these
questions concretely, with implications for organisations that have adopted
AI coding assistance at scale without adequately assessing the intellectual
property risks of their workflows.

About the Author

Stuart Kerr is a technology correspondent at LiveAIWire, covering
artificial intelligence, digital innovation, and the social impact of
emerging technologies. Follow LiveAIWire for daily analysis at liveaiwire.com.