A new study has shown that even the most advanced AI models can barely scrape by on real-world programming tasks.
In its latest experiment, OpenAI tested AI models against real engineering challenges. The best performer, Anthropic's Claude 3.5 Sonnet, solved a dismal 26.2 per cent of the hands-on coding tasks and got just 44.9 per cent of the technical management decisions right.
The study used a benchmark called SWE-Lancer, built from 1,488 real freelance engineering tasks tied to Expensify's codebase and worth a combined $1 million in payouts. Even with this well-defined dataset, AI struggled to match human expertise.
While the AI models excelled at finding relevant code snippets, they floundered when asked to reason about how different parts of a program work together. The best they could manage were shallow, surface-level fixes that failed to account for deeper software interactions.
Unlike previous AI coding tests that rely on simplistic algorithm puzzles, OpenAI's benchmark replicated real-world software development. Tasks ranged from quick $50 bug fixes to intricate $32,000 feature implementations, with every solution verified by end-to-end tests that simulate real user workflows.
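To picture what that kind of verification looks like, the sketch below shows the general shape of a Playwright-style end-to-end check that exercises a bug fix through the user interface rather than through isolated unit tests. It is purely illustrative: the URL, selectors, and test scenario are hypothetical and are not taken from the actual SWE-Lancer grading harness.

```ts
// Illustrative only: a hypothetical end-to-end check in the style used to
// grade freelance bug-fix tasks. The URL, selectors, and workflow are invented.
import { test, expect } from '@playwright/test';

test('expense report total updates after editing an entry', async ({ page }) => {
  // Open the (hypothetical) app and navigate to an existing report.
  await page.goto('https://staging.example.test/reports/42');

  // Drive the fix through the UI, the way a real user would.
  await page.getByRole('button', { name: 'Edit expense' }).click();
  await page.getByLabel('Amount').fill('125.00');
  await page.getByRole('button', { name: 'Save' }).click();

  // The task only counts as solved if the user-visible outcome is correct.
  await expect(page.getByTestId('report-total')).toHaveText('$125.00');
});
```

The point of grading this way is that the shallow, surface-level patches the study describes can pass narrow unit tests yet still fail once the whole user flow is exercised.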