Published in AI

AI can't code

20 February 2025


OpenAI research exposes the limits of artificial engineers.

Despite all the hype, AI is nowhere near replacing human software engineers, according to OpenAI’s research.

A new study has shown that even the most advanced AI models can barely scrape by on real-world programming tasks.

In its latest experiment, OpenAI tested AI models using real engineering challenges. The most capable model, Anthropic’s Claude 3.5 Sonnet, completed a dismal 26.2 per cent of hands-on coding tasks and just 44.9 per cent of technical management decisions.

The study used a benchmark called SWE-Lancer, built from 1,488 actual fixes made to Expensify’s codebase, representing $1 million in freelance engineering work. Even with this well-defined dataset, AI struggled to match human expertise.

While the AI models excelled at finding relevant code snippets, they floundered when asked to comprehend how different parts of a program work together. The best they could manage were shallow, surface-level fixes that failed to account for deeper software interactions.

Unlike previous AI coding tests that rely on simplistic algorithm puzzles, OpenAI’s benchmark replicated real-world software development. Tasks ranged from quick $50 bug fixes to intricate $32,000 feature implementations, with every solution rigorously tested in real user environments.
