
Apple’s AI paper gets push-back from other boffins

16 June 2025


Probably because it doesn't understand what it's doing

The Fruity Cargo Cult Apple’s research paper, pompously titled The Illusion of Thinking, which confidently claimed that large reasoning models (LRMs) collapse when asked to do anything clever, is faulty, according to a top boffin.

For those who came in late, the paper appeared around the same time as Job’s Mob was being forced to explain why it was the only major tech company that could not develop a working AI product.

Open Philanthropy researcher Alex Lawsen penned a paper that unpicks Apple's claims with the surgical precision of a tax auditor on speed. His work, titled The Illusion of the Illusion of Thinking, suggests the whole thing reeks more of shoddy experimental design than of any fundamental AI limitation.

Apple overlooked fundamental output constraints, such as token budgets. Models such as Anthropic's Claude were cut off mid-flow, often saying things like, "The pattern continues, but I'll stop here to save tokens." But Job's Mob decided to mark that as a failure, which is about as fair as blaming a satnav for not finishing its route because you ran out of battery.

Then there's the matter of Apple boffins penalising models for failing to solve impossible puzzles. The river crossing tests included setups that no one could solve without magical boats or a time machine. The models correctly refused to engage with the nonsense, yet Apple called it a fail. That's not science. That's theatre.

Then there were the evaluation scripts. Rather than judging reasoning ability, Job's Mob used an automated pipeline that only gave full credit for listing every single step. Never mind if the model had to stop halfway due to token limits. Apple's AI team believed that success was about typing quickly rather than thinking deeply.

Lawsen offered an alternative. Instead of demanding every move of the Tower of Hanoi be listed, he asked models to generate recursive Lua functions that solve the puzzle. Models like Claude, Gemini, and OpenAI's o3 handled 15-disk versions just fine, seven disks beyond the point where Apple claimed they fell apart.
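
For the curious, a recursive Hanoi solver of the kind Lawsen describes fits in a few lines of Lua, which is rather the point: a 15-disk puzzle needs 2^15 - 1 = 32,767 moves, hopeless to type out one move at a time inside a token budget, while the function that produces them stays tiny. The sketch below is illustrative rather than Lawsen's actual code or prompt:

```lua
-- A recursive Tower of Hanoi solver of the sort Lawsen describes
-- (a sketch, not his exact code). It returns the full move list
-- rather than printing it, so a grader can check any individual move.
local function hanoi(n, from, to, via, moves)
  moves = moves or {}
  if n == 0 then return moves end
  hanoi(n - 1, from, via, to, moves)                       -- park n-1 disks on the spare peg
  moves[#moves + 1] = { disk = n, from = from, to = to }   -- move the largest disk
  hanoi(n - 1, via, to, from, moves)                       -- stack the n-1 disks back on top
  return moves
end

-- 15 disks: 2^15 - 1 = 32,767 moves, trivial for the function to
-- generate but far too many for a model to type out token by token.
local moves = hanoi(15, "A", "C", "B")
print(#moves)   -- 32767
```

A grader could then execute the model's output and verify every move against the rules, which is arguably a fairer test of reasoning than of typing stamina.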

"When you remove artificial output constraints, LRMs seem perfectly capable of reasoning about high-complexity tasks," Lawsen wrote.

His beef isn't that reasoning is easy. It’s that Job's Mob seems confused about how to measure it. "The question isn’t whether LRMs can reason, but whether our evaluations can distinguish reasoning from typing," he said.

If Apple can't even understand what its models are doing, perhaps that's why it still hasn't managed to ship anything vaguely resembling AI that works. But then, this is the same outfit that believed removing headphone jacks was an innovation.

