That’s quite the efficiency boast, though as always with Big Tech’s self-congratulatory back-patting, there’s a catch or two.
Using Elo scores—a system better known for ranking chess players than AI models—Google brags that Gemma 3 scores 1338, just a hair behind DeepSeek’s 1363.
That technically makes R1 superior, but Google argues that Gemma 3 achieves near-identical results with a fraction of the hardware. Given that DeepSeek AI has previously deployed 1,814 of Nvidia’s less-powerful H800 GPUs to serve up R1’s responses, Google’s claim of a "sweet spot" between efficiency and performance is an unsubtle dig at competitors drowning in power-hungry GPU clusters.
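To put that 25-point gap in perspective, the standard Elo formula converts a rating difference into an expected head-to-head win rate. A quick sketch, using the conventional 400-point logistic scale (not necessarily LMArena's exact methodology):

```python
# Standard Elo expected-score formula: the probability that A beats B
# depends only on the rating difference, on a logistic curve scaled by 400.
def expected_win_rate(rating_a: float, rating_b: float) -> float:
    """Probability that model/player A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

# DeepSeek R1 (1363) vs. Gemma 3 (1338): a 25-point gap
print(round(expected_win_rate(1363, 1338), 3))  # prints 0.536
```

In other words, a 25-point Elo gap implies R1 would be preferred only about 53.6% of the time, which is nearly a coin flip and is why Google frames the comparison around hardware cost rather than raw score.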
Google is also eager to point out that Gemma 3 outperforms Meta’s Llama 3 in estimated Elo scores, bolstering its claim to the single-accelerator crown.
Google’s developer blog provides more details, and the model is available on Hugging Face for those interested in testing Gemma 3.
In a blog post, Google bills the new model as "the most capable model you can run on a single GPU or TPU," referring to the company's custom AI chip, the "tensor processing unit."
"Gemma 3 delivers state-of-the-art performance for its size, outperforming Llama-405B, DeepSeek-V3, and o3-mini in preliminary human preference evaluations on LMArena's leaderboard," the blog post claims, referring to the Elo scores.