Too much VRAM gives Linux insomnia

Published on 01 July 2025


AMD's Instinct cards break the nap button with their massive memory

AMD engineer Samuel Zhang has flagged a Linux bug that causes servers fitted with Instinct accelerators to fail hibernation because they carry too much VRAM.

Instinct cards are designed for AI and high-performance number crunching in data centres, and they pack silly amounts of VRAM, 192GB each in some models. Toss eight of them into a server and you’re sitting on around 1.5TB of video memory, which is apparently enough to break Linux's bedtime routine.

While Linux fanboys might salivate at those specs, in the server world, that’s not unusual. The real drama kicks in during the hibernation process, when Linux offloads all GPU memory to system RAM using the Graphics Translation Table or shared memory. That memory is then included in a hibernation image, which is copied to a separate memory region before being written to disk.

It is simple enough, until the math goes sideways. When your server's carrying 1.5TB of VRAM, the kernel first evicts it into system RAM and then copies it again for the hibernation image, so memory usage suddenly balloons to around 3TB. That’s not going to fly on machines with only 2TB of system RAM, which promptly leads to a crash landing.
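
For anyone who wants to check the sums, here is a rough back-of-the-envelope sketch in Python. It is not taken from Zhang's patches or from the kernel: the function names are hypothetical, and it assumes the only memory that matters is the evicted VRAM plus one extra copy of it for the hibernation image, ignoring everything else the server has in RAM.

```python
GB_PER_TB = 1024


def hibernation_peak_gb(vram_per_gpu_gb: int, gpu_count: int) -> int:
    """Rough peak RAM needed for VRAM contents during hibernation, in GB.

    Assumption: VRAM is first evicted into system RAM, then that evicted
    data is copied once more when the hibernation image is built.
    """
    evicted_gb = vram_per_gpu_gb * gpu_count  # VRAM moved into system RAM
    image_copy_gb = evicted_gb                # second copy for the image
    return evicted_gb + image_copy_gb


def fits_in_ram(peak_gb: int, system_ram_gb: int) -> bool:
    """True if the estimated peak fits in the given amount of system RAM."""
    return peak_gb <= system_ram_gb


# Eight cards with 192GB each, in a server with 2TB of system RAM
peak_gb = hibernation_peak_gb(vram_per_gpu_gb=192, gpu_count=8)
print(f"Estimated peak: {peak_gb / GB_PER_TB:.1f}TB")
print("Fits in 2TB of RAM:", fits_in_ram(peak_gb, 2 * GB_PER_TB))
```

Even with nothing else running, the estimate overshoots a 2TB machine by a full terabyte, which is why cutting out the duplicate copy matters.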

Zhang has proposed a fix. One patch trims down system memory needs during hibernation so it can complete, but that exposes another issue: resuming the system takes nearly an hour.

To deal with that, Zhang added a third patch that skips restoring those bloated buffer objects on thaw, which slashes resume time dramatically.

You might wonder why anyone would put these high-end servers to sleep in the first place. Some data centres use hibernation to cut power use and ease strain on the grid. It has become a handy trick to avoid large-scale outages, such as the recent one in Spain.

Zhang said: "Reducing system memory consumption during hibernation also helps us hibernate successfully with 2TB of RAM."

"The resume time was unacceptable, almost an hour. So we added a patch that avoids restoring these VRAM buffer objects, which improved resume performance significantly," he said.

Last modified on 01 July 2025