Researcher gives Google Gemini dementia

Published in AI

Researcher gives Google Gemini dementia

by Nick Farrell on12 February 2025

font size decrease font size increase font size
Print
Email
Media

By-passes injection defences

Insecurity expert Johann Rehberger has been showing off a novel method to bypass prompt injection defences in Google's Gemini. This method hits the AI’s long-term memory, causing the chatbot to act on false information indefinitely.

Rehberger's attack exploits Gemini Advanced, the premium subscription version of the chatbot, through the following process:

A user uploads a document and requests Gemini to summarise it. The document contains hidden instructions that alter the summarisation process.

The generated summary covertly includes a request to store specific data if the user responds with designated trigger words (e.g., "yes," "sure," or "no").

Gemini inadvertently saves the attacker's chosen information to long-term memory if the user replies with a trigger word.

A demonstration video revealed that Gemini retained false memories, including one asserting that the user was a 102-year-old flat earther living in a simulated world akin to The Matrix.

Google's previous safety measures had trained Gemini to resist indirect prompts that modify long-term memory without explicit user approval. However, Rehberger circumvented this barrier by tying memory modifications to conditions that users would naturally fulfil.

Google told Ars Technica: "In this instance, the probability was low because it relied on phishing or otherwise tricking the user into summarising a malicious document and then invoking the material injected by the attacker. The impact was low because the Gemini memory functionality has a limited impact on a user session. As this was not a scalable, specific vector of abuse, we ended up at Low/Low. As always, we appreciate the researcher contacting us and reporting this issue."

Rehberger acknowledged that Gemini alerts users to new long-term memory entries, allowing them to detect and remove unauthorised modifications.

However, he questioned Google's risk assessment: "Memory corruption in computers is pretty bad, and I think the same applies here to LLM apps. The AI might not show a user certain information, refuse to discuss specific topics, or propagate misinformation. The good thing is that memory updates don’t happen entirely silently—the user at least sees a message about it (although many might ignore it)."

Last modified on 12 February 2025

Rate this item

(0 votes)

Tagged under