Published in AI

Researcher gives Google Gemini dementia

by on12 February 2025


By-passes injection defences

Insecurity expert Johann Rehberger has been showing off a novel method to bypass prompt injection defences in Google's Gemini. This method hits the AI’s long-term memory, causing the chatbot to act on false information indefinitely.

Rehberger's attack exploits Gemini Advanced, the premium subscription version of the chatbot, through the following process:

A user uploads a document and requests Gemini to summarise it. The document contains hidden instructions that alter the summarisation process.

The generated summary covertly includes a request to store specific data if the user responds with designated trigger words (e.g., "yes," "sure," or "no").

Gemini inadvertently saves the attacker's chosen information to long-term memory if the user replies with a trigger word.

A demonstration video revealed that Gemini retained false memories, including one asserting that the user was a 102-year-old flat earther living in a simulated world akin to The Matrix.

Google's previous safety measures had trained Gemini to resist indirect prompts that modify long-term memory without explicit user approval. However, Rehberger circumvented this barrier by tying memory modifications to conditions that users would naturally fulfil.

Google told Ars Technica: "In this instance, the probability was low because it relied on phishing or otherwise tricking the user into summarising a malicious document and then invoking the material injected by the attacker. The impact was low because the Gemini memory functionality has a limited impact on a user session. As this was not a scalable, specific vector of abuse, we ended up at Low/Low. As always, we appreciate the researcher contacting us and reporting this issue."

Rehberger acknowledged that Gemini alerts users to new long-term memory entries, allowing them to detect and remove unauthorised modifications.

However, he questioned Google's risk assessment: "Memory corruption in computers is pretty bad, and I think the same applies here to LLM apps. The AI might not show a user certain information, refuse to discuss specific topics, or propagate misinformation. The good thing is that memory updates don’t happen entirely silently—the user at least sees a message about it (although many might ignore it)."

 

Last modified on 12 February 2025
Rate this item
(0 votes)

Media

Read more about: