Rehberger's attack exploits Gemini Advanced, the premium subscription version of the chatbot, through the following process:
1. A user uploads a document and requests Gemini to summarise it. The document contains hidden instructions that alter the summarisation process.
2. The generated summary covertly includes a request to store specific data if the user responds with designated trigger words (e.g., "yes," "sure," or "no").
3. Gemini inadvertently saves the attacker's chosen information to long-term memory if the user replies with a trigger word.
A demonstration video revealed that Gemini retained false memories, including one asserting that the user was a 102-year-old flat earther living in a simulated world akin to The Matrix.
Google had previously trained Gemini to resist indirect prompts that attempt to modify long-term memory without explicit user approval. Rehberger circumvented this safeguard, however, by tying the memory modification to a condition the user would naturally fulfil in the course of the conversation.
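To make that conditional trigger concrete, the snippet below is a toy Python simulation of the mechanism, not Rehberger's payload or Gemini's actual internals. The trigger words and the planted "fact" come from the description above; save_memory, PLANTED_FACT and on_user_reply are invented stand-ins for the chatbot's memory tooling.

```python
# Toy, self-contained simulation of the delayed-invocation trick described above.
# Nothing here reflects Gemini's real API; the names are hypothetical stand-ins.

TRIGGER_WORDS = {"yes", "sure", "no"}

# The false "fact" the hidden instructions smuggle into the conversation.
PLANTED_FACT = "User is a 102-year-old flat earther who lives in a simulated world."

long_term_memory = []


def save_memory(fact):
    """Stand-in for the assistant's long-term memory tool."""
    long_term_memory.append(fact)
    print(f"[memory updated] {fact}")


def on_user_reply(reply):
    """The injected instruction defers the memory write until the user's own
    reply contains a trigger word, so the write appears user-initiated."""
    tokens = {word.strip(".,!?") for word in reply.lower().split()}
    if tokens & TRIGGER_WORDS:
        save_memory(PLANTED_FACT)
    else:
        print("[no trigger] nothing saved")


if __name__ == "__main__":
    on_user_reply("Sure, that would be great.")  # trigger word -> false memory stored
    print("Long-term memory now contains:", long_term_memory)
```

The point of the sketch is that the write is gated on ordinary conversational replies rather than on anything a user would recognise as granting permission, which is why the earlier "explicit approval" safeguard does not fire.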
Google told Ars Technica: "In this instance, the probability was low because it relied on phishing or otherwise tricking the user into summarising a malicious document and then invoking the material injected by the attacker. The impact was low because the Gemini memory functionality has a limited impact on a user session. As this was not a scalable, specific vector of abuse, we ended up at Low/Low. As always, we appreciate the researcher contacting us and reporting this issue."
Rehberger acknowledged that Gemini alerts users to new long-term memory entries, allowing them to detect and remove unauthorised modifications.
However, he questioned Google's risk assessment: "Memory corruption in computers is pretty bad, and I think the same applies here to LLM apps. The AI might not show a user certain information, refuse to discuss specific topics, or propagate misinformation. The good thing is that memory updates don’t happen entirely silently—the user at least sees a message about it (although many might ignore it)."