Published in AI

AI groups race to plug prompt injection bugs

by on03 November 2025


Industry teams try to stop criminals tricking chatbots into spilling secrets

Big language AI models are under a sustained assault and the tech world is scrambling to patch the holes.

Anthropic, OpenAI, Google DeepMind and Microsoft are among the groups racing to stop so-called indirect prompt injection, where attackers hide commands in websites or emails to trick models into revealing confidential information.

Anthropic, threat intelligence lead, Jacob Klein said: "AI is being used by cyber actors at every chain of the attack right now."

He warned that criminals are already folding large language models into every stage of an intrusion.

The companies use a mix of techniques to defend their models, from hiring external testers to building AI hunting tools that flag suspicious inputs. Anthropic said it works with outside testers and has automated systems that can escalate cases for human review.

Anthropic, threat intelligence lead, Jacob Klein said, "When we find a malicious use, depending on confidence levels, we may automatically trigger some intervention or it may send it to human review."

Google DeepMind runs automated red teaming, where researchers repeatedly attack Gemini in realistic scenarios to find weaknesses before criminals do. The UK’s National Cyber Security Centre warned in May that these flaws increase the risk of sophisticated phishing and scams affecting millions of people and businesses.

Researchers have highlighted data poisoning, where attackers seed training data with malicious material to open back doors in models. New research from Anthropic with the UK’s AI Security Institute and the Alan Turing Institute found such attacks are easier than previously thought.

Security teams are hopeful that AI can help even as it fuels crime.

Microsoft, corporate vice-president and deputy chief information security officer, Ann Johnson said: "Defensive tools are getting smarter and moving from reactive to proactive, "Defensive systems are learning faster, adapting faster, and moving from reactive to proactive."

At the same time criminals are using generative AI to write malicious code and scale operations, making detection harder.

ESET, global cyber security adviser, Jake Moore said that models can rapidly generate novel malware and take advantage of gaps in existing defences.

Visa, chief risk and client services officer, Paul Fabara said models can trawl public posts and audio to craft convincing scams.

Pindrop, chief executive and co-founder, Vijay Balasubramaniyan said: "In 2023, we’d see one deepfake attack per month across the entire customer base. Now we’re seeing seven per day per customer."

The industry is throwing people and tools at the problem but experts say a definitive fix for indirect prompt injection remains elusive and the arms race is only intensifying.

Last modified on 03 November 2025
Rate this item
(0 votes)

Read more about: