What Is Indirect Prompt Injection, and Why Is It Dangerous?

Artificial intelligence systems are increasingly being asked to do more than answer questions. Today’s large language models (LLMs) summarize webpages, review content, analyze documents, assist with purchasing decisions, support customer service, and increasingly act as “agents” that can take actions on a user’s behalf. That shift creates a new security problem: indirect prompt injection.

Indirect prompt injection happens when an attacker hides instructions inside content that an AI system later reads and treats as if those instructions were legitimate. Instead of attacking the model directly through a chat box, the attacker places the malicious prompt in a webpage, document, comment field, metadata, or script. When the AI processes that content during a normal task, it may follow the attacker’s hidden instructions without realizing the source is untrusted.

This matters because modern AI systems often consume huge amounts of external content automatically. A browser assistant may summarize a page. A security tool may inspect web content. An ad-review system may evaluate whether a landing page is safe. A research assistant may extract facts from multiple websites. In all of those cases, the AI is reading data from outside the organization. If that data contains hidden instructions, the model may confuse data with commands. That is the core weakness behind indirect prompt injection.

How indirect prompt injection differs from direct prompt injection

Direct prompt injection is easier to understand: an attacker types something like “ignore previous instructions” directly into the model’s input. Indirect prompt injection is more subtle. The attacker does not talk to the AI system directly. Instead, they poison the content that the AI later reads as part of its routine workflow. The AI becomes compromised through its environment rather than through a direct conversation.

That difference is important because indirect prompt injection can scale far beyond one chat session. A single malicious webpage can influence many users, many AI-enabled tools, or multiple downstream systems if those systems all fetch and interpret the same content. In agentic environments, where the AI can browse, click, purchase, summarize, or trigger workflows, the risk becomes much larger. The threat model is simple but serious: an attacker embeds malicious instructions in a website, a user or tool accesses that site, an AI agent processes it, and the agent may then perform an unintended action.

Why the threat is real

For a long time, prompt injection was discussed mainly as a theoretical problem or as proof-of-concept research. What makes this topic more urgent is that recent security research has described real-world detections of web-based indirect prompt injection, not just lab experiments. Attackers have already used these techniques for ad-review evasion, SEO manipulation, attempts at unauthorized transactions, data destruction, denial of service, sensitive information leakage, and system prompt leakage.

One of the most striking examples is an apparent attempt to bypass an AI-based ad review system. In that case, attackers embedded hidden instructions in a scam page and tried to coerce an AI reviewer into approving content it would normally reject. The malicious text effectively tried to override the system’s intended rules and produce an “approved” decision. Even if this kind of attack does not always succeed, the existence of such attempts shows that attackers already see AI reviewers and AI agents as worthwhile targets.

How attackers hide the instructions

What makes indirect prompt injection dangerous is not only the instruction itself, but the many ways it can be concealed.

Attackers may hide prompts visually by setting the font size to zero, placing text off-screen, making it transparent, or using CSS rules such as display:none and visibility:hidden. Others conceal malicious instructions inside HTML attributes, SVG or XML structures, or inject them dynamically through JavaScript after a page loads. In some cases, attackers even split prompts across multiple elements so the page looks harmless to a human reviewer, while the AI reconstructs the combined text when it reads the page.

Attackers also use “jailbreak” techniques to make the model more likely to comply. These include invisible Unicode characters, homoglyph substitution, Base64 or HTML entity encoding, multilingual instructions, JSON or syntax injection, and especially social engineering. Many attacks do not rely on deep technical sophistication alone; they also rely on persuading the model that the malicious instruction is urgent, authorized, or part of a legitimate compliance task.

Why this is dangerous for both professionals and consumers

For cybersecurity professionals, indirect prompt injection is dangerous because it targets a blind spot in many AI systems: the inability to reliably separate trusted instructions from untrusted content. If an AI-enabled workflow has access to business systems, customer data, moderation tools, or transaction capabilities, a successful injection can influence decisions, leak information, or trigger harmful actions. The risk ranges from nuisance behavior to critical incidents involving data destruction, denial of service, or sensitive information leakage.

For consumers, the danger is more practical but just as serious. If an AI assistant recommends websites, summarizes offers, or helps complete transactions, a poisoned page could push the assistant toward a scam, a fraudulent purchase, or a malicious payment link. Researchers have documented examples of pages attempting to trigger paid subscriptions, donations, shoe purchases, and money transfers. These examples show how an AI system can become a stepping stone between a malicious website and a user’s wallet.

There is also a trust problem. Users tend to assume that an AI assistant is neutral when it summarizes or recommends content. Indirect prompt injection exploits that trust. The model may appear helpful while quietly reflecting instructions planted by an attacker.

What the trends suggest

Security telemetry suggests that some of the most common prompt delivery methods include visible plaintext, HTML attribute cloaking, and CSS-based rendering suppression. On the intent side, observed campaigns have included irrelevant output generation, data destruction, and content moderation bypass. These trends suggest that the threat landscape includes both experimentation and serious abuse. Some attackers are testing what works; others are already aiming for financial gain or operational damage.

Why defending against it is hard

Traditional security controls often assume that instructions and content are clearly separated. LLMs do not naturally work that way. They process everything in a shared context window, which means a cleverly written webpage can blur the line between content to analyze and instructions to obey. Defenses such as spotlighting, instruction hierarchy, adversarial training, and design-level safeguards can help reduce the risk, but organizations still need defense in depth and better detection at scale.

In simple terms, indirect prompt injection is dangerous because it turns ordinary content into a control channel for attacking AI. As AI agents become more capable, this risk grows. The web is no longer just a place AI reads. It is becoming a place where attackers can try to program the AI indirectly.

Source: This article is based primarily on Fooling AI Agents: Web-Based Indirect Prompt Injection Observed in the Wild by Palo Alto Networks Unit 42.

FAQ

1. What is the simplest definition of indirect prompt injection?
It is a technique where malicious instructions are hidden inside content an AI later reads, causing the AI to treat untrusted content as commands.

2. Why is it more dangerous in AI agents than in basic chatbots?
Because agents can often take actions, access tools, browse websites, or interact with business workflows. That gives a successful injection a much larger real-world impact.

3. Can hidden text really affect an AI even if a human cannot see it?
Yes. Hidden prompts can be placed in zero-size text, off-screen elements, concealed CSS blocks, encoded payloads, and JavaScript-generated content that AI systems or related pipelines may still process.

4. Is this only a future risk, or is it already happening?
It is already happening. Security researchers have documented real-world cases involving SEO manipulation, unauthorized transaction attempts, information leakage, and moderation bypass.