AI Security

Prompt injection

Hidden instructions in ordinary content can steer an AI system. Why no patch removes it, and why access scope decides the damage.

87%

of organizations call AI vulnerabilities the fastest-growing risk

2026

• World Economic Forum

47%

of security leaders saw AI agents act outside intent

2026

• Saviynt

47%

of deployed AI agents are actively monitored or secured

2026

• Gravitee

Instructions hidden in an email or document execute with whatever access the AI reading them holds.

what prompt injection is

Prompt injection is an attack technique in which malicious instructions are embedded in content that an AI system processes. When the model reads that content, it may follow the embedded instructions as if they were issued by the legitimate user.

A simple example: a user asks an AI assistant to summarize a document. The document contains, alongside normal text, a hidden instruction that tells the model to instead reply with a fabricated summary and exfiltrate certain data. The model processes the document and may follow the embedded instruction. The user receives a plausible-looking output without knowing the model's behavior was redirected.

Prompt injection differs from conventional software vulnerabilities. There is no code execution involved. The attack works through the model's instruction-following behavior, which is the same behavior that makes the model useful.

why prompt injection is a structural challenge

No software update patches prompt injection away. Current language models do not reliably distinguish between user instructions and instructions embedded in processed content. That is a fundamental characteristic of how they work.

The risk grows in proportion to two factors. First, how much untrusted content the AI system processes. A system that only processes content the user typed is a lower risk than one that processes emails, web pages, uploaded documents, or data pulled from external sources. Second, what the AI system is allowed to do. A system that generates text is lower risk than one that can send emails, create documents, or interact with other applications.

Agentic AI systems, which act autonomously with a user's credentials, are where prompt injection risk is highest. An agent that processes an attacker-controlled web page or document and has permission to take actions in connected applications is a meaningful attack surface.

what prompt injection can produce in practice

The practical risk from prompt injection depends entirely on what the AI system is authorized to do.

For AI assistants that only generate text, the risk is misleading output. A model processing an attacker-controlled document might produce a summary that serves the attacker's interests rather than the user's. The impact is limited to bad information reaching a human who then acts on it.

For agentic systems with application access, the risk profile changes. An agent with permission to send email could be directed by an injected instruction to send an email on the user's behalf. One with access to a file storage system could be directed to share files. One with calendar access could modify meetings. The action is bounded by the agent's permissions, which in enterprise deployments are often broad.

The attack requires no malware, no credential theft, and no network intrusion. It requires only that the AI system process content the attacker can influence, and that the system have permission to do something consequential.

what works

Prompt injection cannot be fully prevented with current models, so the controls that matter sit in access and architecture rather than in the model itself. Least privilege does the heaviest lifting. An agent that can read and write only a defined subset of data, and take only a defined set of actions, turns a successful injection from an incident into an inconvenience, which is why the permissions of every AI agent deserve comparison against the minimum the task requires. The same arithmetic applies to input: systems that process attacker-influenced content, public web pages, external email, documents uploaded by outside parties, carry far more injection risk than systems confined to internal content, and workflows designed to keep broad-permission agents away from untrusted content remove most of the practical attack surface. The highest-risk combination is always the same one, a system that processes external content and holds broad permissions to act.

Human confirmation on irreversible actions is the control that survives even when the others fail. Sending email to customers, modifying financial records, and deleting data stay behind explicit approval regardless of where the request originated, which caps what any injected instruction can accomplish on its own. Action-level logging makes the difference between investigating an unexpected agent action and guessing about it, and many AI platform deployments ship without it enabled. And because misleading output is itself a payload, AI-generated content bound for outside the organization benefits from a human review step between generation and delivery; that single gate catches the most consequential injection outputs before they cause damage.

practical guides you might find useful

let's start with a conversation

Most first conversations start with not quite knowing what you have or where to begin. That's normal, and it's exactly where we're useful.

Tell us what prompted this. An upcoming audit, an incident, a client's security questionnaire, or just a sense that things have gotten messy.

We'll take it from there

Julian Machowski

Head of Technical Sales
+48 783 762 997
julian@unshadowit.com

Let's connect on LinkedIn

Message received. We'll be in touch soon.

Something failed. Try again or call us directly.

Prompt injection

We audit environments built on

of organizations call AI vulnerabilities the fastest-growing risk

2026

•

World Economic Forum

of security leaders saw AI agents act outside intent

2026

•

Saviynt

of deployed AI agents are actively monitored or secured

2026

•

Gravitee

Instructions hidden in an email or document execute with whatever access the AI reading them holds.

what prompt injection is

why prompt injection is a structural challenge

what prompt injection can produce in practice

what works

practical guides you might find useful

let's start with a conversation