Prompt injection is a category of attacks where untrusted text steers a model into doing something you didn’t intend. The key detail: it’s not “the model got tricked” — it’s “you treated untrusted text as instructions.”
This matters any time you feed external content into a model: webpages, emails, documents, chat logs, tickets, or user input. If the model can call tools or take actions, injection becomes a real security problem rather than a quality issue.
Where injection shows up in practice
- Summarizing or “reading” webpages where the page content can contain adversarial instructions.
- Email/helpdesk triage where senders can embed instructions to exfiltrate or misroute data.
- Agent-style workflows that call tools based on model output.
A simple taxonomy: data, decisions, and actions
- Data: untrusted text you ingest (the attacker controls this).
- Decisions: model outputs you might trust (summaries, classifications, suggested tool calls).
- Actions: anything with side effects (sending email, writing to a database, publishing content, triggering payments).
Treat the model as an untrusted component whenever actions are involved. The model is not a security boundary. Put the boundary around it: validation, policy checks, and human confirmation.
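A minimal sketch of what "the boundary around the model" can look like: an action gate that sits between model output and side effects. The action names, the `ALLOWED_ACTIONS` set, and the confirmation rule are illustrative assumptions, not a prescribed API.

```python
# Hypothetical action gate: the model only *suggests* actions; this code decides.
ALLOWED_ACTIONS = {"send_email", "update_ticket"}  # assumed allowlist
CONFIRMATION_REQUIRED = {"send_email"}             # actions needing a human checkpoint

def execute(action: str, args: dict, confirmed: bool = False) -> str:
    """Run a model-suggested action only if policy allows it."""
    if action not in ALLOWED_ACTIONS:
        return "rejected: unknown action"
    if action in CONFIRMATION_REQUIRED and not confirmed:
        return "pending: human confirmation required"
    # Real code would validate `args` per action before this point.
    return f"executed: {action}"
```

The point of the design is that a successful injection can, at worst, produce a rejected or pending action; it cannot reach a side effect directly.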
Mitigations that move the needle
- Separate data from instructions: keep system rules fixed; treat external text as data only.
- Constrain tool calls: allowlist tools + validate every argument server-side.
- Don’t give models raw secrets: design so the model never sees API keys or private tokens.
- Log and review: you need observability to detect abuse patterns.
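To make "allowlist tools + validate every argument server-side" concrete, here is one possible shape: a dispatch table where each allowed tool carries its own argument validator. The tool name `lookup_ticket` and the `T-123`-style ID format are assumptions for illustration.

```python
import re

def _valid_lookup_args(args: dict) -> bool:
    # Expect exactly one field, "ticket_id", matching a strict pattern.
    return set(args) == {"ticket_id"} and bool(
        re.fullmatch(r"T-\d{1,8}", args["ticket_id"])
    )

# Allowlist: a tool is callable only if it has a registered validator.
TOOL_VALIDATORS = {"lookup_ticket": _valid_lookup_args}

def dispatch(tool: str, args: dict):
    if tool not in TOOL_VALIDATORS:           # allowlist check
        raise PermissionError(f"tool not allowed: {tool}")
    if not TOOL_VALIDATORS[tool](args):       # server-side argument validation
        raise ValueError(f"invalid arguments for {tool}")
    return ("ok", tool, args)                 # hand off to the real implementation
```

Because validation runs server-side, it holds even if the model is fully compromised by injected text: the attacker can influence what the model asks for, but not what the dispatcher accepts.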
Concrete examples (what “validation” means)
- If a tool call includes a URL, validate scheme/host/length and block internal IP ranges.
- If a tool call includes an email address, restrict to approved domains or require confirmation.
- If a tool call writes data, enforce schema and reject unexpected fields.
- If a tool call publishes content, require a review step and record an audit trail.
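The URL check above can be sketched with the standard library. This is a simplified version that only catches literal IP hosts; a real deployment would also resolve DNS and re-check the resolved address to defend against rebinding, which is out of scope here.

```python
import ipaddress
from urllib.parse import urlsplit

def is_safe_url(url: str) -> bool:
    """Validate scheme, host, and length; block internal IP ranges.

    Sketch only: checks literal IP hosts, not DNS-resolved addresses.
    """
    if len(url) > 2048:                       # length check
        return False
    parts = urlsplit(url)
    if parts.scheme not in {"http", "https"} or not parts.hostname:
        return False                          # scheme/host check
    try:
        ip = ipaddress.ip_address(parts.hostname)
    except ValueError:
        return True                           # hostname is a name, not a literal IP
    # Block private, loopback, and link-local ranges (e.g. cloud metadata IPs).
    return not (ip.is_private or ip.is_loopback or ip.is_link_local)
```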
If an output triggers an action (sending mail, changing data, publishing content), add a human checkpoint or a strict validation layer. A deployment is in reasonable shape when:
- External text is tagged and treated as untrusted input.
- Tool calls are allowlisted and validated.
- No secrets are exposed to the model context.
- Actions require explicit user confirmation or policy checks.
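Two of the checks above can be sketched together: restricting email recipients to approved domains and enforcing a schema on writes. The domain list and field names are hypothetical placeholders.

```python
APPROVED_DOMAINS = {"example.com"}      # assumed recipient allowlist
WRITE_SCHEMA = {"title", "body"}        # assumed set of permitted write fields

def email_allowed(address: str) -> bool:
    # Restrict recipients to approved domains; anything else needs confirmation.
    return "@" in address and address.rsplit("@", 1)[1] in APPROVED_DOMAINS

def validate_write(payload: dict) -> dict:
    # Enforce the schema exactly: reject unexpected or missing fields.
    if set(payload) != WRITE_SCHEMA:
        raise ValueError(f"unexpected fields: {sorted(set(payload) ^ WRITE_SCHEMA)}")
    return payload
```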
How to talk about this internally (so teams take it seriously)
Prompt injection often gets dismissed as “LLMs being weird.” In enterprise environments, frame it as a standard input-validation problem: untrusted input influencing control flow. That language makes it actionable for security and engineering teams.