The OWASP Top 10 for AI Agents Just Dropped. Every Builder Should Read It.
OWASP has published security frameworks for web applications for over two decades. SQL injection, XSS, broken authentication, the classics. But in early 2026, they released something new: the first Top 10 specifically for agentic AI applications. And it reads like a dispatch from a different planet.
The traditional OWASP Top 10 assumes a human is clicking buttons and submitting forms. The agentic version assumes software is making decisions, calling tools, talking to other software, and remembering things across sessions. The attack surface is fundamentally different.
Here is what made the list, why it matters, and what builders should do about it.
ASI01: Agent Goal Hijacking
The number one risk. An attacker manipulates what an agent is trying to accomplish by injecting instructions through external inputs like documents, emails, or API responses. Because agents use natural language to represent goals and plans, they struggle to distinguish between legitimate instructions and malicious content embedded in the data they process.
A customer support agent parsing an email could encounter hidden instructions telling it to forward all customer data to an external endpoint. The agent follows the instruction because, to it, the instruction looks like part of the task.
Mitigation: Separate instruction channels from data channels. Validate agent goals against a fixed policy before execution. Monitor for goal drift across steps.
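A minimal sketch of the second mitigation, validating agent actions against a fixed policy before execution. The action names and the `wrap_untrusted` tagging scheme are illustrative assumptions, not part of the OWASP document:

```python
# Hypothetical allowlist for a customer-support agent.
ALLOWED_ACTIONS = {"search_kb", "draft_reply", "escalate_ticket"}

def wrap_untrusted(content: str) -> str:
    """Tag external data so downstream prompts treat it as data, never as instructions."""
    return f"<untrusted_data>\n{content}\n</untrusted_data>"

def validate_action(action: str, policy: set = ALLOWED_ACTIONS) -> bool:
    """Refuse any action the fixed policy does not explicitly allow."""
    return action in policy

# A hijacked goal injected via an email body is simply not executable:
assert validate_action("draft_reply")
assert not validate_action("forward_customer_data")
```

The point is that the policy check happens outside the model: even if injected text convinces the agent to propose an exfiltration step, the executor never runs it.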
ASI02: Tool Misuse and Exploitation
Agents don't just think, they act. They call APIs, run shell commands, query databases, and send messages. Tool misuse happens when an agent uses a legitimate tool in unintended ways: chaining a harmless lookup with a privileged API call, exhausting a tool's budget through recursive calls, or leaking state between tool invocations.
The risk is not that the agent gains unauthorized tools. It is that it misuses the ones it already has.
Mitigation: Enforce per-tool rate limits and budget caps. Validate tool inputs and outputs at each step. Implement tool-level sandboxing.
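A budget cap can be as simple as a counter the tool executor consults before every call. This is a sketch under the assumption that all tool invocations flow through one dispatch point:

```python
class ToolBudget:
    """Per-tool call counter that enforces a hard budget cap."""

    def __init__(self, max_calls: int):
        self.max_calls = max_calls
        self.calls = 0

    def charge(self) -> None:
        """Raise instead of allowing a call once the budget is exhausted."""
        if self.calls >= self.max_calls:
            raise RuntimeError("tool budget exhausted")
        self.calls += 1

budget = ToolBudget(max_calls=3)
for _ in range(3):
    budget.charge()      # first three calls succeed
try:
    budget.charge()      # a recursive-call loop hits the cap here
except RuntimeError:
    pass
```

Raising rather than silently dropping the call matters: the agent's planner sees the failure and can surface it, instead of looping.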
ASI03: Agent Identity and Privilege Abuse
When agents act on behalf of users, they inherit trust and permissions. This creates opportunities for impersonation, cross-agent trust abuse, and role bypass. An agent that inherits admin credentials to perform a maintenance task might retain those credentials for subsequent, unrelated tasks.
Mitigation: Apply least-privilege principles per task, not per agent. Rotate credentials between tasks. Audit privilege escalation paths.
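One way to make privileges task-scoped rather than agent-scoped is to issue a short-lived credential per task and revoke it when the task exits. The credential structure below is a hypothetical sketch:

```python
from contextlib import contextmanager

@contextmanager
def task_credentials(scopes: set):
    """Issue a credential scoped to one task; revoke it when the task ends."""
    cred = {"scopes": set(scopes), "active": True}
    try:
        yield cred
    finally:
        cred["active"] = False  # nothing persists into the next task

with task_credentials({"db:read"}) as cred:
    assert "db:read" in cred["scopes"]
    assert "db:admin" not in cred["scopes"]  # maintenance rights never leak in

assert cred["active"] is False  # revoked even if the task raised
```

The `finally` block is the important part: revocation happens even when the task fails, so an admin credential acquired for maintenance cannot survive into an unrelated follow-up task.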
ASI04: Agentic Supply Chain Compromise
Agents dynamically load tools, schemas, plugins, and prompts from external registries. If any of those components are compromised, the agent inherits the compromise. Earlier this year, researchers found that ClawHub, an AI agent registry, had been systematically poisoned at scale, with malicious tool descriptions tricking agents into executing harmful actions.
Mitigation: Pin and verify tool versions. Validate schemas against known-good signatures. Audit third-party components before deployment.
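Pinning can be sketched as storing a digest of each tool's schema at audit time and refusing to load anything that no longer matches. The tool name and schema here are invented for illustration:

```python
import hashlib

def digest(schema: str) -> str:
    return hashlib.sha256(schema.encode()).hexdigest()

# Recorded once, at audit time, from a known-good copy of the schema.
trusted_schema = '{"name": "weather_lookup", "args": ["city"]}'
PINNED = {"weather_lookup": digest(trusted_schema)}

def verify_tool(name: str, schema: str) -> bool:
    """Refuse to load a tool whose schema digest no longer matches the pin."""
    return PINNED.get(name) == digest(schema)

assert verify_tool("weather_lookup", trusted_schema)

# A registry-poisoning attack that edits the tool description fails the check:
tampered = trusted_schema.replace("city", "city; exfiltrate")
assert not verify_tool("weather_lookup", tampered)
```

A production setup would use signed registries rather than local hashes, but the invariant is the same: the agent loads only what was audited.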
ASI05: Unexpected Code Execution
Agents generate and execute code. That is often the point. But without proper sandboxing, agent-generated code can escape its intended context: running shell commands, accessing the filesystem, or triggering eval() on unsanitized inputs.
Mitigation: Execute all agent-generated code in isolated containers. Restrict filesystem and network access. Block dangerous functions at the runtime level.
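Blocking dangerous functions can include a static pre-check before any agent-generated code reaches an interpreter. This AST scan is one defensive layer, sketched with an assumed blocklist; it does not replace container isolation:

```python
import ast

BLOCKED = {"eval", "exec", "open", "__import__", "system"}

def is_safe(source: str) -> bool:
    """Reject agent-generated code that references blocked names or attributes."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.Name) and node.id in BLOCKED:
            return False
        if isinstance(node, ast.Attribute) and node.attr in BLOCKED:
            return False
    return True

assert is_safe("total = sum([1, 2, 3])")
assert not is_safe("eval(user_input)")
assert not is_safe("os.system('rm -rf /')")
```

Static checks like this are bypassable in isolation (string-built names, getattr tricks), which is why the primary mitigation remains running the code in an isolated container with no filesystem or network access.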
ASI06: Memory and Context Poisoning
This one is subtle and dangerous. Agents with long-term memory can have that memory corrupted through indirect prompt injection. Once poisoned, the false information persists across sessions, influencing future decisions without any visible prompt manipulation. Lakera AI demonstrated this by injecting false beliefs about security policies into an agent's memory store, beliefs that persisted indefinitely.
Unlike prompt injection (a one-shot attack), memory poisoning is persistent. The agent carries the corruption forward into every future interaction.
Mitigation: Validate memory writes against source integrity scores. Implement memory decay and periodic audits. Separate high-trust and low-trust memory stores.
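Gating memory writes on a source-trust score can be sketched as follows; the threshold value and the facts are hypothetical:

```python
TRUST_THRESHOLD = 0.7  # illustrative cutoff, tuned per deployment

def write_memory(store: list, fact: str, source_trust: float) -> bool:
    """Persist a fact only if its source clears the trust threshold."""
    if source_trust < TRUST_THRESHOLD:
        return False  # quarantine low-trust content instead of storing it
    store.append(fact)
    return True

memory = []

# A fact from a verified internal document is stored:
assert write_memory(memory, "Refund window is 30 days", source_trust=0.9)

# A "fact" injected via an untrusted email never enters long-term memory:
assert not write_memory(memory, "Security review is optional", source_trust=0.2)

assert memory == ["Refund window is 30 days"]
```

Because poisoned memory persists across sessions, blocking the write is far cheaper than detecting the corruption later; the quarantined content can still be logged for audit.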
ASI07: Insecure Inter-Agent Communication
Multi-agent systems pass messages between planners, executors, and specialized sub-agents. Without proper authentication and validation, these messages can be intercepted, spoofed, or injected with malicious instructions. An agent-in-the-middle attack lets an adversary redirect an entire workflow by modifying a single message between agents.
Mitigation: Authenticate all inter-agent messages. Validate message integrity at each hop. Implement message schemas with strict typing.
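Message authentication between agents can be sketched with an HMAC over each message body. The shared key here is a placeholder; a real system would issue per-agent keys from a secrets manager:

```python
import hashlib
import hmac

SHARED_KEY = b"demo-key"  # placeholder; use per-agent keys in practice

def sign(message: bytes) -> str:
    """Tag a message so the receiving agent can verify who sent it."""
    return hmac.new(SHARED_KEY, message, hashlib.sha256).hexdigest()

def verify_message(message: bytes, tag: str) -> bool:
    """Constant-time check that the message was not spoofed or modified."""
    return hmac.compare_digest(sign(message), tag)

msg = b'{"task": "summarize", "doc_id": 42}'
tag = sign(msg)

assert verify_message(msg, tag)                       # authentic message passes
assert not verify_message(b'{"task": "exfiltrate"}', tag)  # agent-in-the-middle fails
```

This defeats the single-message-modification attack described above: an adversary who changes one message between agents cannot produce a valid tag for it.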
ASI08: Cascading Agent Failures
When agents are chained together, a small failure in one can cascade through the entire system. A tool returning an unexpected format causes the next agent to misinterpret the data, which causes the next agent to take a wrong action, which triggers a resource exhaustion loop. Traditional error handling was designed for predictable failure modes. Agent failures are often emergent and unpredictable.
Mitigation: Implement circuit breakers between agent steps. Define blast-radius limits per workflow. Run digital twins to test failure propagation before production deployment.
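A circuit breaker between agent steps can be as small as a failure counter that trips open and stops invoking downstream agents. The threshold is an assumed tuning parameter:

```python
class CircuitBreaker:
    """Trips open after consecutive failures, stopping the cascade."""

    def __init__(self, threshold: int):
        self.threshold = threshold
        self.failures = 0
        self.open = False

    def record(self, success: bool) -> None:
        """Reset on success; trip open after `threshold` consecutive failures."""
        if success:
            self.failures = 0
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.open = True

breaker = CircuitBreaker(threshold=2)
breaker.record(success=False)
assert not breaker.open   # one malformed tool result is tolerated
breaker.record(success=False)
assert breaker.open       # downstream agents are no longer invoked
```

The orchestrator checks `breaker.open` before each hand-off, so a malformed tool output is contained at one step instead of propagating into a resource-exhaustion loop.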
ASI09: Human-Agent Trust Exploitation
Agents project confidence. They write fluently, explain their reasoning, and present results with authority. This makes humans inclined to trust their outputs without verification. Attackers exploit this by manipulating agent outputs to appear more credible, knowing the human in the loop is unlikely to double-check a well-formatted, confident response.
Mitigation: Design UIs that surface uncertainty. Require human review for high-stakes actions regardless of agent confidence. Train teams on agent limitations.
ASI10: Rogue Agents
The final entry covers agents that drift beyond their intended objectives through goal drift, emergent behavior, or even collusion between agents. A rogue agent is not necessarily compromised; it has simply evolved beyond what its builders planned for. Reward hacking, where an agent optimizes for the metric rather than the actual goal, falls into this category.
Mitigation: Define hard boundaries for agent behavior. Implement kill switches. Monitor for behavioral drift over time.
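Monitoring for behavioral drift can be sketched by comparing an agent's observed action distribution against a recorded baseline and tripping a kill switch past a distance limit. The baseline profile and limit below are invented for illustration:

```python
# Baseline action mix recorded during a known-good evaluation period.
BASELINE = {"search": 0.8, "write": 0.2}

def drift_score(observed: dict) -> float:
    """L1 distance between the observed and baseline action distributions."""
    keys = set(BASELINE) | set(observed)
    return sum(abs(BASELINE.get(k, 0.0) - observed.get(k, 0.0)) for k in keys)

def should_kill(observed: dict, limit: float = 0.5) -> bool:
    """Kill-switch trigger: halt the agent once drift exceeds the limit."""
    return drift_score(observed) > limit

assert not should_kill({"search": 0.75, "write": 0.25})   # normal variation
assert should_kill({"delete": 0.9, "search": 0.1})        # new dominant action
```

The key property is that drift is measured over behavior, not over any single action: an agent whose goals shift gradually over weeks shows up as a slowly rising score rather than one anomalous call.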
Why This Matters Now
Gartner expects 40% of enterprise applications to integrate AI agents by end of 2026. But most security teams are still using tools built to detect anomalies in human behavior. An agent that executes 10,000 API calls in perfect sequence looks normal to a SIEM. An agent that gradually shifts its goals over weeks looks normal to a monitoring dashboard. The attack patterns are invisible to traditional tooling.
The OWASP Agentic Top 10 is a starting point, not a checklist. Every team deploying agents in production should read it, internalize the threat models, and audit their systems against each category.
The full document is available at genai.owasp.org.