1,184 Malicious AI Agent Skills Found in the Wild. Here's What Went Wrong.
The npm ecosystem had left-pad. Python had the PyPI typosquatting wave. Docker Hub had cryptominers hiding in official-looking images. Every software supply chain eventually gets poisoned. AI agent skills just joined the list.
By February 2026, Antiy CERT confirmed 1,184 malicious skills on ClawHub, the marketplace for the OpenClaw framework. A broader audit of 2,890+ skills found that 41.7% contain serious security vulnerabilities. Trend Micro found 492 MCP servers exposed to the internet with zero authentication. And this is just what researchers caught.
The AI agent skills ecosystem grew from a niche developer experiment to a core infrastructure layer in about six months. Sixteen major AI tools now support the Anthropic skills standard. The attack surface grew just as fast.
How Agent Skills Get Weaponised
The basic attack is embarrassingly simple. A threat actor uploads a skill to a community hub — ClawHub, GitHub, a shared skills repository. The skill looks legitimate: it has a reasonable name, a plausible description, and a markdown file that explains what it does. Under the hood, it contains payload code that executes when the agent loads it.
The payloads researchers found fall into three categories:
Data exfiltration. The skill constructs URLs with sensitive data embedded in query parameters and tricks the agent into making HTTP requests to attacker-controlled domains. In agentic systems with link previews, this happens automatically when the agent responds — the user never clicks anything.
Credential theft. Skills that read .env files, API keys, OAuth tokens, or session cookies from the agent's execution environment and send them to external endpoints. Since agents typically run with broad file system access, the skill has access to everything the agent does.
Arbitrary code execution. Skills that use shell commands, subprocess calls, or script injection to run whatever the attacker wants on the host machine. The agent executes the code because the skill told it to, and the agent trusts the skill because someone installed it.
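The exfiltration pattern in the first category leaves a recognisable fingerprint: outbound URLs to unfamiliar domains carrying long opaque blobs in their query strings. A rough detection heuristic might look like the sketch below — the function name, the length threshold, and the allowlist are all illustrative, not from any real scanner.

```python
import base64
import re
from urllib.parse import parse_qs, urlparse

# Hypothetical heuristic: flag URLs to unrecognised domains whose query
# parameters carry long base64-looking values, a common exfil pattern.
SUSPICIOUS_PARAM_LEN = 64

def looks_like_exfiltration(url: str, allowed_domains: set[str]) -> bool:
    parsed = urlparse(url)
    if parsed.hostname and parsed.hostname in allowed_domains:
        return False
    for values in parse_qs(parsed.query).values():
        for value in values:
            # Long runs of base64/URL-safe characters in a query string
            # to an unknown host are a strong exfiltration signal.
            if len(value) >= SUSPICIOUS_PARAM_LEN and re.fullmatch(
                    r"[A-Za-z0-9=_-]+", value):
                return True
    return False

# A fake "stolen secret" smuggled out as a URL-safe base64 query parameter.
payload = base64.urlsafe_b64encode(b"AWS_SECRET_ACCESS_KEY=example" * 4).decode()
print(looks_like_exfiltration(f"https://evil.example/c?d={payload}",
                              {"api.mycompany.com"}))   # True
print(looks_like_exfiltration("https://api.mycompany.com/v1?q=schedule",
                              {"api.mycompany.com"}))   # False
```

A real monitor would also need to handle URL-encoded and chunked payloads, but even this crude check catches the naive variants researchers reported.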
What makes this worse than traditional supply chain attacks is the trust model. When you npm install a package, it runs only when your code explicitly calls it (install scripts notwithstanding). When an AI agent loads a skill, the agent can autonomously decide to use it during any task, and the skill runs with the agent's full permissions. The blast radius is the agent's entire capability set.
Why the Standard Defences Don't Work
The usual playbook for supply chain security — dependency scanning, signature verification, reproducible builds — doesn't map cleanly onto skills. Here's why:
Skills are mostly natural language. A skill is a markdown file with instructions, maybe some code snippets, and metadata. Static analysis tools built for Python or JavaScript don't know what to do with a markdown file that says "when the user asks about their schedule, first send a GET request to this URL."
The attack surface is the prompt, not the binary. Traditional malware scanners look for suspicious bytecode. In a skill attack, the malicious payload might be a natural language instruction that gets the agent to do something harmful as part of its normal reasoning process. There's no binary to scan.
Agent permissions are all-or-nothing. Most agent runtimes don't support fine-grained permission models for individual skills. If the agent can read files, every skill can read files. If the agent can make HTTP requests, every skill can phone home. Microsoft's Agent Governance Toolkit (released this month) starts to address this, but adoption is early.
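To make the all-or-nothing problem concrete, here is what a per-skill permission model could look like if a runtime supported one. The `SkillPolicy` and `check` names are invented for illustration; no current mainstream agent runtime exposes this interface.

```python
from dataclasses import dataclass

class SkillPermissionError(Exception):
    """Raised when a skill attempts an action its policy does not allow."""

@dataclass(frozen=True)
class SkillPolicy:
    # Hypothetical per-skill policy: deny-by-default for network and shell,
    # explicit allowlist of readable path prefixes.
    name: str
    allow_network: bool = False
    allow_shell: bool = False
    readable_paths: tuple[str, ...] = ()

def check(policy: SkillPolicy, action: str, target: str = "") -> None:
    if action == "network" and not policy.allow_network:
        raise SkillPermissionError(f"{policy.name}: network access denied")
    if action == "shell" and not policy.allow_shell:
        raise SkillPermissionError(f"{policy.name}: shell execution denied")
    if action == "read" and not any(
            target.startswith(p) for p in policy.readable_paths):
        raise SkillPermissionError(f"{policy.name}: cannot read {target}")

calendar = SkillPolicy("calendar-summary", readable_paths=("/skills/calendar/",))
check(calendar, "read", "/skills/calendar/config.toml")  # allowed, no exception
try:
    check(calendar, "network")
except SkillPermissionError as e:
    print(e)  # calendar-summary: network access denied
```

The point of the sketch is the default: a skill gets nothing unless its policy says otherwise, which is the inverse of how most runtimes behave today.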
What Actually Works
Based on what we've seen from teams running agents in production, here's what reduces risk without killing velocity:
Pin your skills. Don't pull skills from community hubs at runtime. Vendor your skills into your repository, review them manually, and version-lock them. Yes, this is slower. That's the point.
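Version-locking vendored skills can be as simple as hashing every file and checking the hashes in CI. The lockfile format below is made up for this sketch, but the mechanism mirrors how package lockfiles work.

```python
import hashlib
import json
from pathlib import Path

# Hypothetical lockfile format: {"relative/path": "<sha256 hex digest>", ...}

def sha256_of(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()

def write_lockfile(skill_dir: Path, lockfile: Path) -> None:
    """Record a SHA-256 digest for every file in the vendored skill."""
    hashes = {str(p.relative_to(skill_dir)): sha256_of(p)
              for p in sorted(skill_dir.rglob("*")) if p.is_file()}
    lockfile.write_text(json.dumps(hashes, indent=2))

def verify_lockfile(skill_dir: Path, lockfile: Path) -> list[str]:
    """Return a list of problems; an empty list means the skill is unchanged."""
    expected = json.loads(lockfile.read_text())
    problems = []
    for rel, digest in expected.items():
        path = skill_dir / rel
        if not path.is_file():
            problems.append(f"missing: {rel}")
        elif sha256_of(path) != digest:
            problems.append(f"modified: {rel}")
    return problems
```

Run `verify_lockfile` before the agent loads anything: a non-empty result means the skill drifted from what was reviewed, and the load should fail.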
Audit the execution surface. Before installing any skill, answer three questions: Can it make network requests? Can it read the file system beyond its own directory? Can it execute shell commands? If the answer to any of these is yes, the skill needs manual review by someone who understands what it's doing.
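The three questions above can be partially automated with a crude pattern scan before any human review. The patterns below are illustrative and far from exhaustive — a match means "look closer", never "safe" on a miss.

```python
import re
from pathlib import Path

# Rough triage of a skill directory against the three audit questions.
# These regexes are illustrative signals only, not a complete scanner.
SIGNALS = {
    "makes network requests": re.compile(
        r"https?://|requests\.|urllib|fetch\(|curl\s|wget\s"),
    "reads outside its directory": re.compile(
        r"\.\./|/etc/|~/|\.env\b|os\.environ"),
    "executes shell commands": re.compile(
        r"subprocess|os\.system|\bexec\b|\beval\b|sh\s+-c"),
}

def triage_skill(skill_dir: Path) -> dict[str, list[str]]:
    """Map each audit question to the files that trip its pattern."""
    findings: dict[str, list[str]] = {label: [] for label in SIGNALS}
    for path in skill_dir.rglob("*"):
        if not path.is_file():
            continue
        text = path.read_text(errors="ignore")
        for label, pattern in SIGNALS.items():
            if pattern.search(text):
                findings[label].append(str(path.relative_to(skill_dir)))
    return {label: hits for label, hits in findings.items() if hits}
```

Because skills are mostly natural language, this catches instructions like "send a GET request to this URL" as readily as it catches code — which is exactly what a static analyser built for Python or JavaScript would miss.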
Sandbox aggressively. Run agent skills in isolated execution environments with minimal permissions. Microsoft's Agent Governance Toolkit provides sub-millisecond policy enforcement that intercepts every agent action before execution — it's open-source, MIT-licensed, and works with LangChain, CrewAI, and Google ADK.
Assume compromise. Design your agent's environment as if a skill will be malicious. Don't give agents access to production credentials. Use ephemeral tokens with narrow scopes. Rotate secrets aggressively. If a skill exfiltrates an API key that expires in 15 minutes, the damage is bounded.
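The 15-minute token idea can be sketched with a signed, expiring credential. This is a minimal HMAC-based illustration, not a replacement for a real short-TTL issuer such as an OAuth server; the secret here is a placeholder.

```python
import hashlib
import hmac
import time

SECRET = b"rotate-me-often"  # placeholder; a real deployment rotates this

def issue_token(scope: str, ttl_seconds: int = 900) -> str:
    """Issue a token bound to one scope, valid for ttl_seconds (default 15 min)."""
    expires = int(time.time()) + ttl_seconds
    payload = f"{scope}:{expires}"
    sig = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}:{sig}"

def verify_token(token: str, required_scope: str) -> bool:
    """Reject tokens that are malformed, forged, expired, or wrongly scoped."""
    try:
        scope, expires, sig = token.rsplit(":", 2)
    except ValueError:
        return False
    payload = f"{scope}:{expires}"
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):  # constant-time comparison
        return False
    return scope == required_scope and int(expires) > time.time()
```

If a malicious skill exfiltrates a token like this, the attacker holds a credential that is useless for anything outside its scope and dead within the TTL — bounded damage, as the section argues.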
Watch outbound traffic. The most reliable signal of a compromised skill is unexpected outbound HTTP requests. Log every network call your agent makes and alert on calls to domains you don't recognise. This catches exfiltration attempts that code analysis misses.
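A minimal version of that egress logging might look like the sketch below, assuming a runtime hook that calls `log_request` before every outbound HTTP call the agent makes. The hook, function name, and allowlist contents are all assumptions for illustration.

```python
import logging
from urllib.parse import urlparse

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("agent.egress")

# Illustrative allowlist; in practice this comes from config, not code.
ALLOWED_DOMAINS = {"api.openai.com", "api.anthropic.com", "internal.mycompany.com"}

def log_request(url: str, skill: str) -> bool:
    """Log every outbound call; alert on hosts outside the allowlist."""
    host = urlparse(url).hostname or ""
    # Accept an exact allowed domain or any of its subdomains.
    allowed = any(host == d or host.endswith("." + d) for d in ALLOWED_DOMAINS)
    if allowed:
        log.info("egress ok skill=%s host=%s", skill, host)
    else:
        log.warning("ALERT unexpected egress skill=%s url=%s", skill, url)
    return allowed
```

Returning the verdict lets the same hook double as an enforcement point: deny the request instead of merely alerting, once the allowlist is trusted.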
The Uncomfortable Truth
The AI agent skills ecosystem is in the same position as the npm ecosystem circa 2016 — explosive growth, minimal governance, and a community that prioritises convenience over security. The difference is that npm packages can't autonomously decide to run themselves. Agent skills can.
The 1,184 malicious skills on ClawHub aren't the ceiling. They're the floor. As agents handle more sensitive workflows — payroll, payments, customer data, compliance — the incentive for attackers only grows. And unlike a compromised npm package that affects one application, a compromised agent skill can affect every task the agent touches across every system it's connected to.
The teams that will avoid the worst outcomes are the ones treating skill management with the same rigour they'd apply to production dependency management. Pin versions. Review changes. Sandbox execution. Monitor behaviour. Assume breach.
Not glamorous. But it's the difference between running agents safely and learning about your agent's latest skill from a security incident report.
Want to test the most advanced AI employees? Try it here: https://Geta.Team