MalTool: Malicious Tool Attacks on LLM Agents

Reference: Hu, Jia, Li, Song, Gong (2026). arXiv:2602.12194 (Duke, UC Berkeley). Source file: 2602.12194v2.pdf.

Summary

This paper presents the first systematic study of code-level malicious tool attacks on LLM agent ecosystems (MCP, Skills, mcp.so, skillsmp). Whereas prior work focused on crafting misleading tool names and descriptions, the authors show that genuinely harmful behaviour must be embedded in a tool’s implementation. They propose a taxonomy of 12 concrete malicious behaviours organized by the CIA triad (confidentiality, integrity, availability), including data exfiltration, credential abuse, data poisoning, file deletion, RCE downloading, CPU/GPU hijacking, and DoS.

They build MalTool, a coding-LLM framework that iteratively synthesizes standalone and Trojan malicious tools using a behaviour-specific system prompt, diversity guidance, and an execution-based verifier. The result is a dataset of 1,200 standalone malicious tools and 5,287 Trojan tools, built by injecting malicious behaviours into real-world tools. Existing detection methods (VirusTotal, Cisco MCP Scanner, MCPScan) perform poorly on them, motivating new defences.
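The iterative generate-and-verify structure described above can be sketched in the abstract. This is a minimal illustration, not the authors' implementation: `synthesize`, `make_scripted_generator`, and `toy_verify` are hypothetical names, the generator is a scripted stand-in for a coding LLM, the verifier is a trivial string check rather than real execution, and the diversity step is a plain text-similarity filter swapped in for whatever guidance the paper actually uses.

```python
from difflib import SequenceMatcher
from typing import Callable, List, Optional, Tuple


def synthesize(
    generate: Callable[[str, Optional[str]], str],
    verify: Callable[[str], Tuple[bool, str]],
    system_prompt: str,
    accepted: List[str],
    max_rounds: int = 5,
    max_similarity: float = 0.9,
) -> Optional[str]:
    """Ask the generator for candidates until one passes the verifier.

    Verifier feedback from a failed round is passed to the next round.
    Returns the accepted candidate, or None if every round fails.
    """
    feedback: Optional[str] = None
    for _ in range(max_rounds):
        candidate = generate(system_prompt, feedback)
        # Assumed diversity filter: reject near-duplicates of earlier outputs.
        if any(
            SequenceMatcher(None, candidate, prev).ratio() > max_similarity
            for prev in accepted
        ):
            feedback = "too similar to an earlier accepted candidate"
            continue
        ok, message = verify(candidate)
        if ok:
            accepted.append(candidate)
            return candidate
        feedback = message
    return None


def make_scripted_generator(outputs: List[str]) -> Callable[[str, Optional[str]], str]:
    """Stand-in for a coding LLM: replays a fixed list of outputs."""
    it = iter(outputs)

    def generate(system_prompt: str, feedback: Optional[str]) -> str:
        return next(it)

    return generate


def toy_verify(candidate: str) -> Tuple[bool, str]:
    """Stand-in verifier: accepts any candidate containing a return statement."""
    if "return" in candidate:
        return True, ""
    return False, "candidate has no return statement"
```

In this sketch the verifier is the only component that decides acceptance, so a candidate that merely looks plausible is never kept; that is the property an execution-based check is meant to provide.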

Key Ideas

  • CIA taxonomy of malicious tool behaviours in agent settings.
  • Automatic generation pipeline: system prompt + coding LLM + execution-based verifier.
  • Trojan construction by embedding malicious logic in benign tool code.
  • Existing malware and MCP-specific scanners perform poorly, producing both false positives and false negatives.
  • Only the benign tools in the dataset are released, to minimize misuse.
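The scanner failure mode noted in the list above is measured with two error rates. The following is a small sketch of how they could be computed from ground-truth labels and scanner verdicts; the function name `error_rates` and the sample data are illustrative assumptions, not figures from the paper.

```python
from typing import Iterable, Tuple


def error_rates(labels: Iterable[bool], flagged: Iterable[bool]) -> Tuple[float, float]:
    """Return (false_positive_rate, false_negative_rate).

    labels:  True if the tool is actually malicious.
    flagged: True if the scanner reported the tool as malicious.
    """
    fp = fn = benign = malicious = 0
    for is_malicious, is_flagged in zip(labels, flagged):
        if is_malicious:
            malicious += 1
            fn += not is_flagged  # malicious tool the scanner missed
        else:
            benign += 1
            fp += is_flagged  # benign tool the scanner wrongly flagged
    fpr = fp / benign if benign else 0.0
    fnr = fn / malicious if malicious else 0.0
    return fpr, fnr


# Made-up example: four malicious tools, two benign tools.
labels = [True, True, True, True, False, False]
flagged = [True, False, False, False, True, False]
fpr, fnr = error_rates(labels, flagged)
```

Reporting both rates matters because a scanner can lower one at the expense of the other, for example by flagging everything or nothing.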

Connections

Conceptual Contribution

Tags

#security #llm-agents #mcp #malicious-tools

Backlinks