ClawWorm: Self-Propagating Attacks Across LLM Agent Ecosystems

Reference: Yihao Zhang, Zeming Wei, Xiaokun Luan, Chengcan Wu, Zhixin Zhang, Jiangrong Wu, Haolin Wu, Huanran Chen, Jun Sun, Meng Sun (2026). arXiv:2603.15727v2 (Peking University; Sun Yat-sen; Wuhan; Tsinghua; SMU). Source file: 2603.15727v2.pdf.

Summary

Presents ClawWorm, the first demonstrated self-replicating, worm-style attack on a production-scale autonomous LLM-agent ecosystem. The target is OpenClaw, an open-source personal AI-agent framework with over 40,000 active instances, a persistent Markdown workspace (SOUL.md, AGENTS.md, SKILL.md), tool-execution privileges, and cross-platform messaging (Telegram, Discord, WhatsApp, Slack, Signal, Moltbook). A single crafted message triggers the victim to write a malicious payload into its highest-privilege configuration file; the payload then executes automatically at every session restart and propagates autonomously to every newly encountered peer, all without further attacker intervention.

The worm implements a dual-anchor persistence mechanism: one anchor injects the payload into the Session Startup section of AGENTS.md (guaranteeing execution on reboot); the other injects a global interaction rule (guaranteeing propagation during routine replies). Three attack vectors are studied (A: web injection; B: skill supply chain via ClawHub; C: direct fenced-code replication with a word-by-word handshake) alongside three payloads (P1: reconnaissance, P2: resource exhaustion, P3: command-and-control via URL retrieval). Across 1,800 trials on four frontier LLM backends (Minimax-M2.5, DeepSeek-V3.2, GLM-5, Kimi-K2.5), the aggregate attack success rate (ASR) is 64.5%, with Vector B (skill supply chain) reaching 81% and sustained multi-hop propagation up to 5 hops. An epidemiological projection with basic reproduction number R0 = k × ASR, where k is the number of peer contacts per infected instance, shows inevitable ecosystem-wide saturation even for security-conscious models.
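The saturation claim follows from standard susceptible-infected (SI) dynamics: whenever R0 = k × ASR exceeds 1, each infected instance converts more than one peer on average and the infection grows until susceptibles run out. The sketch below is illustrative only, not the paper's model or code: it uses the paper's reported figures (~40,000 instances, aggregate ASR 64.5%) but assumes a hypothetical contact rate of k = 3 peers per step and a simple discrete-time mean-field update.

```python
def si_projection(n_total=40_000, infected0=1, k=3, asr=0.645, steps=20):
    """Discrete-time mean-field SI projection of worm spread.

    Each infected instance contacts k peers per step; a contact lands on
    a still-susceptible peer with probability S/N and converts it with
    probability ASR. Returns the infected count after each step.
    """
    infected = infected0
    history = [infected]
    for _ in range(steps):
        susceptible = n_total - infected
        # Expected new infections this step, capped at the population size.
        new = infected * k * asr * (susceptible / n_total)
        infected = min(n_total, infected + new)
        history.append(infected)
    return history

# With the assumed k = 3, R0 = k * ASR = 3 * 0.645 = 1.935 > 1,
# so the projection climbs to full saturation within ~11 steps.
hist = si_projection()
```

Under these assumptions the trajectory is logistic: near-exponential growth while most instances are susceptible, then a plateau at the population size, which is the "inevitable ecosystem-wide saturation" regime the paper projects for any backend whose ASR keeps k × ASR above 1.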

The root cause is identified as the flat context trust model: the LLM cannot distinguish instructions from its owner, the system layer, or an arbitrary channel participant, so architectural patterns (unconditional workspace loading, LLM-mediated tool authorisation, unreviewed skill packages) amount to structural — not idiosyncratic — vulnerabilities shared by any agent ecosystem of similar design.

Key Ideas

  • Single-message, fully autonomous worm against a production agent framework
  • Dual-anchor persistence: Session Startup + global interaction rule
  • Three attack vectors (web URL, skill supply chain, direct instruction replication)
  • Multi-turn autonomous-retry social engineering boosts ASR by up to +24 pp
  • Epidemiological SI model with R0 = k × ASR predicts ecosystem saturation
  • Execution-layer guardrails alone cannot halt propagation (dormant payloads persist)
  • Flat context trust model as structural root cause

Connections

Conceptual Contribution

Tags

#agent-security #prompt-injection #llm-agents #multi-agent #worm #self-replicating

Backlinks