AI Agents Under Threat: A Survey of Key Security Challenges and Future Pathways

Reference: Deng, Guo, Han, Ma, Xiong, Wen, Xiang (2025). ACM Computing Surveys 57(7), Article 182. Source file: 3716628.pdf. URL

Summary

This survey organizes the emerging threat landscape of LLM-powered AI agents around four knowledge gaps: unpredictability of multi-step user inputs, complexity of internal execution, variability of operational environments, and interactions with untrusted external entities. It unifies single-agent and multi-agent attack surfaces within a perception/brain/action + agent2agent/agent2env/agent2memory taxonomy.

Concrete threats reviewed include adversarial prompts, prompt injection, jailbreaks, backdoor attacks, hallucination and misalignment, tool-use risks, indirect prompt injection, reinforcement-learning environment attacks, cooperative and competitive inter-agent risks, and long/short-term memory attacks. The authors tabulate defenses (prevention- and detection-based), rate their efficacy, and highlight open directions for robust and trustworthy agents.

Key Ideas

  • Four knowledge gaps framing agent security.
  • Taxonomy: perception / brain / action / agent2agent / agent2env / agent2memory threats.
  • Six categories of prompt-injection attack engineering (naive, escape, context-ignore, fake-completion, multimodal, combined).
  • Jailbreak domino effect in multi-agent populations.
  • Memory poisoning and indirect prompt injection as underexplored surfaces.

Connections

Conceptual Contribution

Tags

#security #llm-agents #survey #threat-modeling

Backlinks