Backdoor Attacks

Adversarial attacks in which an agent or model is trained or modified to behave normally except when a specific trigger is present, at which point it executes attacker-chosen behaviour. A significant threat vector for LLM-based agents.

In this vault

Backlinks