Expand ↗
Page list (942)

Backdoor Attacks

Adversarial attacks in which an agent or model is trained or modified to behave normally except when a specific trigger is present, at which point it executes attacker-chosen behaviour. A significant threat vector for LLM-based agents.

In this vault

Last changed by zetl · stable 5d · history

Backlinks