Jailbreak

An attack that circumvents an LLM’s safety alignment to produce disallowed outputs; in agent settings, a jailbroken component can subvert downstream tools and peers.
