Agent Libel

A failure mode in which an LLM agent, while operating autonomously, produces defamatory or false claims about third parties such as other agents or users.
