Regret-Loss

An unsupervised training objective for LLM agents introduced by Park et al. (2024): it minimises a regret-based loss computed from historical plays and payoffs, without requiring labels of optimal actions. See Do LLM Agents Have Regret.

In this vault

Backlinks

Linked Pages

index

Agent Communications Vault

A curated, wikilink-connected reading vault on agent communication languages, multi-agent systems, capability security, distributed systems, and LLM agents — from McCarthy and Minsky through KQML/FIPA to modern LLM agent protocols.

Each note summarises a paper in its own words (summary, key ideas, conceptual contribution, connections) and is cross-linked to related concepts and papers, forming a navigable graph of the field.

Start with concept-map for a guided tour, or browse the map of content below.

How to contribute

The vault is a plain-text zetl wikilink graph — every note is a markdown file with [[wikilinks]]. Contributions welcome:

Clone: git clone https://github.com/anuna-cooperative/agent-comms-wiki.git
Add or edit notes as plain markdown. New paper notes should follow the structure of existing ones (Reference, Summary, Key Ideas, Connections, Conceptual Contribution, Tags).
Run zetl check to validate links, and zetl build to preview the site locally.
Open a pull request at https://github.com/anuna-cooperative/agent-comms-wiki.

See README for detailed conventions.

Map of Content

Concept Hubs

Foundational

concept-map

Conceptual Map

A guided conceptual tour through the vault. Where index lists the papers, this page lists the ideas and shows how they interlock. Every paper note now also carries a ## Conceptual Contribution section (claim / mechanism / concepts / stance / relates-to).

1. The Central Tension: What Does a Message Mean?

Agent communication’s perennial question — whose mental states does a message commit? — runs the length of this vault.

Speech Act Theory (Austin → Searle → Foundations Of Illocutionary Logic) fixes a vocabulary: illocutionary force, direction of fit, sincerity and preparatory conditions. Every ACL after this inherits it.
Mentalistic Semantics — grounding message meaning in the beliefs/intentions of sender and receiver. KQML (KQML Overview, KQML Language And Protocol, KQML as an Agent Communication Language) and FIPA-ACL adopt it.
Commitment-based Semantics / Public Semantics — the counter-move. Singh’s critique (ACL Rethinking Principles, Agent Communication Languages - Rethinking the Principles) argues mentalistic semantics is unverifiable: we cannot inspect another agent’s mind, only its public commitments. Agent Communication And Institutional Reality pushes further: every message is a declaration that alters social commitments; Searle’s “counts-as” is the operative logic.
Verifiable Semantics — Verifiable Semantics for ACLs formalises the critique by requiring grounding in program state so conformance is model-checkable. A Common Ontology Of ACLs offers a reconciliation: role-instanced public attitudes unify the two families.
Conversation Policy / Interaction Protocols — even with messages nailed down, coordination needs conversations. Coordinating Agents Using ACL Conversations (Colored Petri Nets), ACRE Agent Conversation Reasoning Engine (Dooley graphs), and An Interaction-oriented Agent Framework for Open Environments (commitment-based protocols) make the conversation first-class.

Surveys mapping this debate: The State of the Art in Agent Communication Languages, Trends in Agent Communication Language.

2. The Language Stack

Messages compose into languages compose into protocols.

| Layer | Concept | Representative papers |
| --- | --- | --- |
| Content | KIF, ontology term sets | KQML Overview, Ontolingua Portable Ontology Specifications, Handbook On Ontologies |
| Message | Performatives / illocutions | KQML, FIPA-ACL, Foundations Of Illocutionary Logic |
| Conversation | Interaction Protocols | Coordinating Agents Using ACL Conversations, ACRE Agent Conversation Reasoning Engine |
| Transport | Facilitators, routing | KQML Language And Protocol, Model Context Protocol, Agent-to-Agent Protocol |

This same stack — content / message / conversation / transport — reappears in the modern LLM-agent protocol wave: see Survey Of AI Agent Protocols and Survey Of Agent Interoperability Protocols, which place Model Context Protocol (tools), ACP, Agent-to-Agent Protocol, and Agent Network Protocol at progressively higher layers.

3. How Does Shared Language Arise?

A separate tradition asks where meaning comes from rather than what it contains.

Linguistic foundations. Three Models for the Description of Language establishes what structure a shared code must have (Chomsky hierarchy, transformational grammar). Algorithmic Information Theory - Grunwald Vitanyi provides the information-theoretic counterpart: meaning is compressed description.
Language Games. Language Games for Autonomous Robots (Steels) shows grounded lexicons self-assemble through situated interaction — no designer required. The same bootstrap appears decision-theoretically in Towards Automating the Evolution of Linguistic Competence and Toward Automated Evolution of ACLs: rational agents negotiate vocabulary when current language fails.
Emergent Communication. The deep-learning revival: Multi-Agent Cooperation and the Emergence of Natural Language, Emergence of Grounded Compositional Language in Multi-Agent Populations — neural agents in referential/signalling games evolve compositional codes. On the Pitfalls of Measuring Emergent Communication is the sharpest critique: most metrics fail to distinguish real communication from confounds; measure positive signalling and positive listening with causal interventions.
Common Business Communication Language is an analogue in the pre-ML era — an open-ended language negotiable between organisations with graceful partial-understanding fallback.
The LLM inflection point. Why AI Agents Communicate In Human Language argues natural language is exactly the wrong inter-agent medium: lossy, non-differentiable, ambiguous. The thread rejoins the ACL debate a quarter-century later.

Do LLM Agents Have Regret

Do LLM Agents Have Regret? A Case Study in Online Learning and Games

Reference: Park, Liu, Ozdaglar & Zhang (2024). Do LLM Agents Have Regret? A Case Study in Online Learning and Games. arXiv:2403.16843 (MIT; UMD). OpenReview: https://openreview.net/forum?id=OhZ4u164cN.

Summary

Park et al. ask a sharp question about LLM agents in interactive settings: do they have regret? That is, do they exhibit the no-regret behaviour that classical online-learning and game-theoretic algorithms guarantee, and that, when every player exhibits it, drives empirical play towards the set of coarse-correlated equilibria in repeated games?
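For reference, the standard online-learning notion of (external) regret behind this question can be written as below. The notation is an assumption made here rather than taken from the note: $\pi_t$ is the agent's mixed action at round $t$, $\ell_t$ is the loss vector revealed at that round, and $\Delta(\mathcal{A})$ is the simplex over the action set.

```latex
\mathrm{Regret}_T
  \;=\; \sum_{t=1}^{T} \langle \pi_t, \ell_t \rangle
  \;-\; \min_{\pi \in \Delta(\mathcal{A})} \sum_{t=1}^{T} \langle \pi, \ell_t \rangle ,
\qquad
\text{no-regret} \iff \mathrm{Regret}_T = o(T) \ \text{for every loss sequence.}
```

Because the comparator is the best fixed action in hindsight, computing this quantity needs only the realised plays and losses, not a label for the optimal action at each round.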

The paper proceeds in three steps. Empirically, they evaluate GPT-3.5 / GPT-4 / Claude / Llama on canonical online-learning benchmarks (prediction-with-expert-advice; bandit-like sequential decision problems) and on repeated games (matrix games, Cournot, Bertrand, public-goods). Frontier LLMs are often no-regret across these settings and often converge to coarse-correlated or Nash equilibria when playing each other. Theoretically, they offer a partial explanation: under stylised assumptions on supervised pre-training and human rationality, the LLM’s next-action distribution approximates a softmax over historical payoffs — which itself implements a no-regret algorithm. But they identify clean failure cases: there exist simple non-stationary or adversarial online-learning instances where even GPT-4 demonstrably accumulates linear regret.
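To make the "softmax over historical payoffs" claim concrete, here is a minimal sketch of the classical Hedge (exponential-weights) rule, which plays exactly such a softmax. It is a textbook illustration, not the paper's model of an LLM; the function names, the random payoff sequence, and the learning rate `eta` are assumptions chosen for the example.

```python
import math
import random


def hedge_distribution(cum_payoffs, eta):
    """Softmax over cumulative payoffs (the Hedge / exponential-weights rule)."""
    m = max(cum_payoffs)  # subtract the max for numerical stability
    weights = [math.exp(eta * (p - m)) for p in cum_payoffs]
    total = sum(weights)
    return [w / total for w in weights]


def run_hedge(payoff_rounds, eta):
    """Play Hedge on a payoff sequence and return its external regret."""
    n_actions = len(payoff_rounds[0])
    cum_payoffs = [0.0] * n_actions
    expected_total = 0.0
    for payoffs in payoff_rounds:
        dist = hedge_distribution(cum_payoffs, eta)
        # Expected payoff of the mixed action played this round.
        expected_total += sum(p * u for p, u in zip(dist, payoffs))
        # Only now is the round's payoff vector added to the history.
        cum_payoffs = [c + u for c, u in zip(cum_payoffs, payoffs)]
    # Best fixed action in hindsight minus what the learner earned.
    return max(cum_payoffs) - expected_total


# Hypothetical benchmark: 3 actions, payoffs drawn uniformly from [0, 1].
rng = random.Random(0)
T, n_actions = 2000, 3
payoff_rounds = [[rng.random() for _ in range(n_actions)] for _ in range(T)]

eta = math.sqrt(8 * math.log(n_actions) / T)
regret = run_hedge(payoff_rounds, eta)
hedge_bound = math.sqrt(T * math.log(n_actions) / 2)
print(f"regret={regret:.2f}, classical bound={hedge_bound:.2f}")
```

With payoffs in [0, 1] and this choice of `eta`, the classical analysis guarantees regret of at most sqrt(T ln N / 2) on any sequence, so average regret vanishes as T grows. That is the sense in which a softmax over payoffs "implements a no-regret algorithm".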

The paper’s constructive contribution is a new regret-loss training objective. Unlike the supervised pre-training loss, regret-loss does not require labels of optimal actions, only the historical sequence of plays and payoffs. The authors prove a statistical generalisation bound for regret-loss minimisation and an optimisation guarantee that minimising it can recover known no-regret learning algorithms (e.g. follow-the-regularised-leader, FTRL). Empirically, regret-loss-finetuned models close the gap on the failure cases. The paper is a foundational reference for any analysis of LLM agents in markets, auctions, or interactive coordination, a category that includes Virtual Agent Economies, Mechanism Design for Large Language Models, Learning Collusion in Episodic Inventory-Constrained Markets, and Language Models Can Reduce Asymmetry in Information Markets.
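The label-free character of such an objective can be sketched as follows. This is a deliberately simplified stand-in, not the paper's regret-loss. It assumes a hypothetical one-parameter policy (`softmax_policy` with temperature `theta`) in place of an LLM, uses mean regret over sampled payoff sequences as the objective, and replaces gradient-based fine-tuning with a grid search. All names and the sequence generator are illustrative.

```python
import math
import random


def softmax_policy(cum_payoffs, theta):
    """Hypothetical one-parameter policy: softmax over cumulative payoffs."""
    m = max(cum_payoffs)
    weights = [math.exp(theta * (p - m)) for p in cum_payoffs]
    total = sum(weights)
    return [w / total for w in weights]


def regret_of_policy(theta, payoff_rounds):
    """External regret of the policy on one payoff sequence."""
    cum = [0.0] * len(payoff_rounds[0])
    earned = 0.0
    for payoffs in payoff_rounds:
        dist = softmax_policy(cum, theta)
        earned += sum(p * u for p, u in zip(dist, payoffs))
        cum = [c + u for c, u in zip(cum, payoffs)]
    return max(cum) - earned


def regret_objective(theta, sequences):
    """Label-free objective: mean regret over sampled sequences.

    Only plays and payoffs enter; no per-round optimal-action label is used.
    """
    return sum(regret_of_policy(theta, s) for s in sequences) / len(sequences)


def sample_sequence(rng, horizon, n_actions):
    """Illustrative generator: one randomly chosen action pays 0.2 more."""
    best = rng.randrange(n_actions)
    return [
        [0.8 * rng.random() + (0.2 if a == best else 0.0) for a in range(n_actions)]
        for _ in range(horizon)
    ]


rng = random.Random(1)
sequences = [sample_sequence(rng, horizon=200, n_actions=3) for _ in range(20)]

# Grid search stands in for gradient-based minimisation of the objective.
thetas = [0.0, 0.05, 0.2, 1.0, 5.0]
best_theta = min(thetas, key=lambda th: regret_objective(th, sequences))
print("selected theta:", best_theta)
```

The point of the sketch is only that the training signal is computable from histories alone: `regret_objective` never sees which action was "correct" at any round, yet minimising it moves the policy away from the uniform baseline (`theta = 0`).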

Key Ideas

Regret as a diagnostic for LLM agents in interactive settings: do they achieve sublinear regret against arbitrary opponents?
Empirical screen: frontier LLMs (GPT-3.5/4, Claude, Llama) on canonical online-learning + repeated-game benchmarks.
Often no-regret in benign settings, often converging to coarse-correlated / Nash equilibria when playing each other.
Theoretical bridge: under stylised pretraining + human-rationality assumptions, the LLM’s next-action distribution resembles a softmax over payoffs — itself a no-regret algorithm.
Identified failure cases: simple non-stationary / adversarial online-learning instances where GPT-4 has linear regret.
Regret-loss objective: label-free training loss that explicitly incentivises no-regret behaviour; statistical and optimisation guarantees.
Recovery of classical algorithms: minimising regret-loss can converge to algorithms like FTRL.
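The contrast between linear-regret failures and FTRL-style algorithms in the list above can be illustrated with a textbook example. The sketch below is not the paper's failure instance for GPT-4. It uses the classical alternating-loss sequence on which Follow-the-Leader (FTL) suffers linear regret, and compares it with FTRL under an entropy regulariser, whose closed form is a softmax over negative cumulative losses. Function names and the horizon `T` are assumptions for the example.

```python
import math


def alternating_losses(horizon):
    """Classical hard sequence for Follow-the-Leader with two actions."""
    rounds = [(0.5, 0.0)]
    for t in range(1, horizon):
        rounds.append((0.0, 1.0) if t % 2 == 1 else (1.0, 0.0))
    return rounds


def ftl_choice(cum_losses):
    """Follow-the-Leader: all mass on the action with least cumulative loss."""
    leader = min(range(len(cum_losses)), key=lambda a: cum_losses[a])
    return [1.0 if a == leader else 0.0 for a in range(len(cum_losses))]


def entropic_ftrl_choice(cum_losses, eta):
    """FTRL with an entropy regulariser: softmax of -eta * cumulative loss."""
    m = min(cum_losses)  # shift for numerical stability
    weights = [math.exp(-eta * (c - m)) for c in cum_losses]
    total = sum(weights)
    return [w / total for w in weights]


def regret(choose, loss_rounds):
    """External regret of a rule mapping cumulative losses to a distribution."""
    cum = [0.0] * len(loss_rounds[0])
    incurred = 0.0
    for losses in loss_rounds:
        dist = choose(cum)
        incurred += sum(p * l for p, l in zip(dist, losses))
        cum = [c + l for c, l in zip(cum, losses)]
    return incurred - min(cum)


T = 1000
rounds = alternating_losses(T)
eta = math.sqrt(8 * math.log(2) / T)

ftl_regret = regret(ftl_choice, rounds)
ftrl_regret = regret(lambda cum: entropic_ftrl_choice(cum, eta), rounds)
print(f"FTL regret: {ftl_regret:.1f}, entropic FTRL regret: {ftrl_regret:.1f}")
```

On this sequence the leader switches every round just before it is penalised, so FTL's regret grows in proportion to T, while the regularised rule stays within the sqrt(T ln 2 / 2) bound from the classical analysis.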

Connections

Conceptual Contribution

Claim: Whether LLM agents exhibit no-regret behaviour in interactive settings is the right diagnostic for whether they can be deployed in markets, auctions, and coordination protocols. Frontier LLMs are often but not always no-regret; specific failure cases can be fixed by an explicit regret-minimising training objective.
Mechanism: Empirical benchmark of LLMs on online learning + repeated games (regret + equilibrium convergence); theoretical link from supervised pretraining to softmax-over-payoffs (a no-regret update); construction of a label-free regret-loss with generalisation + optimisation guarantees; recovery of FTRL-like algorithms as the loss is minimised.
Concepts introduced/used: No-Regret Learning, Regret-Loss, Online Learning, Repeated Game, Coarse-Correlated Equilibrium, FTRL, LLM Agents
Stance: empirical + theoretical
Relates to: Cousin work to Cicero Human-Level Play in Diplomacy in the “LLMs as game-theoretic agents” thread. Provides the analytical foundation for the systemic-risk claims in Virtual Agent Economies and the collusion experiments in Learning Collusion in Episodic Inventory-Constrained Markets; mechanism-design implications for Mechanism Design for Large Language Models; the trust assumption behind Language Models Can Reduce Asymmetry in Information Markets depends on agents being approximately no-regret.