CBCL: Safe Self-Extending Agent Communication
Reference: O’Connor, H. (2026). LangSec Workshop, IEEE Security and Privacy Workshops (SPW 2026). arXiv:2604.14512v1 [cs.CR], 16 Apr 2026. Source file: oconnor-cbcl-langsec26.pdf. URL. Reference implementation: https://codeberg.org/anuna/cbcl-rs.
Summary
Building on McCarthy’s 1975/1982 Common Business Communication Language vision and the language-theoretic security programme of Sassaman, Patterson, Bratus and Locasto (Security Applications Of Formal Language Theory), this paper presents a contemporary CBCL: an agent communication language designed so that all messages — including runtime language extensions — remain within the deterministic context-free (DCFL) complexity class. The motivating diagnosis is that contemporary agent communication has slid up the Chomsky hierarchy without acknowledging the security cost: KQML and FIPA-ACL are CFL with informal extensibility, but MCP and LLM-based agent frameworks operate over inputs whose effective complexity is recursively-enumerable, where the validity of an arbitrary message is undecidable and “weird machines” (Exploit Programming - From Buffer Overflows To Weird Machines) become unavoidable. CBCL takes the opposite tack: minimal core (8 performatives — tell, ask, reply, hello, bye, ok, error, cancel) plus a homoiconic dialect mechanism that lets agents define, transmit, and adopt domain-specific vocabularies as first-class CBCL messages, with three machine-checked safety invariants — R1 (no recursion in dialect templates), R2 (declared resource bounds on depth, expansion size, verification time), R3 (core-vocabulary preservation) — that together guarantee DCFL preservation under arbitrary finite sequences of dialect installations.
The paper accompanies the design with a Lean 4 formalization (≈5,400 lines, 16 modules, 176 machine-checked theorems, zero sorry gaps, no custom axioms) covering parser correctness (soundness + completeness), R1–R3 verification, pipeline totality, and the central dcfl_preserved closure theorem; a verified parser binary cbcl-parse is extracted from the proofs. A separately-developed Rust reference implementation (cbcl-rs, ≈11,000 lines, no_std-compatible core) ships a hand-rolled O(n) parser, dialect verification engine, gossip-based dialect propagation (O(log n) convergence per Demers et al.), C FFI/WASM bindings, property-based and differential tests, and cargo-fuzz targets; differential tests cross-validate the Rust parser against the Lean-extracted binary, eliminating spec-implementation parser differentials. Five example dialects (precision agriculture, AI planning, cross-chain asset transfer, artifact management, email) are R1–R3-verified end-to-end. A draft IETF Internet-Draft specifying application/cbcl and a Nostr transport binding (cbcl-nostr) accompany the paper. The Discussion explicitly aligns the design with Singh’s social-agency programme: dialect installation is treated as a public commitment to the dialect’s declared semantics — semantic divergence becomes a breach of commitment, observable and accountable, rather than a hidden failure of shared mental state — sidestepping the unverifiable mentalistic semantics critique of Verifiable Semantics for ACLs.
Implementation status (post-paper, cbcl-rs). The reference implementation has shipped two further safety invariants the paper sketches as future work, plus a richer Lean formalisation: R4 (Integrity) — Ed25519-style signatures over canonical encodings via an algorithm-agnostic Signer trait, with three-valued install verdicts (Valid / Unsigned / Invalid) — and R5 (Contract Well-formedness) — optional (protocol …) and (shape …) clauses on a dialect, expressing causal Merkle DAG protocols and per-performative shape contracts; R5 enforces acyclicity, reachability from begin, performative definedness with ancestor closure for extends, step uniqueness, and depth-bound respect, all decidable in linear fuel at install time. The §VII-D structural-contracts vision is therefore live (SPEC-002, SPEC-003): verify_monotone, verify_all_is_meet, and verify_eventually_consistent are proved theorems showing causal-protocol verification is a lattice homomorphism that joins coordination-free under G-Set store merge — explicit CALM Theorem alignment. The Lean codebase has grown to 380+ declarations across 19 files (still zero sorry, only the standard axioms propext / Classical.choice / Quot.sound); the workspace ships 837 tests including 584 cbcl-core unit tests, 37 proptest cases, 21 differential integration tests against the Lean-extracted binary, libFuzzer harnesses, and cargo-mutants mutation testing at a 90% kill threshold. The crate layout has grown from the paper’s five to seven: the original cbcl-core / cbcl-parser / cbcl-cli / cbcl-wasm / cbcl-ffi plus cbcl-erl (Erlang/LFE binding, SPEC-009 — with SPEC-010 binding-conformance suite) and cbcl-arena (Agent Arena platform, SPEC-012 — composable referee on cbcl-lfe-router with capability-keyed message routing and signed receipt logs). A worked sealed-bid auction dialect (SPEC-004) demonstrates structural defence against false-claim manipulation — concrete evidence that the R5 contract layer is rich enough to encode mechanism-design integrity properties decidably. The earlier Scheme prototype (anuna-research/cbcl) is superseded; architecture decisions live as ADRs under .hence/ (purity boundary, parser library choice, error handling, WASM API surface, R4 adversarial review).
Key Ideas
- DCFL as the Goldilocks complexity class: Regular is too weak for nested envelopes/dialects; general CFL admits ambiguous grammars and parser-differential attacks; context-sensitive is PSPACE-complete; Type-0 is undecidable. DCFL is the minimal class that supports nested structure with class-level unambiguity (parser equivalence under language evolution).
- Homoiconic self-extension: Dialect definitions are themselves valid CBCL messages, parsed by the same recogniser used for ordinary communication — a single attack surface, no separate meta-language. Inspired by Racket’s
#langmechanism. - Three safety invariants R1–R3: declarative pattern-template substitution only (no recursion, no iteration, no reflection); declared resource bounds (depth ≤ 64, expansion ≤ 8192 chars, verification ≤ 1000 ms); the eight core performatives cannot be redefined.
- DCFL preservation theorem: dialect invocations must appear inside a
(lang name …)wrapper; thelangtag plus a unique dialect name forms a prefix-free partition that lets a deterministic pushdown automaton dispatch to the appropriate subgrammar without extra lookahead — circumventing the fact that DCFLs are not closed under union in general. Mechanised in Lean 4 asdcfl_preserved. - Lean 4 formalization with extraction: 176 theorems, 0
sorrygaps, no custom axioms; the verified parser binarycbcl-parseis extracted from the proofs and runs the same fuel-bounded recursive-descent parser as the model. - Rust reference implementation:
cbcl-core(no_std, nounsafe),cbcl-parser,cbcl-cli,cbcl-ffi,cbcl-wasm. Differential tests against the Lean-extracted binary;cargo-fuzzover five entry points. - Eight core performatives (
tell,ask,reply,hello,bye,ok,error,cancel) plus four message categories (simple, meta, lang-scoped, wrapped); keyword parameters in Common-Lisp style (:thread,:in-reply-to). - Single-pass template expansion: terminates in time linear in the template plus argument size; not Turing-complete; output is not fed back into the expander, ruling out unbounded re-expansion.
- Epidemic dialect propagation: gossip-based with Demers et al.’s
O(log n)convergence bound; agents independently verify R1–R3 and signatures before installing. - Canonical serialization per RFC 9804 for cryptographic operations — deterministic byte representation as a precondition for signature verification.
- Dialect identity by content hash: name collisions resolved by canonical-form hash; downgrade attacks rejected at installation.
- Self-expression vs self-adaptation (Zambonelli et al.): traditional ACLs support only parameter tuning within fixed structure; CBCL is a self-expressive protocol whose vocabulary structurally evolves under safety constraints.
- Singh-aligned reading of dialect installation: a public commitment to declared semantics, with
:examplesclauses providing machine-checkable behavioural anchors — semantic divergence as observable breach rather than hidden mental-state mismatch. - Stated limitations: DCFL ceiling means no recursive transformations, no cross-field references (e.g. checksums over fields), no aggregation over variable-length collections, no context-sensitive validation. Standish-taxonomy paraphrastic only — no Orthophrase, no Metaphrase. Trust infrastructure (key distribution, revocation) and dialect-curation policy left to deployment.
- Structural-contracts extension (sketched in §VII-D, shipped post-paper as R4 + R5): Ed25519-style dialect signatures plus protocol constraints as causal Merkle DAGs over performative-typed messages, with Visibly Pushdown Languages for shape constraints and coordination-free monotonic verification over an append-only store. Lean-mechanised lattice-homomorphism theorems (
verify_monotone,verify_all_is_meet,verify_eventually_consistent) establish that replica verdicts join under G-Set merge.
Connections
- Common Business Communication Language — McCarthy’s 1975/1982 origin; CBCL is the formally-verified realisation of that vision.
- Security Applications Of Formal Language Theory — Sassaman et al.’s LangSec foundations.
- Proof-Carrying Code - Necula — Necula 1997, the architectural ancestor of CBCL’s R4 split (host runs a small certified verifier; producer ships the proof).
- Enforceable Security Policies - Schneider — Schneider 2000, the formal warrant for R5 monotone-verdict trace monitoring.
- An Axiomatic Basis for Computer Programming - Hoare — Hoare 1969, root of the contract-based verification line CBCL inherits.
- Authentication in Distributed Systems - Lampson Abadi Burrows Wobber — Lampson et al. 1992, the Speaks-For Calculus for delegated commitments.
- Articulating Reasons - Brandom — Brandom 2000, the philosophical underwriting of Public Semantics (entry point).
- Making It Explicit - Brandom — Brandom 1994, the full inferentialist treatise.
- Empiricism and the Philosophy of Mind - Sellars — Sellars 1956, the Myth of the Given that makes Semantics-Without-Minds possible.
- Philosophical Investigations - Wittgenstein — Wittgenstein 1953, Meaning As Use / language-games / rule-following.
- Two Dogmas of Empiricism - Quine — Quine 1951, Semantic Holism.
- Assertion - Stalnaker — Stalnaker 1978, Common Ground update semantics — operational analogue of CBCL’s verdict ledger.
- Languages and Language - Lewis — Lewis 1975, Convention of Truthfulness picture of how an abstract language gets in force in a population.
- Communicative Actions for Artificial Agents - Cohen Levesque — Cohen & Levesque 1995, the Mentalistic Semantics high-water mark CBCL deliberately is not.
- FIPA-ACL Specifications — the engineering of FIPA-ACL (envelope, conversation-id, ontology slot) CBCL inherits; the semantics it replaces.
- An Ontology for Commitments in Multiagent Systems - Singh — Singh 1999, the formal commitment ontology CBCL’s dialect contracts instantiate.
- Jones-Sergot Normative Systems — Jones & Sergot 1993, the Counts-As Relation that gives wire events institutional meaning under a dialect contract.
- Exploit Programming - From Buffer Overflows To Weird Machines — weird-machine analysis CBCL is designed to preclude at the parser layer.
- PKI Layer Cake - Kaminsky Patterson Sassaman — concrete parser-differential attacks; CBCL’s class-level unambiguity rules these out at the grammar level.
- The Halting Problems of Network Stack Insecurity
- A Language-Based Approach To Prevent DDoS
- LangSec
- KQML as an Agent Communication Language
- KQML - A Language And Protocol For Knowledge And Information Exchange
- FIPA-ACL
- ACL Rethinking Principles — Singh’s social-agency critique that CBCL’s “dialect installation as public commitment” inherits.
- Agent Communication Languages - Rethinking the Principles
- Verifiable Semantics for ACLs
- A Common Ontology Of ACLs
- Commitment-based Semantics
- The State of the Art in Agent Communication Languages
- Trends in Agent Communication Language
- Toward Principles for the Design of Ontologies Used for Knowledge Sharing — Gruber’s ontological-commitment principle CBCL inherits.
- Some Philosophical Problems from the Standpoint of Artificial Intelligence — McCarthy & Hayes’s representation/computation separation.
- Elephant 2000 - A Programming Language Based on Speech Acts — McCarthy’s intra-program speech-act counterpart to CBCL’s inter-organisational vision.
- Adjectival Modifiers
- S-expression
- Homoiconicity
- Code as Data
- Creating Languages in Racket — Flatt’s
#langmechanism that inspired CBCL’s homoiconic self-extension. - Extensibility in Programming Language Design - Standish
- Language Extensibility Taxonomy
- Paraphrase
- Orthophrase
- Metaphrase
- Self-Expression
- Self-Adaptation Self-Expression Self-Awareness ASCENS
- Dialects and Idiolects
- Keeping CALM - When Distributed Consistency is Easy — CALM theorem underwriting the forthcoming coordination-free contracts extension.
- Visibly Pushdown Languages
- MCP Landscape Security Threats And Future Research Directions
- Survey Of AI Agent Protocols
- Survey Of Agent Interoperability Protocols
- ClawWorm Self-Propagating Attacks Across LLM Agent Ecosystems — concrete LLM-agent worm whose root-cause analysis CBCL’s
lang-scoped provenance addresses. - Why AI Agents Communicate In Human Language
- Why Do Multi-Agent LLM Systems Fail
- Gossiping in Distributed Systems
- Gossip-based Aggregation in Large Dynamic Networks
Conceptual Contribution
- Claim: Agent communication languages can be simultaneously expressive, runtime-extensible, and tractably verifiable iff every message — including those that extend the language — is constrained to the deterministic context-free language class; this constraint is compatible with first-class self-extension, can be machine-checked end-to-end, and converts the unverifiable-mental-state problem of mentalistic ACLs into a public-commitment problem at the protocol layer.
- Mechanism: Eight-performative core grammar (LL(1), DCFL); homoiconic dialect mechanism — dialect definitions are valid CBCL messages with
extendclauses describing pattern-template substitutions; three safety invariants R1–R3 enforced at installation (no recursion via DFS cycle detection, declared resource bounds, core-vocabulary preservation);(lang name …)wrapper providing prefix-free deterministic dispatch into per-dialect sub-grammars so the union remains DCFL. Lean 4 formalization (LeanCbcl, ~5,400 LoC, 176 theorems) covering parser soundness/completeness, R1–R3, pipeline totality, anddcfl_preserved; verified parser extracted to a runnable binary (cbcl-parse). Rust reference implementation (cbcl-rs, ~11,000 LoC) with aResourceContext-tracked single-pass expander, gossip-based dialect propagation (O(log n)per Demers et al.), canonical RFC-9804 serialization for signing, algorithm-agnosticSignertrait, C FFI / WASM bindings, property-based + differential tests, fuzz targets. Draft IETFapplication/cbclInternet-Draft and Nostr transport binding. - Concepts introduced/used: Deterministic Context-Free Language, LangSec, Homoiconicity, S-expression, Dialects and Idiolects, Performatives, Self-Expression, Paraphrase, Orthophrase, Metaphrase, Visibly Pushdown Languages, CALM Theorem, Commitment-based Semantics, Public Semantics, Verifiable Semantics, Adjectival Modifiers, Ontological Commitment, Lean 4, Property-Based Testing, Parser Differential Attack, Weird Machine, Gossip Protocols, Canonical Serialization, Nostr, hence
- Stance: engineering / formal-methods, with explicit position-taking on the agent-communication design space
- Relates to: Realises McCarthy’s Common Business Communication Language vision (1975/1982) by giving formal safety guarantees McCarthy could only gesture at. Sits squarely in the LangSec programme of Security Applications Of Formal Language Theory and Exploit Programming - From Buffer Overflows To Weird Machines, extending it from input-validation analysis to protocol design — claiming, plausibly, to be the first ACL designed from LangSec principles and the first to demonstrate that LangSec is compatible with runtime self-extension. Inherits the DCFL-vs-ambiguous-CFG argument from PKI Layer Cake - Kaminsky Patterson Sassaman and applies it at the message layer rather than the certificate layer. Builds explicitly on Singh’s social-agency critique (ACL Rethinking Principles) and Wooldridge’s verifiability programme (Verifiable Semantics for ACLs) by treating dialect installation as a public commitment whose semantics are anchored by the
:examplesclause — a machine-checkable analogue of the commitment-trace conformance that Flexible Protocol Specification and Execution (Yolum & Singh) and Commitment Machines - Yolum and Singh establish for protocol actions. Where Pact - A Choreographic Language for Agentic Ecosystems takes the individual-rationality fork to answer “why follow a protocol?”, CBCL takes the publicness/verifiability fork: the meaning of a message is fully determined by the formal grammar plus declared semantics, with no mental-state inference required. Inherits Gruber’s ontological-commitment architecture but replaces “out-of-band agreement” with in-protocol propose/query/teach. The extensibility design draws on Creating Languages in Racket (Flatt’s#lang) and respects the boundary set by Extensibility in Programming Language Design - Standish: paraphrastic only, deliberately excluding orthophrase/metaphrase to keep the attack surface bounded. The forthcoming structural-contracts extension uses Visibly Pushdown Languages for shape constraints and the CALM Theorem for coordination-free verification — a natural bridge to commitment-machine semantics. The LLM-agent threat-model framing (ClawWorm Self-Propagating Attacks Across LLM Agent Ecosystems, MCP Landscape Security Threats And Future Research Directions) positions CBCL as the structural alternative to MCP/JSON-RPC-with-natural-language: same extensibility, vastly tighter security envelope.
Tags
#acl #langsec #formal-verification #lean4 #rust #homoiconic #dcfl #self-extension #commitment-based #position-paper #foundational