Expand ↗
Page list (1268)

CBCL: Safe Self-Extending Agent Communication

Reference: O’Connor, H. (2026). LangSec Workshop, IEEE Security and Privacy Workshops (SPW 2026). arXiv:2604.14512v1 [cs.CR], 16 Apr 2026. Source file: oconnor-cbcl-langsec26.pdf. URL. Reference implementation: https://codeberg.org/anuna/cbcl-rs.

Summary

Building on McCarthy’s 1975/1982 Common Business Communication Language vision and the language-theoretic security programme of Sassaman, Patterson, Bratus and Locasto (Security Applications Of Formal Language Theory), this paper presents a contemporary CBCL: an agent communication language designed so that all messages — including runtime language extensions — remain within the deterministic context-free (DCFL) complexity class. The motivating diagnosis is that contemporary agent communication has slid up the Chomsky hierarchy without acknowledging the security cost: KQML and FIPA-ACL are CFL with informal extensibility, but MCP and LLM-based agent frameworks operate over inputs whose effective complexity is recursively-enumerable, where the validity of an arbitrary message is undecidable and “weird machines” (Exploit Programming - From Buffer Overflows To Weird Machines) become unavoidable. CBCL takes the opposite tack: minimal core (8 performatives — tell, ask, reply, hello, bye, ok, error, cancel) plus a homoiconic dialect mechanism that lets agents define, transmit, and adopt domain-specific vocabularies as first-class CBCL messages, with three machine-checked safety invariants — R1 (no recursion in dialect templates), R2 (declared resource bounds on depth, expansion size, verification time), R3 (core-vocabulary preservation) — that together guarantee DCFL preservation under arbitrary finite sequences of dialect installations.

The paper accompanies the design with a Lean 4 formalization (≈5,400 lines, 16 modules, 176 machine-checked theorems, zero sorry gaps, no custom axioms) covering parser correctness (soundness + completeness), R1–R3 verification, pipeline totality, and the central dcfl_preserved closure theorem; a verified parser binary cbcl-parse is extracted from the proofs. A separately-developed Rust reference implementation (cbcl-rs, ≈11,000 lines, no_std-compatible core) ships a hand-rolled O(n) parser, dialect verification engine, gossip-based dialect propagation (O(log n) convergence per Demers et al.), C FFI/WASM bindings, property-based and differential tests, and cargo-fuzz targets; differential tests cross-validate the Rust parser against the Lean-extracted binary, eliminating spec-implementation parser differentials. Five example dialects (precision agriculture, AI planning, cross-chain asset transfer, artifact management, email) are R1–R3-verified end-to-end. A draft IETF Internet-Draft specifying application/cbcl and a Nostr transport binding (cbcl-nostr) accompany the paper. The Discussion explicitly aligns the design with Singh’s social-agency programme: dialect installation is treated as a public commitment to the dialect’s declared semantics — semantic divergence becomes a breach of commitment, observable and accountable, rather than a hidden failure of shared mental state — sidestepping the unverifiable mentalistic semantics critique of Verifiable Semantics for ACLs.

Implementation status (post-paper, cbcl-rs). The reference implementation has shipped two further safety invariants the paper sketches as future work, plus a richer Lean formalisation: R4 (Integrity) — Ed25519-style signatures over canonical encodings via an algorithm-agnostic Signer trait, with three-valued install verdicts (Valid / Unsigned / Invalid) — and R5 (Contract Well-formedness) — optional (protocol …) and (shape …) clauses on a dialect, expressing causal Merkle DAG protocols and per-performative shape contracts; R5 enforces acyclicity, reachability from begin, performative definedness with ancestor closure for extends, step uniqueness, and depth-bound respect, all decidable in linear fuel at install time. The §VII-D structural-contracts vision is therefore live (SPEC-002, SPEC-003): verify_monotone, verify_all_is_meet, and verify_eventually_consistent are proved theorems showing causal-protocol verification is a lattice homomorphism that joins coordination-free under G-Set store merge — explicit CALM Theorem alignment. The Lean codebase has grown to 380+ declarations across 19 files (still zero sorry, only the standard axioms propext / Classical.choice / Quot.sound); the workspace ships 837 tests including 584 cbcl-core unit tests, 37 proptest cases, 21 differential integration tests against the Lean-extracted binary, libFuzzer harnesses, and cargo-mutants mutation testing at a 90% kill threshold. The crate layout has grown from the paper’s five to seven: the original cbcl-core / cbcl-parser / cbcl-cli / cbcl-wasm / cbcl-ffi plus cbcl-erl (Erlang/LFE binding, SPEC-009 — with SPEC-010 binding-conformance suite) and cbcl-arena (Agent Arena platform, SPEC-012 — composable referee on cbcl-lfe-router with capability-keyed message routing and signed receipt logs). A worked sealed-bid auction dialect (SPEC-004) demonstrates structural defence against false-claim manipulation — concrete evidence that the R5 contract layer is rich enough to encode mechanism-design integrity properties decidably. The earlier Scheme prototype (anuna-research/cbcl) is superseded; architecture decisions live as ADRs under .hence/ (purity boundary, parser library choice, error handling, WASM API surface, R4 adversarial review).

Key Ideas

  • DCFL as the Goldilocks complexity class: Regular is too weak for nested envelopes/dialects; general CFL admits ambiguous grammars and parser-differential attacks; context-sensitive is PSPACE-complete; Type-0 is undecidable. DCFL is the minimal class that supports nested structure with class-level unambiguity (parser equivalence under language evolution).
  • Homoiconic self-extension: Dialect definitions are themselves valid CBCL messages, parsed by the same recogniser used for ordinary communication — a single attack surface, no separate meta-language. Inspired by Racket’s #lang mechanism.
  • Three safety invariants R1–R3: declarative pattern-template substitution only (no recursion, no iteration, no reflection); declared resource bounds (depth ≤ 64, expansion ≤ 8192 chars, verification ≤ 1000 ms); the eight core performatives cannot be redefined.
  • DCFL preservation theorem: dialect invocations must appear inside a (lang name …) wrapper; the lang tag plus a unique dialect name forms a prefix-free partition that lets a deterministic pushdown automaton dispatch to the appropriate subgrammar without extra lookahead — circumventing the fact that DCFLs are not closed under union in general. Mechanised in Lean 4 as dcfl_preserved.
  • Lean 4 formalization with extraction: 176 theorems, 0 sorry gaps, no custom axioms; the verified parser binary cbcl-parse is extracted from the proofs and runs the same fuel-bounded recursive-descent parser as the model.
  • Rust reference implementation: cbcl-core (no_std, no unsafe), cbcl-parser, cbcl-cli, cbcl-ffi, cbcl-wasm. Differential tests against the Lean-extracted binary; cargo-fuzz over five entry points.
  • Eight core performatives (tell, ask, reply, hello, bye, ok, error, cancel) plus four message categories (simple, meta, lang-scoped, wrapped); keyword parameters in Common-Lisp style (:thread, :in-reply-to).
  • Single-pass template expansion: terminates in time linear in the template plus argument size; not Turing-complete; output is not fed back into the expander, ruling out unbounded re-expansion.
  • Epidemic dialect propagation: gossip-based with Demers et al.’s O(log n) convergence bound; agents independently verify R1–R3 and signatures before installing.
  • Canonical serialization per RFC 9804 for cryptographic operations — deterministic byte representation as a precondition for signature verification.
  • Dialect identity by content hash: name collisions resolved by canonical-form hash; downgrade attacks rejected at installation.
  • Self-expression vs self-adaptation (Zambonelli et al.): traditional ACLs support only parameter tuning within fixed structure; CBCL is a self-expressive protocol whose vocabulary structurally evolves under safety constraints.
  • Singh-aligned reading of dialect installation: a public commitment to declared semantics, with :examples clauses providing machine-checkable behavioural anchors — semantic divergence as observable breach rather than hidden mental-state mismatch.
  • Stated limitations: DCFL ceiling means no recursive transformations, no cross-field references (e.g. checksums over fields), no aggregation over variable-length collections, no context-sensitive validation. Standish-taxonomy paraphrastic only — no Orthophrase, no Metaphrase. Trust infrastructure (key distribution, revocation) and dialect-curation policy left to deployment.
  • Structural-contracts extension (sketched in §VII-D, shipped post-paper as R4 + R5): Ed25519-style dialect signatures plus protocol constraints as causal Merkle DAGs over performative-typed messages, with Visibly Pushdown Languages for shape constraints and coordination-free monotonic verification over an append-only store. Lean-mechanised lattice-homomorphism theorems (verify_monotone, verify_all_is_meet, verify_eventually_consistent) establish that replica verdicts join under G-Set merge.

Connections

Conceptual Contribution

Tags

#acl #langsec #formal-verification #lean4 #rust #homoiconic #dcfl #self-extension #commitment-based #position-paper #foundational

Backlinks