Intent Formalization: A Grand Challenge for Reliable Coding in the Age of AI Agents

Reference: Shuvendu K. Lahiri (2026). arXiv:2603.17150v1 (Microsoft Research). Source file: 2603.17150v1.pdf. URL

Summary

Agentic AI can now write whole systems from brief prompts — “vibe coding” — but the intent gap between what a user means and what a program does has exploded accordingly. Lahiri argues that the central reliability bottleneck is no longer code generation but intent formalization: the automatic translation of informal user intent into checkable formal specifications. Lightweight tests, runtime contracts, logical contracts (Dafny, F*, Verus), and domain-specific languages lie on a single tradeoff spectrum — all formal, all checkable, differing only in coverage and cost.

The paper reframes the question of “can AI write the code?” as “can AI help us specify what the code should do, and then verify it does?”. Early research (GPT-4 generating Defects4J-catching postconditions; ClassInvGen; AutoVerus) already shows LLMs can produce meaningful, non-syntactic specifications. The key open bottleneck is validating specifications: since only the user knows their intent, there is no oracle — soundness/completeness must be estimated via semi-automated proxies (mutation testing, test-derived metrics). Research agenda: scaling beyond benchmarks, compositionality under change, richer logics, and human-AI specification interaction.

Key Ideas

  • Intent gap = semantic distance between informal NL requirements and operational code; agentic AI amplifies it at scale
  • AI-generated code is plausible by construction, not correct by construction
  • Intent formalization spectrum: tests → code contracts (assertions, pre/post/invariants) → logical contracts (Dafny/F*/Verus) → DSLs with verified synthesis
  • Soundness/completeness of specifications w.r.t. tests, without requiring a code-side oracle
  • Complementary to spec-driven development tools (GitHub Spec Kit) — closes their informality gap
  • Not autoformalization: cheap, targeted spec fragments beat full NL→logic translation

Connections

Conceptual Contribution

Tags

#llm-agents #formal-verification #specification #intent-formalization #vibe-coding

Backlinks