The Seven Turrets of Babel: A Taxonomy of LangSec Errors and How to Expunge Them

Reference: Momot, Bratus, Hallberg, Patterson (2016). IEEE Cybersecurity Development (SecDev). Source file: 7turretsaspublished.pdf.

Summary

The authors catalogue seven recurring classes of input-handling bugs under the LangSec (language-theoretic security) lens: shotgun parsing, non-minimalist input-handling code, input languages more complex than deterministic context-free, differing interpretations of the input language, incomplete protocol specification, overloaded fields in the input format, and permissive processing of invalid input. Each class is grounded in concrete vulnerabilities (Heartbleed, Android Master Key, Rosetta Flash, Ruby on Rails CVE-2016-0752).

LangSec’s remedy is to treat input acceptance as a formal language-recognition problem: specify a grammar no more complex than deterministic context-free, build a recognizer that fully validates before any processing, and cleanly separate parsing from application logic. The paper proposes new CWE entries naming each weakness so auditors can precisely describe vulnerable input-handling code.
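
The remedy can be illustrated with a minimal sketch. The message layout and the names `Heartbeat`, `RejectedInput`, `recognize`, and `handle` are hypothetical, chosen for illustration and not taken from the paper. The recognizer accepts or rejects the whole input first, and application logic only ever receives an already-validated object:

```python
from dataclasses import dataclass
import struct


class RejectedInput(ValueError):
    """Raised when the input is not a sentence of the message language."""


@dataclass(frozen=True)
class Heartbeat:
    payload: bytes


def recognize(data: bytes) -> Heartbeat:
    """Accept exactly: type byte 0x01, 2-byte big-endian length, payload.

    Hypothetical format for illustration. The whole input is validated
    here; nothing reaches application logic until it passes.
    """
    if len(data) < 3:
        raise RejectedInput("truncated header")
    msg_type, declared_len = struct.unpack(">BH", data[:3])
    if msg_type != 0x01:
        raise RejectedInput("unknown message type")
    payload = data[3:]
    # The declared length must match the bytes actually present:
    # no trailing bytes, and no trusting a length larger than the buffer.
    if len(payload) != declared_len:
        raise RejectedInput("declared length does not match payload")
    return Heartbeat(payload)


def handle(msg: Heartbeat) -> bytes:
    """Application logic: echoes the payload of a validated message."""
    return msg.payload


reply = handle(recognize(b"\x01\x00\x02hi"))
```

Because `handle` takes a `Heartbeat` rather than raw bytes, a message whose declared length disagrees with its actual payload (the Heartbleed pattern) is rejected before any processing happens.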

Key Ideas

  • LangSec: input should be a well-defined language with a fully-validating recognizer.
  • Seven anti-patterns: shotgun parsing, non-minimalist code, overly complex input languages, interpretation drift, incomplete spec, field overloading, permissive handling of invalid input.
  • Chomsky hierarchy as a safety ceiling: input languages should be no more complex than deterministic context-free.
  • Hand-rolled parsers vs parser-combinator / generator tooling (e.g., Hammer).
  • Proposed new CWEs to label LangSec anti-patterns.
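
To make the parser-combinator idea concrete, here is a small sketch. The combinators `char_in`, `many1`, `seq`, and `parse_all` and the `record` grammar are hypothetical helpers written for this note; they are not Hammer's API. The grammar is stated declaratively, and `parse_all` rejects any input it does not consume in full:

```python
from typing import Callable, List, Optional, Tuple

# A parser takes (text, position) and returns (value, new_position) or None.
Result = Optional[Tuple[object, int]]
Parser = Callable[[str, int], Result]


def char_in(allowed: str) -> Parser:
    """Match one character drawn from `allowed`."""
    def parse(text: str, pos: int) -> Result:
        if pos < len(text) and text[pos] in allowed:
            return text[pos], pos + 1
        return None
    return parse


def many1(p: Parser) -> Parser:
    """Match `p` one or more times and join the matched characters."""
    def parse(text: str, pos: int) -> Result:
        out: List[str] = []
        while (r := p(text, pos)) is not None:
            out.append(str(r[0]))
            pos = r[1]
        return ("".join(out), pos) if out else None
    return parse


def seq(*parsers: Parser) -> Parser:
    """Match each parser in order; fail if any one of them fails."""
    def parse(text: str, pos: int) -> Result:
        values = []
        for p in parsers:
            r = p(text, pos)
            if r is None:
                return None
            values.append(r[0])
            pos = r[1]
        return values, pos
    return parse


def parse_all(p: Parser, text: str):
    """Accept only if `p` consumes the entire input; otherwise reject."""
    r = p(text, 0)
    if r is None or r[1] != len(text):
        raise ValueError("input rejected")
    return r[0]


# Grammar: record := name '=' number
name = many1(char_in("abcdefghijklmnopqrstuvwxyz"))
number = many1(char_in("0123456789"))
record = seq(name, char_in("="), number)

parsed = parse_all(record, "ttl=64")
```

The grammar lives in three lines at the bottom, separate from the matching machinery, which is the property that makes such code easier to audit than hand-rolled index arithmetic.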

Connections

Conceptual Contribution

Tags

#langsec #security #parsing #language-theory
