A Mathematical Theory of Communication
Reference: Shannon, C. E. (1948). A Mathematical Theory of Communication. Bell System Technical Journal, 27(3), pp. 379–423 and 27(4), pp. 623–656. (Republished 1949 with additional commentary by Warren Weaver as The Mathematical Theory of Communication, University of Illinois Press.) DOI · Open access PDF (Harvard) · Internet Archive (BSTJ scan)
Summary
Shannon’s two-part 1948 paper founds information theory as a discipline. The setting is the engineering problem of communication: a source produces messages, an encoder transforms them into signals over a noisy channel, a decoder attempts to reconstruct the original. Shannon’s first move is to argue that the meaning of messages is irrelevant to the engineering problem — only their statistical structure matters. He then develops the foundational notions: entropy H = -Σ p_i log p_i as the average information content per symbol of a source; mutual information I(X;Y) = H(X) - H(X|Y) as the information one variable carries about another; channel capacity C = max I(X;Y) as the supremum of mutual information over all input distributions. The technical heart consists of two coding theorems. Source coding (noiseless coding): any source with entropy H can be losslessly compressed at rate arbitrarily close to H bits per symbol, but no lower. Channel coding (noisy-channel coding): any source with entropy below the channel capacity C can be transmitted with arbitrarily low error probability using sufficient block length, but transmission above C necessarily incurs error. Together these establish the operational meanings of entropy and capacity and bound what any communication system can achieve. The companion Mathematical Theory of Communication (1949) adds Weaver’s expository introduction, popularising the framework and inaugurating the broader engagement of philosophy and the social sciences with information theory. Shannon’s framework supplies the technical foundation for every communication system, the conceptual foundation for algorithmic information theory and the MDL principle, and a recurring background reference in agent-communication design — most explicitly in Why AI Agents Communicate In Human Language, which frames the case against natural-language inter-agent communication in Shannon-theoretic terms (lossy channel, low capacity per token, ambiguous code).
Key Ideas
- Engineering decoupling from meaning: the engineering problem of communication is independent of semantic content; only the statistical structure of the source matters.
- Entropy as average information:
H(X) = -Σ p_i log_2 p_i measures the average uncertainty of a random variable in bits; for an i.i.d. source emitting symbols with probabilities p_i, H is the lower bound on bits-per-symbol for lossless compression.
- Mutual information:
I(X;Y) = H(X) - H(X|Y) = H(Y) - H(Y|X) symmetrically measures how much knowing Y reduces uncertainty about X (and vice versa).
- Channel capacity:
C = max_{p(x)} I(X;Y) — the maximum mutual information achievable over a memoryless channel with input distribution p(x). Capacity is the operational supremum on reliable transmission rate.
- Source coding theorem: lossless compression can achieve rate arbitrarily close to
H (entropy) but no lower; the practical realisation is via Huffman coding, arithmetic coding, and variants.
- Noisy channel coding theorem: for any rate
R < C, there exists a coding scheme achieving error probability arbitrarily close to zero with sufficient block length; for R > C, error probability is bounded away from zero. The result is non-constructive — Shannon’s proof uses random coding — but the existence guarantee is what made coding theory a field.
- Continuous channels and AWGN capacity: the second part extends the discrete results to continuous channels, deriving the famous
C = B log(1 + S/N) for additive-white-Gaussian-noise channels of bandwidth B and signal-to-noise ratio S/N.
- Ergodic processes and Markov sources: Shannon’s analysis carefully extends from i.i.d. to ergodic and Markov sources, motivating the asymptotic equipartition property (AEP) and laying the groundwork for source coding of non-i.i.d. data.
Connections
Conceptual Contribution
- Claim: The engineering problem of communication is mathematically separable from semantics; the right primitives are entropy (uncertainty per symbol of a source), mutual information (information one variable carries about another), and channel capacity (supremum of reliable transmission rate over a noisy channel). Two coding theorems establish the operational meaning of these quantities — entropy as the lossless-compression lower bound, capacity as the reliable-transmission upper bound.
- Mechanism: Probabilistic model of source / channel / receiver; definition of entropy, conditional entropy, joint entropy, and mutual information; noiseless source-coding theorem (lossless compression to within ε of
H); noisy channel-coding theorem (reliable transmission below C, error-bounded above C); extension to continuous channels and Gaussian noise.
- Concepts introduced/used: Shannon Entropy, Shannon Information, Mutual Information, Channel Capacity, Source Coding Theorem, Noisy-Channel Coding Theorem, Asymptotic Equipartition Property.
- Stance: founding paper of an entire discipline.
- Relates to: Direct technical predecessor of Kolmogorov complexity / Algorithmic Information Theory (Solomonoff, Kolmogorov, Chaitin 1960s) — where Shannon measures the average information of an ensemble, AIT measures the absolute information of an individual object as the length of its shortest description; both are operationally founded on the same coding-theorem intuition. Shannon’s deliberate decoupling of the engineering problem from semantics inaugurates the running tension in agent communication between Shannon information (statistical, channel-bounded) and meaningful information (semantic, conventional, illocutionary). The agent-communication-language tradition exists precisely to address what Shannon’s framework deliberately set aside; nevertheless, the engineering bounds Shannon establishes constrain any agent communication channel, including LLM-mediated natural-language exchange. Why AI Agents Communicate In Human Language makes this explicit: natural language as inter-agent code has low channel capacity per token, high error probability under lossy LLM compression, and ambiguous decoding — Shannon-theoretic objections that motivate the case for structured ACLs. Conceptually adjacent to Chomsky 1956, which addresses the generative-grammatical structure of language as a parallel layer to Shannon’s statistical structure; together they delimit the design space of any communication system.
Tags
#information-theory #shannon #entropy #channel-capacity #foundations #communication