Information & Surprise
Bits · Codes · Likelihood
Information is the resolution of uncertainty—not poetry, not proof of soul.
— After Shannon’s spirit, loosely
Claude Shannon turned “news value” into mathematics: rare messages carry more bits if your model of the source is calibrated. This essay tours entropy, coding, and where the formalism stops (semantics, ethics, consciousness live elsewhere).
I. Bits as Log Probabilities
If an event has probability p, its surprisal is −log p
Flip a fair coin: heads and tails each have probability one-half; one bit resolves the toss. Rare events, under the same log base, demand more bits to specify—if your model is honest.
Take the expectation of surprisal across outcomes and you recover Shannon entropy, H = −Σ p log p: the average surprisal under the distribution.
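A minimal sketch of both definitions in Python, assuming base-2 logarithms so the units are bits; the coin distributions are toy examples chosen here for illustration, not anything from the essay.

```python
import math

def surprisal_bits(p: float) -> float:
    """Surprisal of an outcome with probability p, in bits: -log2(p)."""
    return -math.log2(p)

def entropy_bits(dist: dict[str, float]) -> float:
    """Shannon entropy: the expectation of surprisal under the distribution."""
    return sum(p * surprisal_bits(p) for p in dist.values() if p > 0)

# A fair coin: each face has probability 1/2, so one bit resolves the toss.
print(surprisal_bits(0.5))                    # 1.0
print(entropy_bits({"H": 0.5, "T": 0.5}))     # 1.0

# A biased coin: heads is barely news, tails is a shock,
# yet the average surprisal falls well below one bit.
print(surprisal_bits(0.99))                   # ~0.014
print(surprisal_bits(0.01))                   # ~6.64
print(entropy_bits({"H": 0.99, "T": 0.01}))   # ~0.081
```

The biased coin shows the gap between the surprisal of one outcome and the entropy of the source: the rare tail is individually expensive, but the average stays cheap.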
II. Codes Approach Entropy
Why compression has a floor
Lossless compression cannot beat the entropy rate of a stationary ergodic source, at least asymptotically; that is Shannon’s source coding theorem in narrative form.
That is why redundant English shrinks; encrypted noise does not. The bit meter sees structure your eye might miss.
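One quick way to see the contrast, sketched in Python with zlib standing in for a generic lossless coder; the repeated sentence and the os.urandom bytes are stand-ins for redundant English and encrypted output, not measurements from the essay.

```python
import os
import zlib

# Redundant, structured text: the kind of input a lossless coder can exploit.
english = b"the quick brown fox jumps over the lazy dog. " * 200
# Uniformly random bytes: a stand-in for well-encrypted output, with
# close to eight bits of entropy per byte and nothing left to squeeze.
noise = os.urandom(len(english))

for label, data in [("english-like", english), ("random bytes", noise)]:
    compressed = zlib.compress(data, 9)
    print(f"{label:>12}: {len(data):5d} -> {len(compressed):5d} bytes")
```

No off-the-shelf compressor reaches the entropy rate exactly, but the gap between the two inputs is the floor showing through.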
III. Noise and Capacity
Reliable communication over unreliable wires
Channel coding adds controlled redundancy so that errors become detectable and correctable: Hamming codes at the start of the story, their turbo and LDPC descendants in modern Wi-Fi and deep-space links.
Capacity is a precise limit; engineering dances under it with latency and power budgets.
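A toy illustration in Python under two simplifying assumptions: the channel is a binary symmetric channel with flip probability p, and the code is plain repetition with majority-vote decoding, which is far from capacity-achieving but shows redundancy buying reliability at the cost of rate.

```python
import math
import random

def binary_entropy(p: float) -> float:
    """H2(p) in bits; defined as 0 at p = 0 or p = 1."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_capacity(p: float) -> float:
    """Capacity of a binary symmetric channel with flip probability p: 1 - H2(p)."""
    return 1.0 - binary_entropy(p)

def send_with_repetition(bit: int, n: int, flip_prob: float, rng: random.Random) -> int:
    """Repeat the bit n times, flip each copy independently, decode by majority vote."""
    received = [bit ^ (rng.random() < flip_prob) for _ in range(n)]
    return int(sum(received) > n / 2)

rng = random.Random(0)
p = 0.1
trials = 10_000
errors_raw = sum(send_with_repetition(1, 1, p, rng) != 1 for _ in range(trials))
errors_rep = sum(send_with_repetition(1, 5, p, rng) != 1 for _ in range(trials))

print(f"BSC capacity at p={p}: {bsc_capacity(p):.3f} bits per channel use")
print(f"error rate, no coding:     {errors_raw / trials:.3%}")
print(f"error rate, 5x repetition: {errors_rep / trials:.3%}")
```

Hamming, turbo, and LDPC codes spend their redundancy far more efficiently than repetition, which is how practice edges toward the 1 − H2(p) ceiling without ever crossing it.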
IV. What Shannon Does Not Say
Meaning, relevance, and wisdom
High-information messages under one model can be gibberish under another. Semantics requires interpreters—biology, culture, goals.
Beware vendors who rebrand likelihood as “knowledge” without datasets, metrics, or accountability.
Surprise, Metered
Information theory is one of humanity’s great clarifiers—also easy to over-quote. Use it where it measures uncertainty; keep poetry for what poetry does.
Measure bits when you can name your model.