Information theory models (May 11, 2004).

For the past few weeks, I've been working on a web page that looks at concepts like entropy, uncertainty, and information theory. It started out as a simple definition of entropy, but it grew so much that I split most of the material off into a separate handout on Information Theory Models.

In the process of looking for web resources, I found a fascinating book, Information Theory, Inference, and Learning Algorithms by David J.C. MacKay (ISBN: 0521642981). The author has placed the full text of the book (all 640 pages) on the web in PDF format. The file cannot be printed, but you can browse the entire book on screen. It is a rather eclectic mix of topics. Here's the table of contents:

Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v
1 Introduction to Information Theory . . . . . . . . . . . . . . . . 3
2 Probability, Entropy, and Inference . . . . . . . . . . . . . . . . . 22
3 More about Inference . . . . . . . . . . . . . . . . . . . . . . . . 48

I Data Compression . . . . . . . . . . . . . . . . . . . . . . . . 65
4 The Source Coding Theorem . . . . . . . . . . . . . . . . . . . . 67
5 Symbol Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
6 Stream Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
7 Codes for Integers . . . . . . . . . . . . . . . . . . . . . . . . . . 132

II Noisy-Channel Coding . . . . . . . . . . . . . . . . . . . . . . 137
8 Correlated Random Variables . . . . . . . . . . . . . . . . . . . . 138
9 Communication over a Noisy Channel . . . . . . . . . . . . . . . 146
10 The Noisy-Channel Coding Theorem . . . . . . . . . . . . . . . . 162
11 Error-Correcting Codes and Real Channels . . . . . . . . . . . . 177

III Further Topics in Information Theory . . . . . . . . . . . . . 191
12 Hash Codes: Codes for Efficient Information Retrieval . . . . . 193
13 Binary Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206
14 Very Good Linear Codes Exist . . . . . . . . . . . . . . . . . . . 229
15 Further Exercises on Information Theory . . . . . . . . . . . . . 233
16 Message Passing . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
17 Communication over Constrained Noiseless Channels . . . . . . 248
18 Crosswords and Codebreaking . . . . . . . . . . . . . . . . . . . 260
19 Why have Sex? Information Acquisition and Evolution . . . . . 269

IV Probabilities and Inference . . . . . . . . . . . . . . . . . . . 281
20 An Example Inference Task: Clustering . . . . . . . . . . . . . . 284
21 Exact Inference by Complete Enumeration . . . . . . . . . . . . 293
22 Maximum Likelihood and Clustering . . . . . . . . . . . . . . . . 300
23 Useful Probability Distributions . . . . . . . . . . . . . . . . . . 311
24 Exact Marginalization . . . . . . . . . . . . . . . . . . . . . . . . 319
25 Exact Marginalization in Trellises . . . . . . . . . . . . . . . . . 324
26 Exact Marginalization in Graphs . . . . . . . . . . . . . . . . . . 334
27 Laplace's Method . . . . . . . . . . . . . . . . . . . . . . . . . . . 341
28 Model Comparison and Occam's Razor . . . . . . . . . . . . . . 343
29 Monte Carlo Methods . . . . . . . . . . . . . . . . . . . . . . . . 357
30 Efficient Monte Carlo Methods . . . . . . . . . . . . . . . . . . 387
31 Ising Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 400
32 Exact Monte Carlo Sampling . . . . . . . . . . . . . . . . . . . . 413
33 Variational Methods . . . . . . . . . . . . . . . . . . . . . . . . . 422
34 Independent Component Analysis and Latent Variable Modelling . . 437
35 Random Inference Topics . . . . . . . . . . . . . . . . . . . . . . 445
36 Decision Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . 451
37 Bayesian Inference and Sampling Theory . . . . . . . . . . . . . 457

V Neural networks . . . . . . . . . . . . . . . . . . . . . . . . . 467
38 Introduction to Neural Networks . . . . . . . . . . . . . . . . . . 468
39 The Single Neuron as a Classifier . . . . . . . . . . . . . . . . 471
40 Capacity of a Single Neuron . . . . . . . . . . . . . . . . . . . . . 483
41 Learning as Inference . . . . . . . . . . . . . . . . . . . . . . . . 492
42 Hopfield Networks . . . . . . . . . . . . . . . . . . . . . . . . . . 505
43 Boltzmann Machines . . . . . . . . . . . . . . . . . . . . . . . . . 522
44 Supervised Learning in Multilayer Networks . . . . . . . . . . . . 527
45 Gaussian Processes . . . . . . . . . . . . . . . . . . . . . . . . . 535
46 Deconvolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . 549

VI Sparse Graph Codes . . . . . . . . . . . . . . . . . . . . . . 555
47 Low-Density Parity-Check Codes . . . . . . . . . . . . . . . . . 557
48 Convolutional Codes and Turbo Codes . . . . . . . . . . . . . . . 574
49 Repeat-Accumulate Codes . . . . . . . . . . . . . . . . . . . . . 582
50 Digital Fountain Codes . . . . . . . . . . . . . . . . . . . . . . . 589

VII Appendices . . . . . . . . . . . . . . . . . . . . . . . . . . . . 597
A Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 598
B Some Physics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 601
C Some Mathematics . . . . . . . . . . . . . . . . . . . . . . . . . . 605
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 613
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 620

There are also some delightfully fun exercises. Here are four examples.

You have a two-pan balance; your job is to weigh out bags of flour with integer weights 1 to 40 pounds inclusive. How many weights do you need? [You are allowed to put weights on either pan. You're only allowed to put one flour bag on the balance at a time.]
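(Spoiler ahead.) Because a weight can sit on either pan, each weight contributes -w, 0, or +w to the balance, which points toward powers of three. Here is a minimal brute-force check in Python that the four weights {1, 3, 9, 27} cover every integer from 1 to 40; this is only a sanity check of the standard balanced-ternary answer, not the book's own derivation:

```python
from itertools import product

# Each weight can go on the bag's pan (-w), be left off (0), or go on
# the opposite pan (+w).  Enumerate all sign patterns and collect sums.
weights = [1, 3, 9, 27]
reachable = {sum(s * w for s, w in zip(signs, weights))
             for signs in product((-1, 0, 1), repeat=len(weights))}
print(all(n in reachable for n in range(1, 41)))  # True: four weights suffice
```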

Pattern recognition by molecules. Some proteins produced in a cell have a regulatory role. A regulatory protein controls the transcription of specific genes in the genome. This control often involves the protein's binding to a particular DNA sequence in the vicinity of the regulated gene. The presence of the bound protein either promotes or inhibits transcription of the gene. (a) Use information-theoretic arguments to obtain a lower bound on the size of a typical protein that acts as a regulator specific to one gene in the whole human genome. Assume that the genome is a sequence of 3 × 10^9 nucleotides drawn from a four-letter alphabet {A, C, G, T}; a protein is a sequence of amino acids drawn from a twenty-letter alphabet. [Hint: establish how long the recognized DNA sequence has to be in order for that sequence to be unique to the vicinity of one gene, treating the rest of the genome as a random sequence. Then discuss how big the protein must be to recognize a sequence of that length uniquely.]
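The hint invites a back-of-envelope calculation, and the sketch below runs one crude version of the numbers in Python. The constants come from the problem statement, but the two inequalities are my own loose reading of the hint (not the book's full solution), and the bound for a real protein is surely much larger:

```python
import math

# A length-L DNA word recurs about N / 4**L times in a random genome of
# N bases, so uniqueness needs roughly 4**L >= N.  A protein that can
# distinguish 20**n amino-acid "shapes" needs 20**n >= 4**L to single
# out that one word.
N = 3e9                              # nucleotides in the genome
L = math.ceil(math.log(N, 4))        # site length needed for uniqueness: 16
n = math.ceil(L * math.log(4, 20))   # amino acids carrying as many bits: 8
print(L, n)
```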

In a magic trick, there are three participants: the magician, an assistant, and a volunteer. The assistant, who claims to have paranormal abilities, is in a soundproof room. The magician gives the volunteer six blank cards, five white and one blue. The volunteer writes a different integer from 1 to 100 on each card, as the magician is watching. The volunteer keeps the blue card. The magician arranges the five white cards in some order and passes them to the assistant. The assistant then announces the number on the blue card. How does the trick work?
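The key observation is a counting argument: the five white cards can be handed over in 5! = 120 different orders, and only 100 messages are needed, so an ordering can name the blue number. Here is a small Python sketch of one such code; the lexicographic ranking is a hypothetical scheme of my own, chosen for brevity rather than for ease of use at an actual card table:

```python
from itertools import permutations

def encode(whites, blue):
    """Magician: hand over the white cards in the order whose lexicographic
    rank (among permutations of the sorted cards) equals blue - 1."""
    return list(list(permutations(sorted(whites)))[blue - 1])

def decode(ordered):
    """Assistant: read the rank of the received order back off as the number."""
    return list(permutations(sorted(ordered))).index(tuple(ordered)) + 1

whites, blue = [4, 8, 15, 16, 23], 42
print(decode(encode(whites, blue)))  # 42
```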

Estimate in bits the total sensory experience that you have had in your life: visual information, auditory information, etc. Estimate how much information you have memorized. Estimate the information content of the works of Shakespeare. Compare these with the capacity of your brain, assuming you have 10^11 neurons, each making 1000 synaptic connections, and that the capacity result for one neuron (two bits per connection) applies. Is your brain full yet?
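For comparison, the capacity figure the exercise quotes multiplies out as follows; this is just the arithmetic from the problem statement, nothing more:

```python
# Capacity estimate using only the numbers given in the exercise.
neurons = 1e11
synapses_per_neuron = 1000
bits_per_synapse = 2          # the single-neuron capacity result quoted above
capacity_bits = neurons * synapses_per_neuron * bits_per_synapse
print(f"{capacity_bits:.0e} bits, roughly {capacity_bits / 8e12:.0f} terabytes")
# 2e+14 bits, roughly 25 terabytes
```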

I have enjoyed browsing through the book over the past few days, and I might have to buy a copy, just to support an author bold enough to put the full text out on the web for curious people like me.

This work is licensed under a Creative Commons Attribution 3.0 United States License. It was written by Steve Simon. Category: Information theory