Friday, December 31, 2004

e is for error correcting

random thoughts on new year’s eve. error correcting codes are cool.

I haven’t fully techie dorked out in a while (too much of a social scientist, sigh), so I thought I would explore this random part of my mathematics/computer science curriculum that comes up in weird random places: error correcting codes.

the basic idea is that an amazing number of parts of human life can be thought of as encoded message transmission: dna, language, music, art, puzzles. At mit, a large number of classes dealt with encoded message transmission but only recently have I started appreciating the broader applications.

a recent American scientist article talked about the efficiency of the dna-codon to amino acid coding. everyone knows that there 64 (corrected) possible codons (length 3, with 4 base pairs) but only 20 something amino acids. What’s cool is how amazingly efficient the encoding is, that allows functionality up to random errors in the signal.

This is the same idea behind how cell phones work to carry voice messages, how cd’s work to store music, how mp3s compress music, and many more. this train of thought was largely inspired by the adverts for dialup modem speed enhancers, which reminded me of the power of z-modem over Kermit and x-modem.

a code can be thought of as a subset of a message space. efficiently designed error correcting codes have an associated distance metric that places different code words as far apart as possible, so that many errors in transmission are necessary before you mistake one code word for another. the English language is such a code. i cna tpye a vrye grrbllde emssgge adn yru cyn ltlll rped it. economist ariel Rubenstein and others discuss the optimal design of languages.

a fun simple experiment is to run pkzip on a text file, and you’ll be able to demonstrate to yourself that English and in fact most languages has a message space 8 times larger than the code space. each letter in a paragraph typically conveys only about 1 bit of information.

other fun applications:

economist glen loury believes that the identities we use are codes to truthfully transmit information about ourselves, and then argues that the codes African Americans use are inefficient and self-defeating.

in analog signal processing, the nyquist frequency is a nice analog to what’s going on in this digial coding/encoding.

my favorite application is a riddle that I will end with. 10 prisoner’s are told they are to be executed. they will be lined up single file and each will be given a hat, black or red. They will be lined up in such a way that each prisoner can see the hats of each person in front of him, but not his own. The person in the back can see 9 hats, the person in front of him 8, the person in front can see none. From back to front, each prisoner will be asked to say either black or red. If he says the color of his own hat, he is saved, if not, he will be killed. Find the algorithm that in expectation, saves the most number of prisoners.

(Hint: 9.5 prisoner’s can be saved…)


Anonymous said...

um, a slightly ironic error check to your 'e is for error correcting' entry...last i checked, there are only 64 possible codons. it's 4^3, not 3^4.

-your bio-dork friend.

HoBs said...

ah, point taken, thanks. it is indeed 64. 3 basepairs, 4 types. 4*4*4

yah, so me and math. can never be trusted to get it right the first time.