Tuesday, 21 May 2013
Nirenberg and Matthaei 1961
Nobody really knows why, but to a very close approximation, all proteins in all organisms incorporate just 20 amino acids, and the same ones at that. Since Crick and Watson sorted out the structure of DNA 60 years ago this Spring, everybody knows that the information in DNA resides in the linear ordering of just four 'bases'. We all now familiarly refer to them by their initials A T C and G. (It is too much for this post to explain, but in some nucleic acids T is replaced by another base designated U, so sometimes this list is written A C G T/U). C&W showed that the thing worked as a replicator if A always paired with T and C always with G. But it was a mystery as to how the linear ordering of DNA bases translated (now a technical term for the process) into the linear ordering of amino acids in proteins.
Elementary (my dear Crick) math shows that two bases weren't enough to code for 20 amino acids because there can only be 16 of these 'dinucleotides': AA AC AG AT - CA CC CG CT - GA GC GG GT - TA TC TG TT. It seemed extravagant to suppose that four bases (AAAA AAAC AACA etc.) were required because there are 256 possible permutations there. Triplets (AAA AAC ACA ACC etc.) on the other hand seemed, like Goldilocks' porridge, to be 'just right'. Those 64 possibilities had a comforting biological redundancy. Crick, the unarguably brilliant theoretician, and others sat down with pencil and paper to work out how the encoding could happen.
The first coherent, internally consistent solution came from left-field: a physicist and cosmologist called George Gamow came up with a diamond code which hinged on the fact that, if you squinted, the structure of DNA could be seen as a repeated series of diamond-shaped indentations. Gamow accepted as necessary-and-sufficient that the coding was triplet-based, and further that each hole was symmetrical so that CXG would determine the same amino acid as GXC, and further still that the middle base for a given amino acid could be either member of a complementary pair: A or T, C or G. So CAG CTG GAC GTC would all code for "Z" while CGG CCG GGC GCG for "Y" etc etc. When the end bases were the same as each other there were only two possibilities for each encoding rather than four as above - ATA or AAA for amino acid "N", ACA or AGA for "M" etc. Remarkably, and I can imagine Gamow getting the shivers when he realised this, such an arrangement yields 12 four-way codings (accounting for 48 DNA triplets) and 8 two-way codings (accounting for 16 triplets). 12 + 8 being <shazzam!> just right to service the known 20 amino acids. It was so neat it had to be true. For various reasons of compelling permutational math Sydney Brenner and others later showed that Gamow's code couldn't be true.
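If you'd rather not take the counting on trust, here is a minimal Python sketch that brute-forces it. It only encodes the two rules as described in this post (the end bases can be swapped, and the middle base can be swapped for its complement); it makes no attempt to model Gamow's actual diamond geometry. The names `diamond_key`, `groups` and `size_tally` are my own inventions for illustration.

```python
from collections import defaultdict
from itertools import product

BASES = "ACGT"
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

# How many distinct "words" of length 2, 3 and 4 the four bases can spell.
word_counts = {n: len(BASES) ** n for n in (2, 3, 4)}


def diamond_key(triplet):
    """Return a label shared by all triplets the two rules treat as equivalent."""
    left, middle, right = triplet
    ends = tuple(sorted((left, right)))  # CXG counts the same as GXC
    middle_pair = tuple(sorted((middle, COMPLEMENT[middle])))  # A or T, C or G
    return ends, middle_pair


# Sort all 64 triplets into their equivalence groups.
groups = defaultdict(list)
for letters in product(BASES, repeat=3):
    triplet = "".join(letters)
    groups[diamond_key(triplet)].append(triplet)

# Tally how many groups there are of each size.
size_tally = defaultdict(int)
for members in groups.values():
    size_tally[len(members)] += 1

print(word_counts)
print(dict(size_tally))
print(len(groups), "groups in total")
```

Run as written, this should report 16, 64 and 256 words for lengths 2, 3 and 4, and sort the 64 triplets into 12 groups of four plus 8 groups of two: 20 in all.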
Francis Crick then stepped up to the plate with a hypothesis that a) the code was triplets and b) the order didn't matter - so ACC CAC CCA all coded for amino acid "J"; ACG, AGC, CAG, CGA, GAC, GCA all for "I"; solo AAA for "H" etc. His solution required 4 one-way codings (4 triplets), 12 three-way codings (36 triplets) and 4 six-way codings (24 triplets). This mathematically compelling and biologically more reasonable solution cleanly explained the presence of 4 + 12 + 4 = 20 amino acids and 4 + 36 + 24 = 64 DNA triplets! Begodde it was neat, and if poor Gamow had been deluded, this just had to be true. Some of the excitement of the time is conveyed by Sydney Brenner's eyebrows in this series of soundbyte interviews at the wonderful archive of interviews at webofstories. If you want more details of the math, you could do worse than read this great essay by Brian Hayes at American Scientist, where several other zany-with-hindsight codes are described in colour.
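The order-doesn't-matter arithmetic is just as easy to check by brute force. A small Python sketch, assuming only the rule stated above (two triplets are equivalent if they contain the same bases in any order); `composition`, `class_sizes` and `tally` are hypothetical names of mine, not anything from the original papers.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"


def composition(triplet):
    """Sort the letters so that every reordering of a triplet gets the same label."""
    return "".join(sorted(triplet))


# For each composition, count how many of the 64 ordered triplets share it.
class_sizes = Counter(
    composition("".join(letters)) for letters in product(BASES, repeat=3)
)

# Tally how many compositions are one-way, three-way and six-way.
tally = Counter(class_sizes.values())

print(len(class_sizes), "compositions covering", sum(class_sizes.values()), "triplets")
for size in sorted(tally):
    print(f"{tally[size]} compositions with {size} ordering(s)")
```

The all-same triplets like AAA are the one-way codings, two-the-same triplets like ACC have three orderings, and all-different triplets like ACG have six.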
Meanwhile back in 1961 at NIH, a couple of more-or-less-unknown biochemists Marshall Nirenberg and Heinrich Matthaei, instead of looking at the ceiling thinking, were looking at the lab-bench experimenting. They put together the contents of a typical cell in a test tube, fed into the sludge a molecUUUUUUUle consisting entirely of the base Uracil and generated an artificial protein consisting entirely of the amino acid phenylalanine. The first step, UUU=Phe, in cracking the code had been made, like Thomas Young's insight that cartouches in hieroglyphic script represented proper names. The other codings were knocked off over the next tuthree years by Nirenberg, Matthaei and many others and the real genetic code was revealed to be a robust but quirky kludge of contingency, accident and evolution. Nirenberg shared the 1968 Nobel prize with Holley and Khorana, and Matthaei didn't, adding to the list of controversial awards associated with Mr Dynamite's legacy. I've always supposed it was because nobodaei could spell his name properlaei.