Sunday, 27 October 2013

Finding ELVIS

Elvis Presley, "The King", was a phenomenon. He sang and girls cried; his hips moved girls in another way; he served his country in the army, moonlighting 10 Top Forty hits while in uniform; he made films that were part of the canon of Sunday movies in dull, drizzly 1960s Ireland. He brought a little light and a lot of sexiness into the world, and you can't fault him for that. So when he died in August 1977, after several years of declining health and increasing use of prescription and other drugs, some people just could not believe it. For the last 36 years people have been seeing him, in Tennessee and in even more outlandish places. They say his DNA doesn't match, etc. etc.

This won't be the first, nor yet the last, time that I write about DNA and the genetic code. I've spent the last 25 years scrutinising DNA and protein sequences for money, so it's going to be part of The Blob, or I'm not being true to myself. In my homage to Nirenberg and Matthaei, I showed the whole table by which the information contained in the 4 DNA/RNA bases, arranged as 64 (4x4x4) codons, is translated into the 20 amino acids that make up all proteins. The code is redundant because 64 >> 20, so several codon triplets code for most amino acids. But it is interestingly redundant, because a single change (mutation) in the DNA is quite likely to produce an amino acid that is either identical or quite similar to the original, pre-change amino acid. Rather than risk boring you (you can get the whole thing in a separate tab here), I only show the top quarter of the standard table:
UUU Phe F   UCU Ser S   UAU Tyr Y   UGU Cys C
UUC Phe F   UCC Ser S   UAC Tyr Y   UGC Cys C
UUA Leu L   UCA Ser S   UAA Stop    UGA Stop
UUG Leu L   UCG Ser S   UAG Stop    UGG Trp W
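For anyone who wants to play along at home, here is a minimal Python sketch (the language is my choice, nothing to do with the table itself). It builds all 64 codons from the compact one-line form of the standard code that NCBI publishes as translation table 1, with the bases taken in U, C, A, G order, translates a short RNA string, and counts how many single-base changes from a sense codon leave the amino acid unchanged. That only tests the "identical" half of the redundancy claim; judging "quite similar" would need some measure of chemical similarity, which I haven't attempted here.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1). Codons are listed with the
# first, second and third bases each running U, C, A, G; '*' marks a stop codon.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(rna: str) -> str:
    """Translate RNA codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        aa = CODE[rna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def synonymous_point_mutations() -> tuple[int, int]:
    """Count single-base changes from sense codons that keep the same amino acid.

    Returns (silent changes, all single-base changes from sense codons).
    """
    synonymous = total = 0
    for codon, aa in CODE.items():
        if aa == "*":
            continue  # only start from codons that actually encode an amino acid
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue  # not a change
                mutant = codon[:pos] + base + codon[pos + 1:]
                total += 1
                synonymous += CODE[mutant] == aa
    return synonymous, total


if __name__ == "__main__":
    print(translate("UUUUCUUAUUGUUAA"))  # Phe, Ser, Tyr, Cys, then a stop
    syn, total = synonymous_point_mutations()
    print(f"{syn}/{total} single-base changes are silent ({syn / total:.1%})")
```

The first example prints FSYC. By my count the mutation tally comes to 134 silent changes out of 549, so roughly a quarter of all possible point mutations in a sense codon don't change the protein at all.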
Each of the 20 amino acids has particular chemical properties, and a name that predates the discovery of the genetic code by many decades. The first amino acid was isolated from asparagus juice by Vauquelin and Robiquet in 1806 and called asparagine. There doesn't appear to be a direct connexion between the smallest amino acid, glycine, and the soya bean, Glycine max, except that both have a tendency to sweetness (Greek γλυκός, "sweet"). As proteins are polymers, typically about 300 a.a. long, we need abbreviations to write and identify them. Sometimes it's easier to use the more directly memorable three-letter form: Leu for leucine, Ile for isoleucine, Ala for alanine, Phe for phenylalanine; but for long sequences a one-letter code is clearly more compact and convenient: L, I, A, F for the partial list above. Here's the full alphabet:
A Ala C Cys D Asp E Glu F Phe G Gly H His I Ile K Lys L Leu
M Met N Asn P Pro Q Gln R Arg S Ser T Thr V Val W Trp Y Tyr
Oops, not a full alphabet: there are only 20 amino acids and our English alphabet has 26 letters. So you can't write everything you want in DNA code translated into protein using the convention above. The missing letters are B, J, X and Z and, most inconveniently, 40% of the vowels: O and U. As the DNA and protein sequence databases grow exponentially, you can find all sorts of things in there. Most importantly, you can find ELVIS (as a partial protein sequence, ...-Glu-Leu-Val-Ile-Ser-...), but not, yet, ELVISPRESLEY. But we'll never find JESUS in there.
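If you'd rather not squint at the alphabet above, a few lines of Python will tell you whether a name can be spelled in amino acids at all and, if it can, rewrite it in the three-letter form. The dictionary is just the table above; the function names are my own invention.

```python
# One-letter to three-letter amino acid codes, copied from the table above.
THREE_LETTER = {
    "A": "Ala", "C": "Cys", "D": "Asp", "E": "Glu", "F": "Phe",
    "G": "Gly", "H": "His", "I": "Ile", "K": "Lys", "L": "Leu",
    "M": "Met", "N": "Asn", "P": "Pro", "Q": "Gln", "R": "Arg",
    "S": "Ser", "T": "Thr", "V": "Val", "W": "Trp", "Y": "Tyr",
}

# Letters of the English alphabet that have no standard amino acid.
MISSING = "".join(sorted(set("ABCDEFGHIJKLMNOPQRSTUVWXYZ") - set(THREE_LETTER)))


def spellable(word: str) -> bool:
    """True if every letter of the word is one of the 20 one-letter codes."""
    return all(ch in THREE_LETTER for ch in word.upper())


def to_three_letter(word: str) -> str:
    """Rewrite a spellable word as a hyphenated three-letter peptide."""
    if not spellable(word):
        bad = sorted(set(word.upper()) - set(THREE_LETTER))
        raise ValueError(f"{word!r} uses letters with no amino acid: {bad}")
    return "-".join(THREE_LETTER[ch] for ch in word.upper())


if __name__ == "__main__":
    print(MISSING)
    print(to_three_letter("ELVIS"))
    for name in ("ELVISPRESLEY", "JESUS"):
        print(name, spellable(name))
```

It prints BJOUXZ for the missing letters and Glu-Leu-Val-Ile-Ser for ELVIS; ELVISPRESLEY passes, JESUS falls at the first letter.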

You can check out whatever you fancy yourself: go to http://prosite.expasy.org/scanprosite/, select Option 2, "Submit MOTIFS to scan them against a PROTEIN sequence database", enter [YURNAMEHERE] in the box below and click [Start the Scan] at the bottom. Don't be a sheep and search for ELVIS: there are at least 200 'hits' for that sequence, and we already know about those. (And remember: B, O, J, U, X and Z won't go.) I have a peculiar ambivalence about the prolactin receptor from Tilapia nilotica, because that fish has eaten both my daughters. But then again, so has hunchback, a zinc-finger developmental control protein from Drosophila virilis.
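ScanProsite does its matching against the whole protein database on the server, but the idea is simple enough to sketch locally. PROSITE's own pattern syntax writes residues separated by hyphens (so ELVIS becomes E-L-V-I-S), with x for "any residue", square brackets for allowed alternatives, curly brackets for forbidden ones, and repeat counts in round brackets. The Python below converts a simple subset of that syntax into a regular expression and reports where it matches in a single sequence. It is my own rough approximation, not ScanProsite's code, and the test sequence in it is made up.

```python
import re

# Elements handled: a residue letter, x (any residue), [ABC] (any of),
# {ABC} (none of), an optional repeat such as (3) or (2,4), and the
# '<' (N-terminus) and '>' (C-terminus) anchors.
ELEMENT = re.compile(r"(<?)(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(\(\d+(?:,\d+)?\))?(>?)")


def word_to_prosite(word: str) -> str:
    """Write a plain word as a PROSITE-style pattern, e.g. ELVIS -> E-L-V-I-S."""
    return "-".join(word.upper())


def prosite_to_regex(pattern: str) -> str:
    """Convert a simple PROSITE-style pattern into a Python regular expression."""
    pattern = pattern.strip().rstrip(".")  # PROSITE entries end with a full stop
    parts = []
    for element in pattern.split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        start, core, repeat, end = m.groups()
        if core == "x":
            core = "."  # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # any residue except those listed
        if repeat:
            core += "{" + repeat[1:-1] + "}"  # (2,4) becomes {2,4}
        parts.append(("^" if start else "") + core + ("$" if end else ""))
    return "".join(parts)


def scan(pattern: str, sequence: str) -> list[int]:
    """Return 1-based start positions of every (possibly overlapping) match."""
    rx = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [m.start() + 1 for m in rx.finditer(sequence.upper())]


if __name__ == "__main__":
    motif = word_to_prosite("ELVIS")
    print(motif, "->", prosite_to_regex(motif))
    print(scan(motif, "MKELVISAELVIS"))  # made-up sequence with two hits
```

The lookahead in scan() is there so that overlapping hits are all reported, which matters for short, repetitive words.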

I wrote a while back about Craig Venter playing about with newly created DNA sequence to create hidden messages, as well as a totally artificial life-form. That was pretty geeky. The Venter Code ignored the 'universal' genetic code entirely, although it did use triplets of DNA bases. According to Thomas P Hopp, Dr. Peyton McKean, a CIA spook, modified the real genetic code to incorporate the missing letters. Apparently the Agency is using this concept to send genetically modified microbes carrying hidden messages to other places, where agents can clone and sequence the DNA inside. Pretty secret, huh? Uber-geeky and not really very convenient. However, as Peyton McKean features in a new book, "The Neah Virus: a Peyton McKean Mystery" by Thomas P Hopp, who also wrote "Dinosaur Wars: Counterattack", it might all be hokum/fiction. Buy the book and find out!?
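To make the Venter-style idea concrete: once you stop insisting on the standard genetic code, 64 triplets are plenty for 26 letters plus some punctuation and digits. The codebook below is entirely made up for illustration (it is not Venter's watermark code, nor whatever Peyton McKean is supposed to have used); it simply hands out triplets in A, C, G, T order, one per character.

```python
from itertools import product

# A hypothetical codebook: characters are assigned to DNA triplets in order
# AAA, AAC, AAG, ... This is NOT Venter's code or the one in Hopp's novel.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ .,!?'0123456789"  # 42 characters
TRIPLETS = ["".join(t) for t in product("ACGT", repeat=3)]  # 64 triplets
ENCODE = dict(zip(ALPHABET, TRIPLETS))
DECODE = {triplet: ch for ch, triplet in ENCODE.items()}


def hide(message: str) -> str:
    """Encode a message as DNA, three bases per character (case is dropped)."""
    return "".join(ENCODE[ch] for ch in message.upper())


def reveal(dna: str) -> str:
    """Decode DNA written by hide(); an unassigned triplet raises KeyError."""
    return "".join(DECODE[dna[i:i + 3]] for i in range(0, len(dna), 3))


if __name__ == "__main__":
    secret = hide("JESUS LIVES")
    print(secret)
    print(reveal(secret))
```

Unlike the amino-acid trick, nothing here needs the DNA to make a working protein, which is exactly why it can spell JESUS and also why it's useless to the cell.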

So far so facetious, but isn't this blog called Science Matters? Where's the SCIENCE? Well, it's here, in the head louse:
>E0VVQ8_PEDHC OS=Pediculus humanus
MLLLSCILFLFEDVLGSIGDNSFFYINCVQYCDYKFCHSGKQKVHHRALK
NFEYSLWSCIENCEYECQWKTVESFQKRNWPIPQFRGKWPFIRLFGFQEP
ASVFFSVLNFITVLKLILLFRKKVSNSAPYYYIWNLFGLIQLNSWFWSTV
YHTRDVDFTEKMDYISAFILIIYSFYAMGLRYISPSINKKTLLWSIFCGL
FGLNHVSYLWLYNFDYGWSNRSRVVSMEFSSFSITILCLENFYVCSSSRS
NYFTRAYGFSSHFMVVRCARPLARYINYLQYIFFSFCHR
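And for the sceptics, here's a small Python sketch that reads the FASTA record above (a header line starting with '>', then the sequence wrapped over several lines), joins the sequence back together and looks for a word in it. The parser is a bare-bones one of my own, not a library routine.

```python
# The head louse record exactly as shown above.
LOUSE_FASTA = """>E0VVQ8_PEDHC OS=Pediculus humanus
MLLLSCILFLFEDVLGSIGDNSFFYINCVQYCDYKFCHSGKQKVHHRALK
NFEYSLWSCIENCEYECQWKTVESFQKRNWPIPQFRGKWPFIRLFGFQEP
ASVFFSVLNFITVLKLILLFRKKVSNSAPYYYIWNLFGLIQLNSWFWSTV
YHTRDVDFTEKMDYISAFILIIYSFYAMGLRYISPSINKKTLLWSIFCGL
FGLNHVSYLWLYNFDYGWSNRSRVVSMEFSSFSITILCLENFYVCSSSRS
NYFTRAYGFSSHFMVVRCARPLARYINYLQYIFFSFCHR
"""


def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {header: sequence}, joining wrapped lines."""
    records: dict[str, str] = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records


def find_word(word: str, sequence: str) -> list[int]:
    """Return 1-based start positions of a word in a protein sequence."""
    hits, start = [], sequence.find(word)
    while start != -1:
        hits.append(start + 1)
        start = sequence.find(word, start + 1)
    return hits


if __name__ == "__main__":
    for header, seq in parse_fasta(LOUSE_FASTA).items():
        print(header, len(seq), find_word("SCIENCE", seq))
```

It finds SCIENCE starting at residue 58, in the second line of the record.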

1 comment:

  1. Dan Brown might be interested in this for his next code-drenched novel, hefty fee for thee!
