If you can show that two sequences are more similar to each other than either is to a third, then you have established a tree of relationships. In the simplest possible tree [L], A and C are closely related sister 'taxa' while B is only a cousin; and yes, B and D are also sisters to each other. Operational Taxonomic Units (OTUs), here A, B, C and D, could be individuals or species or their genes. These assignments of similarity and relatedness are based on calculating how similar the sequences are when they are aligned together. The gross differences are easy to tally up. Here is a fragment of the protein sequence for beta-haemoglobin from four mammals, two from Order Primates and two from Order Rodentia:
Mouse   KDFTPAAQAAFQKVVAGVAT
Rat     KEFTPCAQAAFQKVVAGVAS
Human   KEFTPPVQAAYQKVVAGVAN
Baboon  KEFTPQVQAAYQKVVAGVAN
        *:*** .***:********.
Note that most of the amino acids (AAs, the building blocks of all proteins, here represented by 20 different letters) are identical in all four species. You can check out the encoding here. The convention is that, when all the AAs at a given site are the same, a * is put under the column. Next note that in several of the other columns, the two rodents have one variant and the two primates have another. In one place, however (the second column, outlined in red), rats look more like primates than their fellow rodents; but that's just a random blip. The easiest way of getting a final answer on who is related to whom is to tally up the number of identical AAs for each pair of species and divide by the total length of the sequence [here an arithmetically convenient N=20] to get a % identity, and then report that in a matrix or table:
| Species | Mus | Rat | Hum | Bab |
|---|---|---|---|---|
| Mus | 100% | 85% | 75% | 75% |
| Rat | 85% | 100% | 80% | 80% |
| Hum | 75% | 80% | 100% | 95% |
| Bab | 75% | 80% | 95% | 100% |
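The tally behind that table can be sketched in a few lines of Python. This is only a minimal illustration using the four fragments above; the names `percent_identity`, `identity_matrix` and `conserved_columns` are made up for this example and do not come from any library.

```python
# Aligned fragments copied from the alignment above (each 20 residues long).
sequences = {
    "Mouse": "KDFTPAAQAAFQKVVAGVAT",
    "Rat": "KEFTPCAQAAFQKVVAGVAS",
    "Human": "KEFTPPVQAAYQKVVAGVAN",
    "Baboon": "KEFTPQVQAAYQKVVAGVAN",
}


def percent_identity(a: str, b: str) -> float:
    """Percentage of aligned columns where both sequences carry the same AA."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def identity_matrix(seqs: dict) -> dict:
    """Nested dict: matrix[row][col] is the % identity between two species."""
    names = list(seqs)
    return {
        row: {col: percent_identity(seqs[row], seqs[col]) for col in names}
        for row in names
    }


def conserved_columns(seqs: dict) -> str:
    """'*' under columns where all sequences agree, a space elsewhere."""
    return "".join(
        "*" if len(set(column)) == 1 else " " for column in zip(*seqs.values())
    )


matrix = identity_matrix(sequences)

# Print one row per species, in the same order as the table.
for row, cols in matrix.items():
    print(f"{row:<7}" + "".join(f"{value:>6.0f}%" for value in cols.values()))

# Print the '*' line for fully conserved columns.
print(conserved_columns(sequences))
```

The sketch only marks fully identical columns with a *. The : and . marks in the alignment above appear to flag columns where the AAs are similar rather than identical, and reproducing them would need a table of which AAs count as similar, which is not attempted here.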