Monday 11 December 2017

Top Ten Human Genes

Peter Kerpedjiev had an idea that required some moderately high-throughput analysis and got himself a full-page spread in Nature, Europe's premier general science magazine, about The most popular genes in the human genome. This requires a mash-up of two sorts of data which are effectively orthogonal to each other - related but not correlated. "Most popular" is here defined as those genes which have appeared most often in the recent scientific literature. There's all sorts of other stuff we know about genes and their protein products: molecular weight; genomic location; which tissues they are expressed in; whether they are receptors or enzymes or signalling molecules or proteins that switch on other genes. You wouldn't expect any one of these things-we-know to tell us about the other attributes.

We had a paper in the 00s, for example, which showed that genes expressed in the liver (or heart or kidney) are scattered all over the genome. The long genes aren't all found on the longest chromosome. Olfactory receptors are clustered in little groups, it is true, but there are lots of these OR clusters, and toll-like receptors (TLRs) are all over the shop. We'd be mad if we only wrote papers about receptors and ignored enzymes. Kerpedjiev was curious about which genes/proteins occupied the collective time and energy of science and wrote a simple-enough script to snag this information for each and every one of the 27,000 protein coding genes we know about. It's exactly the same idea as I was progressing with the Masters of Imm up in Trinity from 2012 until they sacked me in 2016. I called it the Most Sexy Immuno-protein competition. We didn't aspire to be comprehensive because we couldn't write a simple-enough script without a lot of help. Nevertheless, we showed that some TLRs were stupidly more popular than others because science puts a lot of handicaps on doing original research: everyone - HoDs, funders, editors, reviewers and referees - is happier if you mullock along in the footsteps of others. Wenceslas science, we might call it.

A few proteins acquire legs and outstrip their trudging rivals for the attention of scientists. Aled Edwards from Toronto did a similar study ten years ago in which he showed that the $1 billion Human Genome Project had been effectively useless in generating new targets of research to ameliorate the human condition. Researchers found it easier to fondle each other's work than to strike out into the unknown. Working within the herd is safe but not very exciting. Going all maverick makes funders nervous, and the results require too much effort to assimilate and tend to get ignored. I could ask you to guess which genes are most highly cited in the scientific literature, but even if you are full time in bio-science you likely won't have the breadth of interest to know them all, let alone put them in the correct order.

Well, here they are. I'm surprised that TLR4 isn't there, but that's only because we discovered a minority-interest TLR and so I think that TLRs are bound to be interesting to everybody; I acknowledge that TLR4 trumps our 'umble TLR15. But all TLRs are collectively a bit of a side-show. Even among the top 10, p53 is Eclipse first, the rest nowhere, but in a way that is reminiscent of Zipf's distribution law for words or Benford's for numbers. So who are these celebrity boys and girls of the biomedical world?
p53 is the guardian of the genome, a tumour suppressor which is found to be mutated in about 50% of all cancers. The implication is that, when fully fighting fit, it is preventing the development of cancer. Several of the other genes reflect the biomedical world's obsession with cancer - which affects the family and friends of the affluent white males in power in the West - rather than possible targets for infectious diarrhoea, TB or malaria. The million black babies a year who succumb to each of those diseases can't afford to pay for drugs. #2 is TNF, whose name, tumour necrosis factor, says it all: it works to gee up the immune system to kill tumours. VEGF, vascular endothelial growth factor, is the source of another nifty insight into treating cancers. As a tumour grows through out-of-control cell division, it demands more oxygen and glucose to fuel its energy needs. If we can suppress the growth and development of the local matrix of capillaries, then we can suffocate the traitor in our midst. EGFR, epidermal growth factor receptor, works on the same process from a different angle. If we can jigger the receptor of a growth factor, then we can also suppress growth. Note the GF in TGFB: it's another target for growth factor control. Note the R in ESR1, the oestrogen receptor, which is involved in ovarian and breast cancer and tells us that oestrogen has more of a role in life than just shedding an egg a month for 30 years.

I'll refer you to the original article for a really neat graph of the timeline of trendiness. When I was in college, 3% of all the publications were obsessing over beta haemoglobin HBB, mutations in which led to the first genetic disease, sickle cell anaemia. Hey, before President Richard Nixon signed the National Cancer Act in 1971 and diverted $billions towards the War on Cancer, biomedical science cared about black babies. You have to talk % impact because in absolute terms publications were a trickle back then compared to the tsunami of tosh today. I say tosh because the average number of citations for a scientific paper is less than one: more than half of all papers published bob up and promptly sink without trace, effectively unread by everyone including the authors.
1980 280,000
1990 410,000
2000 532,000
2010 940,000
2016 1,259,000

Heck and jiminy, there are even 13,000 pubs for 2018 out there - I guess mostly from the Journal of Clairvoyant Studies and Prognostics.
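For anyone who wants to put a number on trickle-versus-tsunami, here is a minimal Python sketch using only the five yearly counts listed above. The names `pubs_per_year`, `fold_change` and `annual_growth` are made up for illustration, and the sums assume each figure is the total number of publications for that year.

```python
# Publication counts per year, copied from the list above.
# Assumption: each figure is the total for that calendar year.
pubs_per_year = {
    1980: 280_000,
    1990: 410_000,
    2000: 532_000,
    2010: 940_000,
    2016: 1_259_000,
}


def fold_change(counts, start, end):
    """How many times bigger the output in `end` is than in `start`."""
    return counts[end] / counts[start]


def annual_growth(counts, start, end):
    """Compound annual growth rate between two years, as a fraction."""
    years = end - start
    return fold_change(counts, start, end) ** (1 / years) - 1


if __name__ == "__main__":
    print(f"1980 to 2016: {fold_change(pubs_per_year, 1980, 2016):.2f}-fold")
    print(f"per year: {annual_growth(pubs_per_year, 1980, 2016):.1%}")
```

On these figures the annual output grew roughly 4.5-fold between 1980 and 2016, which works out at a little over 4% compound growth a year. That is why a gene's share of the literature is a fairer yardstick than its raw paper count.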
