It's a fair approximation to say that all progress in medicine is now based on genes and proteins: how they differ in health and ill-health; how they can be encouraged, unplugged, interfered with, replaced and mimicked. Over the last 40 years we have developed a toolkit for clocking these variations. But it turns out that some genes are more equal than others - in terms of the attention they attract from scientists, funders and Big Pharma. I've a neat Most Sexy Protein exercise to do with students which shows how some proteins have attracted more than 10x the interest of other, related, probably equally worthy, potential targets. I've been here before, tribbing the work of Aled Edwards in Toronto for showing that the human genome project delivered [to a close approximation] NO new gene targets. As the HGP a) cost $3 billion of tax-payers' money and b) was billed as helping us understand our molecular/medical fundamentals, that's a bit of a bust. Scientists, for all their talk of pushing the far frontiers of science, are quite risk-averse: they tend to tweak and polish the same old same old system that they worked on for their PhD when they had all their hair and fewer children. One reason for this is that they follow the money, and funders are even more risk-averse than scientists. The chaps who make the funding decisions have a bean-counting monkey on their backs whispering "Remember, adjudicator, your decisions must bear fruit for the tax-payer".
The conservatism of science got hot 'n' fizzy in September because of a massive new bibliometric study of all the genes we know about in the human genome. "Large-scale investigation of the reasons why potentially important genes are ignored" was published in PLOS Biology by a small group from Northwestern U headed by Thomas Stoeger and Luis Amaral. I gather that it was Stoeger's original idea and that he mobilised Amaral the Portuguese Megaquant (who has an office in each of The Institute on Complex Systems (NICO), the Dept of Chemical and Biological Engineering, the Dept of Molecular Bioscience, and the Dept of Physics and Astronomy at NWU) to help crunch the numbers, which quickly became formidable.
… of molecular evolutionary analysis in 1989, it was fairly calm. I was assigned to a) develop useful software for analysing some classes of DNA sequence and b) apply that software to human genes. Only 1064 protein-coding genes had been sequenced by Christmas 1990: I know; I collected them carefully. Those genes were cherry-picked because they were interesting and analysable: they had been tracked down because a mutation in their DNA was associated with a disease state. They were not, therefore, a random selection of the 20,000 genes which we now believe exist.
The NWU team have flagged 15 basic bio-, chemo- and physico- attributes which are strongly associated with Interesting genes. That's 15 out of 430 (!) things that you can measure/record about a gene [see R for some of them graphically displayed]. They whittled the list down to those key attributes with "gradient boosting regressions with out-of-sample Monte Carlo cross-validation", whatever that is. It seems that most of these measures are products of street-light science: they are things that could be measured with the techniques then available. To get noticed, genes had to a) be expressed in bucket quantities b) across a wide range of tissues, especially HeLa cells, c) have signal peptides, so that their proteins were exported from the cell and easy to get at, and d) tolerate non-fatal variants. If a gene scored strongly on these features, it was likely to have been discovered and characterised in the last century. In horse-racing parlance, those genes were racing along on the back of Eclipse: "Eclipse first, the rest nowhere". With that early start, funding-fondling and inertia [continue in its existing state of rest or uniform motion in a straight line, unless that state is changed by an external force] would kick in and a disproportionate amount of
the boss pulled a graduate student off her own project, where she was quietly minding her own business, and had her try to prove the existence and activity of the predicted gene in real cells in a test-tube. The quest consumed about 1.5 person-years and came to nothing. I left that group in December 2012 and started work at The Institute in January 2013. The same week The Blob launched, a far-better-resourced research group scooped up the prize for discovering IFNL4. And while we're looking at Fig. 1a, check out C9orf72: it hasn't got a proper name yet [it's the 72nd open reading frame on chromosome 9] but it's very strongly associated with ALS [Lou Gehrig's disease] and FTD [frontotemporal dementia], and we have only a hazy idea of the how and the why. It was invisible in the 1980s and 1990s because it's not active in HeLa cells and doesn't have a signal peptide. Nevertheless it has huge potential for making money in the development of therapeutics.
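The Northwestern team's "gradient boosting regressions with out-of-sample Monte Carlo cross-validation" sounds fearsome, but the general idea is simple enough to sketch. Gradient boosting builds a predictor from many tiny decision trees, each one fitted to whatever the previous ones got wrong. Monte Carlo cross-validation repeatedly splits the data at random into a training set and a held-out test set, so the model is always scored on genes it never saw. Totting up how much each attribute's splits reduce the error gives a ranking, which is one way to whittle 430 attributes down to a handful. What follows is a minimal Python sketch of that general technique, not a reconstruction of the NWU pipeline: the function names (`fit_stump`, `boost`, `predict`, `monte_carlo_cv`), the depth-1 trees ("stumps"), the squared-error loss, the parameter values and the four-attribute toy data are all assumptions made for illustration.

```python
import random
import statistics


def fit_stump(X, residuals):
    """Find the single split (a depth-1 regression tree) that best fits the
    residuals. Returns (sse, feature, threshold, left_value, right_value)."""
    best = None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for a, b in zip(values, values[1:]):
            t = (a + b) / 2  # midpoint, so neither side of the split is empty
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            lm, rm = statistics.fmean(left), statistics.fmean(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    return best


def boost(X, y, n_rounds=30, lr=0.3):
    """Gradient boosting with squared-error loss: each stump is fitted to the
    current residuals (the negative gradient) and added with shrinkage lr.
    Also totals, per feature, the error reduction achieved by its splits."""
    base = statistics.fmean(y)
    pred = [base] * len(y)
    stumps, importance = [], [0.0] * len(X[0])
    for _ in range(n_rounds):
        resid = [yi - pi for yi, pi in zip(y, pred)]
        rmean = statistics.fmean(resid)
        sse_before = sum((r - rmean) ** 2 for r in resid)
        sse, j, t, lm, rm = fit_stump(X, resid)
        importance[j] += sse_before - sse
        stumps.append((j, t, lm, rm))
        pred = [p + lr * (lm if row[j] <= t else rm) for p, row in zip(pred, X)]
    return (base, stumps, lr), importance


def predict(model, row):
    """Base value plus the shrunken outputs of every stump."""
    base, stumps, lr = model
    return base + lr * sum(lm if row[j] <= t else rm for j, t, lm, rm in stumps)


def monte_carlo_cv(X, y, n_splits=5, test_frac=0.3, seed=0):
    """Monte Carlo cross-validation: repeated random train/test splits.
    Returns the mean out-of-sample R^2 and the mean feature importances."""
    rng = random.Random(seed)
    idx = list(range(len(y)))
    r2s, total_imp = [], [0.0] * len(X[0])
    for _ in range(n_splits):
        rng.shuffle(idx)
        n_test = int(len(idx) * test_frac)
        test, train = idx[:n_test], idx[n_test:]
        model, imp = boost([X[i] for i in train], [y[i] for i in train])
        ys = [y[i] for i in test]
        ymean = statistics.fmean(ys)
        sse = sum((y[i] - predict(model, X[i])) ** 2 for i in test)
        sst = sum((v - ymean) ** 2 for v in ys)
        r2s.append(1 - sse / sst)  # scored only on the held-out rows
        total_imp = [a + b for a, b in zip(total_imp, imp)]
    return statistics.fmean(r2s), [v / n_splits for v in total_imp]


# Hypothetical toy data: 80 "genes" with 4 attributes each; only attributes
# 0 and 1 influence the outcome, attributes 2 and 3 are pure noise.
rng = random.Random(42)
X = [[round(rng.random(), 1) for _ in range(4)] for _ in range(80)]
y = [3 * x[0] + 2 * (x[1] > 0.5) + rng.gauss(0, 0.3) for x in X]

mean_r2, importance = monte_carlo_cv(X, y)
print(f"mean out-of-sample R^2: {mean_r2:.2f}")
print("attribute importance:", [round(v, 1) for v in importance])
```

On this toy data the two attributes that actually drive the outcome should come out with far larger importance scores than the two noise attributes. With real gene data the picture is murkier, because correlated attributes can share or steal credit from one another, so a ranking like this is a starting point for interpretation rather than a verdict.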
Okay, that was all good fun, but what to do about this ludicrous, wasteful, boring use of science funding? Stoeger & Amaral have some [ain't gonna happen] suggestions:
- "In order to counter the career forces currently pushing towards conformity, there would be a need for stable, long-term support for such innovators to focus on the unknown".
- For gawd's sake, keep up the basic research - in flies, frogs and nematodes - they have been a rich seam for identifying novel ways forward in human health.
- Reductionist science - where you control for all variables but one - is only sporadically successful in making progress through the complexity which is us and our habitat. Fund multi-gene science with interaction terms in the equation.
- Look carefully at the NWU data: it will help you identify the wall-flowers that could do with a dollop of dollars from NIH.