Wednesday, 16 December 2015

Sexiest Protein Competition

PubMed is a database of the scientific literature. A key element of science is endeavouring not to unwittingly reinvent the wheel by doing an experiment that someone has already carried out. Replicating another group's experiments by design is another matter, and is carried out less frequently than might be desirable. So before you launch your research project you should read the scientific papers that have appeared on the subject. This will save you the red face that comes from being seen to copy someone else's work as if your contribution were totally novel. Reading will also fill in the gaps in your knowledge, give you inspiration and food for thought, and help you see places where you and your students can usefully make a contribution. But b'gob you cannot read every paper ever written: there are 26.7 million papers indexed in PubMed, of which 1.15 million came out this year.
You had better learn how to use PubMed effectively so that a) you get to read, or at least scan, all the papers of interest and b) you don't have to trudge through lots of irrelevant off-topic material to locate the jewels. Years ago, I wrote a manual called Better PubMed, and I've updated it periodically. At the end it points out some of the hilarious blunders that lurk in this all-encompassing database, such as the number of papers which include both psuedogene AND pseudogene in the same Abstract. You can't do anything about that, but you can find:
  • papers published out of Institutions in Waterford: waterford [AD]
  • the couple of papers published by Dr Mouse ignoring the couple of million papers published about The Mouse Mus musculus: mouse [AU]
  • papers published by Dr S Bob: Bob S [AU]
  • papers published in the noughties: 2000:2009 [PDAT]
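If you'd rather count hits from a script than from the search box, the same tagged queries can be sent to NCBI's E-utilities esearch endpoint. What follows is a minimal sketch, not part of Better PubMed: the helper names tagged and count_url are my own invention, and the code only builds the URLs; it doesn't fetch anything.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint, the programmatic door into PubMed.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def tagged(term, tag):
    """Attach a PubMed field tag to a term (hypothetical helper)."""
    return f"{term} [{tag}]"


def count_url(query):
    """Build a URL asking PubMed for just the hit count, as JSON (hypothetical helper)."""
    params = {"db": "pubmed", "term": query, "rettype": "count", "retmode": "json"}
    return f"{ESEARCH}?{urlencode(params)}"


# The four searches from the list above.
queries = [
    tagged("waterford", "AD"),    # author affiliation
    tagged("mouse", "AU"),        # author called Mouse, not the animal
    tagged("Bob S", "AU"),        # author surname plus initial
    tagged("2000:2009", "PDAT"),  # publication date range
]

for q in queries:
    print(q, "->", count_url(q))
```

Paste any of the printed URLs into a browser and you should get back a small JSON document holding the count. NCBI asks that scripts keep their request rate modest.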
I've been on about PubMed before insofar as it exposes a pernicious HarryPotterism at the heart of science. Aled Edwards in Canada has made a devastating analysis of this funding-fondling problem. Scientists don't study what's important so much as they study what other scientists are working on. Some areas, some genes, some proteins 'get legs' and sweep all before them, leaving a lot of orphan genes weeping for lack of attention in the corners. How to encourage students on, say, a Masters of Imm course to find out how to use PubMed effectively? Why, run a competition, of course, offering a small bag of Werther's Original butter candies! I asked them all to bring to class the name of an immune Protein-of-Interest on which they would be carrying out their molecular evolutionary analyses.

The first step in any research project is to discover what the competition is doing . . . by reading the literature . . . using PubMed to open the door to these data. I suggested that we could look into the hypothesis that some proteins/genes are more "sexy" [as in hot, current, trendy] than others.
Q. How to measure that?
A. Count the number of publications about Protein "P"; then count the number that have appeared since, say, Jan 2014. Divide the latter by the former, et voilà! You have a Sexy Quotient.
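For anyone who wants the arithmetic spelled out, here is a minimal sketch of the calculation in Python. The function name sexy_quotient is my own label, and the three pairs of counts are copied from the class results in the table.

```python
def sexy_quotient(total, recent):
    """Fraction of all PubMed hits for a protein that appeared in the recent window."""
    if total <= 0:
        raise ValueError("need at least one publication to compute a quotient")
    return recent / total


# (all-time hits, hits since Jan 2014) for three of the proteins in the table.
counts = {
    "TLR4": (13752, 3345),
    "recA": (6351, 260),
    "NLRP3": (2173, 1024),
}

# Rank from hottest to coldest.
ranked = sorted(counts, key=lambda name: sexy_quotient(*counts[name]), reverse=True)

for name in ranked:
    total, recent = counts[name]
    print(f"{name:6s} {sexy_quotient(total, recent):.2f}")
```

Note that a protein with only a handful of papers can swing wildly on one or two new publications, so the quotient is most trustworthy when the totals are large.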

Protein   PubMed   Recent   Sexy Qt
NLRP3       2173     1024      0.47
IL28B       1249      462      0.37
CD3g          39       12      0.31
NFKBIA       212       64      0.30
IL8         2586      712      0.28
NFKB1        781      195      0.25
MyD88       5031     1239      0.25
STAT3      14356     3518      0.25
TLR4       13752     3345      0.24
RAG1        1417      273      0.19
CD47         870      157      0.18
CTLA4       5647      882      0.16
ERBB2      22375     2691      0.12
CD56        7874      941      0.12
p53        78696     9351      0.12
iKba          69        8      0.12
CCR5        8208      871      0.11
CD154       7209      584      0.08
EBAG9        164        8      0.05
recA        6351      260      0.04
I've sorted the chosen proteins by hotness, and there turns out to be an order-of-magnitude difference between Princess and Cinderella. Why might this make a difference? You can see that some widely cited proteins, like TLR4 and STAT3, are really going off the boil, while NLRP3 and IL28B are on the up-and-up. If you have a choice, I suggest you are going to pull down more grant money and find it easier to publish in Nature if you devote your time to Sexy Proteins rather than tired old dowager proteins.
