I was tribbing Darrel Francis and his team for caring enough about their field of science (cardiology) to re-analyse a huge bouquet of Other Men's Flowers - and finding many of the blooms a bit whiffy. On three occasions, I've had cause to look really carefully at someone else's data, and in each case I've found it wanting. Now, it may well be a case of Ascertainment Bias - I can't remember the many many times when such data stood up to my scrutiny.
For several years in the '00ties, I was in charge of the weekly journal club. Each of us in turn would pick a scientific paper relevant to our field of interest, and we'd all undertake to read and critically evaluate that paper and discuss our findings. Well, it was often pretty sad, because there were frequently only two people in the room (me and the presenter) who had made time to read the paper, although the boss would read it on the fly and usually come up with some ideas. You don't learn much by sitting there passively listening to a colleague giving an executive summary with powerpoint. Eeee, in my day, when I were a young graduate student we had the same sort of event, but my memory of it is a hammer-and-tongs shouty-match to see who could find the next laughable error in a paper published in a top-flight journal. It was rare indeed that a week went by without the authors trying our patience by doing a t-test on a sample of 2 x 3 replicates, or having a graph without dimensions on at least one axis, let alone having a citation in the text without a reference in the bibliography. Sure, we were young turks out to prove how clever and applied and dedicated to the task we were. But I tell ya, we were in the ha'penny place when we went across the river to similar events in Harvard. The graduate students there ate each other; we only nibbled at total strangers. This told me that the system of peer-review is flawed; not fundamentally flawed, but you have to worry that if a referee or two or three misses the kind of errors which we found on every trip out, then they are possibly missing something bigger which only their experienced eye would be able to detect. We are complacent about the peer-review publication process in the same way that Churchill was about politics: "Democracy is the worst form of government, except for all those other forms that have been tried from time to time."
When I came back to Ireland in 1990 after a decade seeking my fortune in foreign parts, I got a job in the world of bioinformatics and molecular evolution. I'd been amateuring at the field for a couple of years in England but now I had to deliver. My boss was one of a small handful of scientists who were interested in synonymous codon usage. It's quite obscure but works on the fact that several different triplets of DNA code for the same amino acid when they are translated into protein. You'd expect (null hypothesis) that each of these options would be used uniformly or randomly, but in fact some codons in some proteins in some organisms are used far more, or less, than you'd expect by chance. My task was to assemble comprehensive datasets of all the genes from a species and run them through a set of computer programs to decipher what pattern was followed by that organism and how it compared and contrasted with other species. It meant assembling large tables of genes vs codons, counting the instances of each possibility: Gene ACT1 had 14 UUU codons, 30 UUC codons etc.
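For anyone who'd like to see what such a table amounts to, here is a minimal sketch in Python. It is emphatically not the software we used at the time: the function names and the two toy genes are invented for illustration, and it assumes each coding sequence is already in frame from its first base. It counts every triplet in a gene, then asks what share of the phenylalanine codons are UUU rather than UUC; the null hypothesis of uniform usage would put that share at about one half.

```python
from collections import Counter


def count_codons(cds):
    """Count each triplet in a coding sequence, read in frame from position 0."""
    seq = cds.upper().replace("T", "U")  # report codons in RNA letters, as in the text
    if len(seq) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return Counter(seq[i:i + 3] for i in range(0, len(seq), 3))


def uuu_share(counts):
    """Fraction of phenylalanine codons (UUU or UUC) that are UUU."""
    total = counts["UUU"] + counts["UUC"]
    if total == 0:
        raise ValueError("no phenylalanine codons in this gene")
    return counts["UUU"] / total


# Hypothetical toy sequences, far shorter than any real gene.
genes = {
    "geneA": "ATGTTTTTCTTTTAA",
    "geneB": "ATGTTCTTCTAA",
}

# One row per gene, one column per codon: the "genes vs codons" table.
table = {name: count_codons(seq) for name, seq in genes.items()}

for name, counts in table.items():
    print(name, dict(counts), "UUU share:", uuu_share(counts))
```

A real analysis would cover all the codon families at once and apply a proper statistical test, but the raw material is exactly this sort of count.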
We got a bit proprietorial about it; trying to put one over on rival labs (mostly in France) and doing the most careful analysis we were able for. One day, a paper appeared doing our sort of analysis on a species that we hadn't gotten round to knocking off - yet. Worse, it was from a total outsider, who was a bit of an expert about the species in question but, clearly, knew bugger-all about how to analyse the data. In other words, he hadn't cited any of our papers - harrrumph! I was assigned to read the paper properly and I did it as a referee should but usually can't find the time to do. First off, the numbers in each column didn't always sum to the column-total, and the sum of the rows didn't agree with the sum of the columns. The numbers weren't wildly off but they made us question the quality of the analysis that had been carried out on such numbers. We decided that we'd have to put the record straight, and I spent the next tuthree months dragging data out of the literature and from the DNA databases and re-did the analysis in the correct (i.e. our) way. As a courtesy to the original authors we sent them a pre-print of the analysis just before we sent our manuscript off to a journal. A sad response came back from the Principal Investigator saying that the whole escapade had been the result of a summer internship that had finished prematurely; the intern had disappeared back home and he, the PI, had scrabbled the paper together before he got really busy for the upcoming academic year . . . and please could we not be too scathing in our discussion. We weren't, but used the sort of British understatement that is more devastating than shouting that the fellow is a cad and a bounder.
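The arithmetic check that started all this is easy to automate nowadays. What follows is a minimal sketch in Python, not a record of what I actually did at the time (which was closer to pencil and calculator); the function name and the small table are made up for illustration. It recomputes every row and column sum, compares each with the total printed in the table, and then checks that the two sets of printed totals agree with each other.

```python
def check_totals(rows, row_totals, col_totals):
    """Return a list of discrepancies between recomputed and stated totals.

    Each discrepancy is a tuple: (kind, index, recomputed value, stated value).
    """
    problems = []

    # Each row should add up to the total printed at the end of that row.
    for i, (row, stated) in enumerate(zip(rows, row_totals)):
        actual = sum(row)
        if actual != stated:
            problems.append(("row", i, actual, stated))

    # Each column should add up to the total printed at its foot.
    for j, stated in enumerate(col_totals):
        actual = sum(row[j] for row in rows)
        if actual != stated:
            problems.append(("column", j, actual, stated))

    # The row totals and the column totals should give the same grand total.
    if sum(row_totals) != sum(col_totals):
        problems.append(("grand", None, sum(row_totals), sum(col_totals)))

    return problems


# Hypothetical published table: two genes (rows) by two codons (columns).
rows = [
    [14, 30],
    [7, 9],
]
row_totals = [44, 16]  # totals as printed beside each row
col_totals = [21, 40]  # totals as printed under each column; the second is off by one

for problem in check_totals(rows, row_totals, col_totals):
    print(problem)
```

An empty list means the table is at least internally consistent, which says nothing about whether the counts themselves are right.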
My next project was to construct a phylogenetic tree of bacteria based on protein sequences. In those days, as now, the relationships among bacteria were determined by comparing the sequences of one of the structural RNAs (usually 16S) in the ribosome. So it was of interest to see if proteins were evolving in the same pattern and/or at the same rate as the 16S nucleic acids. I assembled a dataset of recA, the protein that was most widely sequenced at the time, and used the de facto global standard software suite, called Phylip, to do the analysis. Phylip - the phylogeny inference package - was the brainchild of Joe Felsenstein, who had created it and made it freely available. I really wrestled with those data, because I'd never done anything like that before and I didn't want to have a red face when the paper was published. I checked everything for internal consistency and, in the process, found a tiny bug in the code of one of Felsenstein's programs: in a particular, peculiar set of circumstances the code didn't sum the numbers correctly. It didn't substantively affect the results but I wrote to point the error out and became one of an exclusive club of 385 people who have helped inch that project forward in the right direction.