Thursday 3 September 2015

Can you do it again?

The answer is, all too often, "No".  The question is: can you reproduce that nifty finding about how the world ticks?  The question is a summary of what all science is about: a) finding out how the world ticks and b) giving the answer a damned good clatter about the ears to see if it stands up to criticism. A couple of years ago I gave tribs to Brian Nosek for being so pleased about his student's clever result that he made the poor chap do the research all over again . . . and find that his 'significant' result was just a random up-blip in Nature's Noise.  It's the kind of bummer that doesn't happen very often: most principal investigators send 'good' results in for publication and leave it to someone else to try replicating them. Most labs prefer to be making their own discoveries at the frontier rather than testing some other group's results: so retro, darling, and the other chaps might take it personally.

Professor Nosek was in the news again last week. He has coordinated a huge study, The Reproducibility Project, to see how many experimental psychology results published in three leading peer-reviewed journals could be replicated. If you can get your stuff published in Psychological Science, Journal of Personality and Social Psychology, or Journal of Experimental Psychology: Learning, Memory, and Cognition, then you'll be very happy: all three have impact factors in the top half for the field. It's basically a huge scaling-up of what Nosek put his own people through two years ago. The meta-result is a worrying exposure of how much utter tosh gets past the researchers themselves, an editor, and two or three peer reviewers. It suggests that scientists are no less credulous and prone to wishful thinking than shop-assistants, architects or footballers.

What they did [actual Science paper] was take 100 psychology studies published in those three leading journals in 2008: 'they' being Nosek and 269 (!) co-authors who agreed to critically re-evaluate the results by attempting to replicate the earlier work. A key part of any scientific paper is the Materials and Methods section, which should ideally give enough information about who did what to whom that anyone can check whether they obtain the same result by following exactly the same protocol. Science is littered with me-too studies that re-do a successful study . . . with boys rather than girls; rich people rather than poor; testing for religious rather than political bias. That subtle change makes The Effectives think they're doing something novel. But then you can never be sure whether any discrepancy in the result is due to the newly introduced variable or to the original study being suspect.

97/100 of the original papers found something significant about how people tick [people being a subset of the world whose ticking is what interests science]. They tackled such important issues as these (skim the titles; I come back to some of them below):
  • Power, distress, and compassion: turning a blind eye to the suffering of others.
  • Powerful people make good decisions even when they consciously think.
  • Gender recognition of human faces using color.
  • With a clean conscience: cleanliness reduces the severity of moral judgments. [Which, as we found last year, was probably, or at least 50:50, nonsense.]
  • Contextualizing change in marital satisfaction during middle age: an 18-year longitudinal study.
  • The case of the transmogrifying experimenter: affirmation of a moral schema following implicit change detection. [WTF can that be about? Probably nonsense anyway.]
That shows the all-pervading effect of publication bias: only [well, 97%] statistically significant sexy science gets published. If you and your students find no correlation between X and Y, then the study almost certainly gets left on the cutting-room floor. Which has the unfortunate effect of allowing some other poor sap to do the same or similar work because s/he doesn't know about your unpublished negative result. But the Reproducibility Project could only get 36/100 of these studies to show a significant effect: same study with different people and your clever demonstration of how power corrupts disappears in a puff of smoke! Nosek and Co. further found that the magnitude of these psychological effects collapsed in the replications: on average the replication effect sizes were about half those originally reported. The big song and dance of the earlier study gets transmogrified [I swear I will never use this word again! "changed" will do fine] into something that may be statistically significant but is effectively trivial.
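A toy simulation (my own sketch, not the Reproducibility Project's method) shows how a publication filter on its own can produce both of those results: a poor replication rate and shrinking effect sizes. The settings are arbitrary assumptions: every study tests the same modest true effect (d = 0.4) with 30 people per group, only results significant in the predicted direction get 'published', and each published study is then run once more with fresh people.

```python
import math
import random
import statistics

# Illustrative assumptions only (not taken from the Science paper):
TRUE_D = 0.4        # true standardized effect (Cohen's d) in every study
N_PER_GROUP = 30    # participants per group
Z_CRIT = 1.96       # normal-approximation cutoff for p < 0.05


def run_study(rng):
    """Simulate one two-group experiment; return (observed d, significant?)."""
    treated = [rng.gauss(TRUE_D, 1.0) for _ in range(N_PER_GROUP)]
    control = [rng.gauss(0.0, 1.0) for _ in range(N_PER_GROUP)]
    pooled_sd = math.sqrt((statistics.variance(treated) + statistics.variance(control)) / 2)
    d = (statistics.fmean(treated) - statistics.fmean(control)) / pooled_sd
    # d / sqrt(2/n) equals the pooled two-sample t statistic;
    # we compare it with 1.96 as a normal approximation.
    t = d / math.sqrt(2 / N_PER_GROUP)
    return d, t > Z_CRIT  # only effects in the predicted direction count


def simulate(n_studies=5000, seed=2008):
    rng = random.Random(seed)
    original_d, replication_d, replicated = [], [], 0
    for _ in range(n_studies):
        d, significant = run_study(rng)
        if not significant:
            continue  # the file drawer: non-significant results vanish
        original_d.append(d)
        rep_d, rep_sig = run_study(rng)  # same protocol, fresh participants
        replication_d.append(rep_d)
        replicated += rep_sig
    return {
        "published": len(original_d),
        "replication_rate": replicated / len(original_d),
        "mean_original_d": statistics.fmean(original_d),
        "mean_replication_d": statistics.fmean(replication_d),
    }


if __name__ == "__main__":
    for key, value in simulate().items():
        print(f"{key}: {value:.3f}" if isinstance(value, float) else f"{key}: {value}")
```

With these made-up settings each study has only about a one-in-three chance of reaching significance, so roughly a third of the 'published' studies should replicate, and the published effects come out inflated relative to the replications, even though every single study was measuring a real effect. The filter, not fraud, does the damage.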

John Ioannidis, whose 2005 PLOS Medicine paper "Why Most Published Research Findings Are False" put the cat among the pigeons, draws a worrying inference from Nosek's finding: if the replication rate is so poor in the top journals, what can it be down among the grass of the publication jungle? Because we all know that if you have several hundred $$$ for 'page-charges' you can get any old nonsense published somewhere. Possibly in the journal that recently sent you an unsolicited e-mail beginning "Esteemed Professor". Another issue is that the 100 studies from 2008 were chosen because they could be replicated easily: the original authors were prepared to share their data and protocols, the Mat&Meth were clear, and it wasn't an 18-year longitudinal study. More involved/complex/time-consuming studies aren't going to be more reproducible, are they?
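One way to see the arithmetic behind Ioannidis's worry is the core formula from his 2005 paper (ignoring his extra bias term): the probability that a 'significant' finding is actually true is PPV = (1 - β)R / (R - βR + α), where R is the pre-study odds that the tested effect is real, α is the significance threshold and 1 - β is the study's power. The minimal Python sketch below just evaluates that formula; the three scenarios and their numbers are hypothetical, chosen to show what happens as hypotheses become long shots and samples get small.

```python
def ppv(prior_odds, alpha=0.05, power=0.8):
    """Probability that a 'significant' finding is true (Ioannidis 2005, no bias term).

    prior_odds is R, the odds that the tested relationship is real before the study;
    power is 1 - beta, the chance of detecting the effect if it is real.
    """
    beta = 1.0 - power
    return (1.0 - beta) * prior_odds / (prior_odds - beta * prior_odds + alpha)


if __name__ == "__main__":
    # Hypothetical scenarios, not estimates for any real field or journal.
    scenarios = [
        ("plausible, well powered", 1.0, 0.8),
        ("long shot, well powered", 0.1, 0.8),
        ("long shot, underpowered", 0.1, 0.35),
    ]
    for label, prior_odds, power in scenarios:
        print(f"{label:25s} PPV = {ppv(prior_odds, power=power):.2f}")
```

These three cases give PPVs of about 0.94, 0.62 and 0.41: chase unlikely hypotheses with small samples and a 'significant' result becomes little better than a coin toss, before any bias or page-charge publishing enters the picture.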

There is no room for other scientific disciplines to be smug and say "psychological research is a contradiction in terms, and who gives a damn anyway?" Let's look at our own practice v e r y  c a r e f u l l y before we start slagging off other people.

As ever, The Atlantic covers the story with clarity and sense. And hats off to Brian Nosek who managed to herd a lot of cats in a useful direction.
