## Monday 11 May 2015

### Vital Statistics

This year's SciFest had 54 projects in which young scientists investigated some quirk of the natural world, which is the very essence of science. But many of these investigations had tiny samples and no replicates.  What does that mean?  If you have three or four experimental conditions, you need to carry out the experiment several times in parallel for each condition. Otherwise you cannot be sure which variable is causing the change; indeed you cannot be sure that there is any change or difference at all rather than a random blip.

Suppose you have a theory that metals alter the growth of plants. You can grow plants on plates of gold, silver, copper and bronze and, after a while, you can measure each plant. Say that the gold plate has the biggest plant: you can't be sure if it's because of the metal, or because it was nearest the door (draught), or nearer the window (light), or on top of the radiator, or because a mouse had taken a leak there in the night (it's the nitrogen).  Whereas if you have four replicates of each of your four experimental conditions you can arrange them in a Latin Square, in which each metal appears exactly once in every row and every column, and then see if gold has bigger plants regardless of which row and column it is in.  That makes it a bit more difficult to generate a No/Yes Black/White On/Off answer but there are standard statistical tools for analysing such blocks of data.
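
For anyone who wants to lay out the plates, here is a minimal sketch of one way to build such a square. It uses the simple cyclic construction (each row is the previous one shifted along by one) and then shuffles rows and columns, which keeps the Latin property. The function names and the random seed are my own inventions for illustration, not part of any standard package.

```python
import random


def latin_square(treatments):
    """Cyclic Latin square: row i is the treatment list rotated left by i."""
    n = len(treatments)
    return [[treatments[(row + col) % n] for col in range(n)] for row in range(n)]


def is_latin_square(square):
    """True if every symbol appears exactly once in each row and each column."""
    n = len(square)
    symbols = set(square[0])
    if len(symbols) != n:
        return False
    rows_ok = all(len(row) == n and set(row) == symbols for row in square)
    cols_ok = all({square[r][c] for r in range(n)} == symbols for c in range(n))
    return rows_ok and cols_ok


def shuffle_rows_and_columns(square, rng):
    """Reorder whole rows and whole columns; the result is still a Latin square."""
    n = len(square)
    row_order = list(range(n))
    col_order = list(range(n))
    rng.shuffle(row_order)
    rng.shuffle(col_order)
    return [[square[r][c] for c in col_order] for r in row_order]


metals = ["gold", "silver", "copper", "bronze"]
layout = latin_square(metals)
shuffled = shuffle_rows_and_columns(layout, random.Random(2015))  # arbitrary seed

for row in shuffled:
    print("  ".join(f"{metal:<6}" for metal in row))
```

Each printed row is one shelf of the bench and each column one position along it, so a draught from the door or a warm radiator hits every metal equally often.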

It takes a while to set up such a project and it has to run for at least a couple of weeks, so when it comes to measuring the plants with a 30cm rule, you may as well measure 30 of them as 3. That takes 30 minutes rather than 3 minutes, but what difference does that make in a project that has run for 3 weeks? A single measurement, or three, is anecdote while N=30 is data. Obviously if the measurement is the 'bottleneck' in the protocol you wouldn't do this many.  Say the thing you are analysing requires a biochemical kit rather than a tape-measure and each kit costs a week's pocket money.  Clearly then you'd carry out fewer measurements but you'd still have to do replicates!  Better to do 2 gold vs 2 silver than one of each of four metals.
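
To see why N=30 beats N=3, here is a rough simulation. It assumes, purely for illustration, that plant heights are normally distributed with a mean of 20 cm and a standard deviation of 4 cm; those numbers, the seed and the function name are all invented. It repeats the "experiment" many times and asks how much the average height wobbles from one run to the next.

```python
import random
import statistics


def spread_of_sample_means(n, trials=2000, true_mean=20.0, true_sd=4.0, seed=1):
    """Standard deviation of the sample mean across many simulated experiments.

    true_mean and true_sd are made-up plant heights in cm, not real data.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(trials):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return statistics.stdev(means)


spread_3 = spread_of_sample_means(3)
spread_30 = spread_of_sample_means(30)
print(f"N=3:  means wobble by about {spread_3:.2f} cm")
print(f"N=30: means wobble by about {spread_30:.2f} cm")
print(f"ratio: {spread_3 / spread_30:.2f}")
```

The theory says the wobble shrinks with the square root of N, so ten times the measurements should buy you an average that is roughly three times steadier (the square root of 10 is about 3.16).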

But it's probably better to do no measurements than too few.  Under-powered experiments are those which, when it comes to the final analysis, cannot answer the question one way or the other with any confidence.  This is a pervasive problem in science, which is driven by enthusiasts and reined back by bean-counters.  You can carry out a power-analysis before you even start the project, and funders now insist on it. This requires you, from a pilot study or by waving your arms vigorously, to estimate the effect size: the likely difference between your cases and controls.  G*Power, or any of several alternatives, will then tell you how big your sample size has to be in order to find a significant difference or to show convincingly that there is no difference of that size.  Both those outcomes are valuable, but scientists really only want the former because it supports their hypothesis.

So far, so idealistic.  What happens in the real world is that, if you put in realistic effect sizes, the software says that you need an astronomical number of measurements - which would take you ten years to accumulate and cost \$10 million in salaries and consumables.  The effect-size and the other power-analysis inputs are accordingly massaged until the answer comes down to what is possible in a three-year period under whatever financial ceiling has been set by the funding body.  The funding body is equally culpable because it has a finite pot of gold and wants to dole it out as widely as possible. If all the money went to Professor Starr at Stellar University, the tax-payer and their elected representatives would want to know why.  This Buggins' Turn policy has flushed more money down the toilet than Joe Public would be happy about.  The consequence is that a lot of under-powered (read half-arsed) projects are funded, marginal results are obtained, uncited papers are produced and young scientists are implicitly trained to believe that this is a good thing: it did after all get them their PhD.

It seems I said all this last year, but the problem hasn't gone away, so I've said it again slightly differently.
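
For a feel of how quickly the numbers get astronomical, here is a back-of-envelope power calculation. It is only a sketch: it uses the textbook normal approximation for comparing two group means, not whatever G*Power does internally, and the function name is mine. The effect size is the expected difference between the groups divided by the standard deviation of the measurements.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided, two-group comparison.

    Normal approximation: n = 2 * ((z_(1 - alpha/2) + z_power) / effect_size)^2.
    effect_size is the standardised difference (difference in means / SD).
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)
    z_power = standard_normal.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


for d in (0.8, 0.5, 0.2, 0.1):
    print(f"effect size {d}: about {n_per_group(d)} per group")
```

With the conventional 5% significance level and 80% power, this gives about 25 per group for a large effect (0.8), 63 for a medium one (0.5), 393 for a small one (0.2) and 1570 for an effect of 0.1. Exact calculations based on the t-distribution come out slightly higher. Halve the effect size and you need four times the sample, which is why the temptation to nudge the effect size upwards is so strong.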

Here's another thing: our students are mostly bloody woeful at appropriate statistical analysis.  You cannot do science, especially biological science with its noisy, relentlessly variable data, without being competent in statistical analysis.  No finding has credibility unless it has an associated probability and the data can be displayed with error bars showing the range of variability in the measurements as well as the mean/average value. In The Institute, we have on the Faculty two perfectly competent quants and they teach math and probability and statistical techniques incrementally in first, second, third and fourth years, and the students are still bloody woeful at stats.  I'm becoming convinced that it's because 'statistics' is taught as a self-contained module and mentioned nowhere else in the curriculum.  The examples used in the stats modules are wholly artificial and/or irrelevant to the rest of the course. Nul points for communication, us.
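
By way of example, here is about the least that should accompany a gold-versus-silver comparison: a mean with its standard error (the error bar) and a probability. The sketch below gets the probability from an exact permutation test, which is one reasonable choice among several rather than the one true method, and it only works for small groups because it tries every possible relabelling. The plant heights are invented for illustration and the function names are mine.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, stdev


def mean_and_sem(values):
    """Mean and standard error of the mean (sample SD divided by sqrt(N))."""
    return mean(values), stdev(values) / sqrt(len(values))


def permutation_p_value(group_a, group_b):
    """Two-sided exact permutation test on the difference in means.

    Counts the fraction of all ways of relabelling the pooled measurements
    that give a difference at least as large as the one observed.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    total = sum(pooled)
    observed = abs(mean(group_a) - mean(group_b))
    extreme = 0
    splits = 0
    for indices in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in indices)
        difference = abs(sum_a / n_a - (total - sum_a) / n_b)
        splits += 1
        if difference >= observed - 1e-12:  # tolerance for floating-point ties
            extreme += 1
    return extreme / splits


# Invented heights in cm, for illustration only.
gold = [12.1, 13.4, 11.8, 14.0, 12.9]
silver = [11.0, 12.2, 10.9, 11.7, 12.5]

for name, heights in (("gold", gold), ("silver", silver)):
    average, sem = mean_and_sem(heights)
    print(f"{name}: {average:.2f} +/- {sem:.2f} cm (mean +/- SEM, N={len(heights)})")
print(f"permutation p-value: {permutation_p_value(gold, silver):.3f}")
```

The p-value answers one narrow question: if the metal made no difference at all, how often would shuffling the labels produce a gap this big by chance?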

This is unGood, so please tell your science policy wonks. We require literacy in our schools, not least because most of the class (N=30) has to be reading while the teacher deals with the bad-boys at the back.  But numeracy is not so essential to the process of 'education' that goes down in school. If statistical analysis could be embedded in the teaching of science at school it would make life a lot easier when the kids come to college. It's a bit fatuous to teach it as part of maths and talk about throwing dice and taking cards at random from a deck. There are now far more exciting games to play than whist and rummy and kids don't roll dice any more. Possibly even the Dungeons & Dragons geeks don't roll funny shaped dice any more - they just press the random number button on their iPhones.  I don't think there is anything more important for developing an informed and questioning citizenry than making children aware of the concept of statistical significance. They'd be able to expose the nonsense that gets trotted out every day in the newspapers. It's not that difficult: much easier to grasp, if taught engagingly, than French irregular verbs. Actually, if you casually brought statistics into the study of French irregular verbs, the kids would have less trouble with mathsemantics later on.