Monday 2 September 2024

Stats laid bare

In April 2015, I was invited to An Event in Trinity College Dublin and had an interesting chat with one of the last-man-standing Professors of Anatomy. We compared notes on the several deficits in university education - driven off the curriculum by The New Sexy. The New Sexy is often fabulously expensive to develop and pricey to roll out, so pennies need to be pinched elsewhere in the budget. Trad knowledge, like Anatomy, is undervalued and when the AnatProf retires he (almost always He) ain't replaced. So that class of chap is becoming an endangered species.

Same thing happened at the July 2024 Wexford Science Café which was a conversazione with Dr Sheila Willis, former Director of Forensic Science Ireland. When she started in Forensics in the 1980s the people there could do old style:

  • microscopic paint chips could be found in a child's jumper and matched to the car of a boy racer
  • it was definitely Monaghan mud on the boot-heel and the pollen is oak
  • the blood spatter on the ceiling was [not] arterial
  • the hair was from the cat, definitely not the neighbour's aardvark

But the great god DNA has nudged a lot of this accumulated expertise into the dustbin of history and thrun the deer-stalker hat in after it. New hires tend to be from genetics and biochemistry rather than geology and ag. It's probably a sound judgment economically: paint-chip breakthroughs are rare but every perp is full of DNA; but having the framing always DNA is not good for expansive hypotheses. Dr Willis was innerviewed on The Life Scientific recently and the revelation that she'd been to school in Wexford Town secured her invite to the WxScCafé. The Blob has quite a lot to say of forensics.

We were invited to lunch recently (they only allow me out with my bib every ten years) and I rubbed shoulders with a different (and still in post) Prof of Anatomy. We agreed that scientific training was far too specialised and this was probably affecting a) creativity b) critical evaluation of data through lack of context. We also agreed that everyone, but especially scientists, cd/shd be much better trained in probability and statistics.

Was I a bit shiny-eyed & ranty about that? If so, it might have been my then-current earbook: Naked Statistics: Stripping the Dread from the Data by Charles Wheelan. Wheelan has a faculty position, currently at Dartmouth and before that at U.Chicago. But he has been active in explaining math, data and money, not only to his students, but also on the radio, and in several newspapers of record. And there's more of that in Naked Statistics which uses jaunty, not to say facetious, examples to illustrate such arcana as the central limit theorem and the standard error of the mean. I thought these were pretty good explanations, and some of them were funny. ymmv, I guess. Everyone in the room I am in as I write (N = 3) knew that LeBron James is a tall American basketball player. But similar assumptions are made about the length of an inch, the weight of a pound, the significance of a "hole-in-one" and the function of a "pitcher". Maybe the author and publisher don't give-a-damn about bamboozling non-USians. Maybe I am being patronizing about what my neighbours know and ignoring the ubiquitous penetration of American language and culture.

As with all vaguely technological non-fiction earbooks, aural processing is at a significant deficit if the words are either full of numbers or supported by pictures. Don't bother me none in this case: I've seen enough right-skewed distributions in 40 years of data processing to visualise what one looks like; I don't need to see the actual numbers to believe that two confidence intervals do not overlap. But it is a bit tiresome, yet again, that publishers will take money by selling audio-books without providing these aids to understanding. For one thing it is effectively exclusionary of, say, dyslexics for whom audio is a much more efficient medium for knowledge acquisition.

But here's the thing: Wheelan gathers all the basics covered in the earlier chapters into a big puff for the value of multiple regression with dummy variables. He makes a convincing case that using this technique (readily available in your favorite stats package - yea even Excel) can yield interesting and unexpected insights into your data. One of the first, most challenging, and most rewarding courses I took in graduate school in Boston was Multivariate Statistics: sitting at the feet of Ralph D'Agostino. D'Agostino was also mad for multiple regression with dummy variables and explained that it was mathematically equivalent to ANOVA but much more obvious in its assumptions; ANOVA is very widely used as a black box by people who know no better. D'Agostino died, in the fullness of his years, in September 2023. The obit reveals that he was, for 30 years, one of the Principal Quants in the famous Framingham Heart Study - a longitudinal [1948 - now] investigation into the effects of lifestyle, income, exercise, diet on the likelihood of cardiovascular events. The stats used there to crunch through the data have changed our understanding of cause and effect, and prompted all sorts of policy changes, product labelling and drug development.
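For anyone who wants to poke at the regression-equals-ANOVA claim, here is a minimal sketch in plain Python (no stats package). The three groups and their numbers are invented for illustration, and the function names (`anova_f`, `dummy_regression`, `solve`) are my own, not anything from Wheelan or D'Agostino. It computes the F statistic the textbook one-way ANOVA way, then again by regressing the outcome on an intercept plus 0/1 dummy variables, and the two should agree up to rounding.

```python
# Toy data, invented for illustration only: one outcome, three groups.
GROUPS = {
    "control":  [4.1, 5.0, 4.6, 5.3],
    "diet":     [5.9, 6.4, 5.5, 6.8],
    "exercise": [7.2, 6.9, 7.8, 7.5],
}


def mean(xs):
    return sum(xs) / len(xs)


def anova_f(groups):
    """One-way ANOVA: between-group mean square over within-group mean square."""
    values = [y for ys in groups.values() for y in ys]
    grand = mean(values)
    ss_between = sum(len(ys) * (mean(ys) - grand) ** 2 for ys in groups.values())
    ss_within = sum((y - mean(ys)) ** 2 for ys in groups.values() for y in ys)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def dummy_regression(groups):
    """Regress the outcome on an intercept plus one 0/1 dummy per
    non-baseline group (the first group is the baseline).
    Returns (coefficients, F statistic for the whole model)."""
    others = list(groups)[1:]
    X, y = [], []
    for name, ys in groups.items():
        for value in ys:
            X.append([1.0] + [1.0 if name == other else 0.0 for other in others])
            y.append(value)
    k = len(X[0])
    # Normal equations: (X'X) b = X'y
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    coefs = solve(xtx, xty)
    fitted = [sum(c * x for c, x in zip(coefs, row)) for row in X]
    ss_resid = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_total = sum((yi - mean(y)) ** 2 for yi in y)
    df_model = k - 1
    df_resid = len(y) - k
    f = ((ss_total - ss_resid) / df_model) / (ss_resid / df_resid)
    return coefs, f


if __name__ == "__main__":
    coefs, f_reg = dummy_regression(GROUPS)
    print("ANOVA F:     ", anova_f(GROUPS))
    print("Regression F:", f_reg)
    print("Coefficients:", coefs)
```

The coefficients are where the regression framing earns its keep: the intercept is the mean of the baseline group and each dummy's coefficient is that group's difference from the baseline, so what is being compared with what is laid out in the open rather than hidden inside the ANOVA table.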
