Wednesday, 7 October 2015

Coming into season

We're shortly going to be repeating the cycle of life with our ewes. The gestation period is more or less 5 months / 22 weeks / 150 days. The start of sheep pregnancy doesn't take much effort on the farmer's side: all that is required is a borrowed ram. He'll work out which of the females are ovulating, tupping will occur and 5 months later lambs will make their appearance. It is important to keep tabs on when the tupping occurs because you don't want to be on holiday in Tenerife when the lambs are born. You might think that it would be best to schedule delivery for the middle of the Summer, when the days and the grass are both longest, and it is warmer and drier too. But it turns out to be effectively impossible to have lambs delivered outside a window between the end of December and the beginning of May, because female sheep are only fertile between the end of July and Christmas. If you have a lot of sheep, the lambs' arrivals form a neat Gaussian/normal/bell-shaped curve around the average date: things start off slow, reach a crescendo of activity and sleepless nights, and then trail off again.
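Keeping tabs on tupping dates is simple date arithmetic. The sketch below assumes a fixed 150-day gestation (the "more or less" figure above); real lambings spread either side of it, so treat the result as the middle of a window rather than a due date. The function name `expected_lambing` is just an illustrative choice.

```python
from datetime import date, timedelta

# Assumed fixed gestation for ewes; real lambings scatter either side of this.
GESTATION_DAYS = 150


def expected_lambing(tupping: date, gestation_days: int = GESTATION_DAYS) -> date:
    """Return the expected lambing date for a given tupping date."""
    return tupping + timedelta(days=gestation_days)


if __name__ == "__main__":
    # Hypothetical tupping dates spread across the ewes' fertile season.
    for tupped in (date(2015, 7, 31), date(2015, 10, 1), date(2015, 12, 1)):
        print(tupped.isoformat(), "->", expected_lambing(tupped).isoformat())
```

For example, a ram turned in on 1 October 2015 points to lambs around 28 February 2016, which is a date worth keeping free in the diary.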
It turns out there is a faint trace of this cycle in the pattern of human births, with a peak in August and September and fewer births in other months. The distribution of births through the year is mostly 'flat' rather than sharply peaked in March as with sheep. But there is a statistically significant up-blip in late Summer, peaking in September. I've poked at these data before, noting that in America there is a helluva lot of elective C-section and so choosing of 'convenient' birthdays like not-Christmas and not-4th-July. But that won't allow anyone to shift large numbers of births between months. So the up-blip must be due to more successful tupping in the Christmas holidays. As you know, in human sexual relations it takes two to tango: so it might be that, with office parties and drink taken, there is simply more sexual activity. But I think it's also possible that conception is more likely at that time of year, resulting in a late-Summer birthday: hormones are involved. That should surely be factored in by couples who are trying to conceive.

But there is no point in dreaming up an imaginative and internally consistent story to explain why we have such a situation if the data shown above are an atypical aberration. The first step is to see if the pattern is replicated in another dataset. That's what I'm pursuing this week with my hot-shot second-year strength & conditioning class. In Ireland the repository of statistical data is the Central Statistics Office aka Príomh-Oifig Staidrimh, and I hopped off there for a table of births by month for the six years 2007-2012 inclusive.
Month 2007 2008 2009 2010 2011 2012
Jan 5,611 6,145 6,219 6,351 6,045 5,985
Feb 5,006 5,688 5,605 5,514 5,649 5,753
Mar 5,812 6,165 6,316 6,351 6,297 6,061
Apr 5,598 6,124 6,230 6,016 5,977 5,783
May 6,124 6,519 6,308 6,280 6,257 6,135
Jun 5,885 6,217 6,388 6,325 6,221 5,923
Jul 6,251 6,612 6,727 6,451 6,468 6,305
Aug 6,520 5,904 6,362 6,162 6,377 6,110
Sep 6,355 6,303 6,439 6,563 6,408 6,007
Oct 6,239 6,396 6,361 6,507 6,165 6,111
Nov 5,960 5,887 6,181 6,255 5,909 5,818
Dec 6,028 6,036 6,418 6,399 6,260 5,683
That is rather noisy, so it needs some tricking about to show any trends clearly.
First off, there are only 28 days in most Februaries, and 30 or 31 in the other months, so you'd expect about 10% more births in January than in February (31/28 ≈ 1.11); and indeed the lowest monthly total, N = 5,006, is a February. Then, although these are substantial numbers, you can smooth out some of the noise by adding up all the Januaries etc. between 2007 and 2012. Between us, we managed to agree that the handiest way of displaying the data was to sum up each month across years and then divide by the number of days in each month [See L]. This gives an average number of deliveries each day, which is handy for planning how many midwives to employ to fill 3 shifts x 8 hours = 24 hours of coverage. This comes down to about 200 births per day across the whole country. But they aren't uniformly distributed across the year, and there does appear to be a July-Sept peak . . . except that August doesn't fit the trend. In the book Lies, Damned Lies and Statistics, Michael Wheeler has a whack at people who exaggerate peaks in data by top-slicing the histogram (chopping off the bottom of every column so that the y-axis starts well above zero), as I've done here.
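The sum-then-divide recipe is easy to script outside Excel too. This is a minimal Python sketch, assuming the figures in the table above are complete calendar months and that each year's own month lengths (including the 29-day Februaries of 2008 and 2012) are the right denominator; the function names are illustrative, not anything the CSO provides.

```python
import calendar

YEARS = list(range(2007, 2013))  # 2007-2012 inclusive

# Births by month, copied from the CSO table above (one value per year).
BIRTHS = {
    "Jan": [5611, 6145, 6219, 6351, 6045, 5985],
    "Feb": [5006, 5688, 5605, 5514, 5649, 5753],
    "Mar": [5812, 6165, 6316, 6351, 6297, 6061],
    "Apr": [5598, 6124, 6230, 6016, 5977, 5783],
    "May": [6124, 6519, 6308, 6280, 6257, 6135],
    "Jun": [5885, 6217, 6388, 6325, 6221, 5923],
    "Jul": [6251, 6612, 6727, 6451, 6468, 6305],
    "Aug": [6520, 5904, 6362, 6162, 6377, 6110],
    "Sep": [6355, 6303, 6439, 6563, 6408, 6007],
    "Oct": [6239, 6396, 6361, 6507, 6165, 6111],
    "Nov": [5960, 5887, 6181, 6255, 5909, 5818],
    "Dec": [6028, 6036, 6418, 6399, 6260, 5683],
}


def births_per_day(births, years):
    """Sum each month across years and divide by the total days in that month.

    calendar.monthrange(year, month)[1] gives the month length, so the
    leap-year Februaries of 2008 and 2012 count as 29 days.
    """
    rates = {}
    # Dict keys keep insertion order, so month_number runs 1 (Jan) to 12 (Dec).
    for month_number, name in enumerate(births, start=1):
        total_births = sum(births[name])
        total_days = sum(calendar.monthrange(y, month_number)[1] for y in years)
        rates[name] = total_births / total_days
    return rates


def overall_births_per_day(births, years):
    """Average births per day across the whole period."""
    total_births = sum(sum(counts) for counts in births.values())
    total_days = sum(366 if calendar.isleap(y) else 365 for y in years)
    return total_births / total_days


if __name__ == "__main__":
    for name, rate in births_per_day(BIRTHS, YEARS).items():
        print(f"{name}: {rate:6.1f} births/day")
    daily = overall_births_per_day(BIRTHS, YEARS)
    # Three 8-hour shifts cover the 24 hours.
    print(f"Overall: {daily:.1f}/day, about {daily / 3:.0f} per 8-hour shift")
```

Run on the table above, it puts September highest at roughly 211.5 births per day and the six-year average at roughly 201.6 per day, in line with the "about 200" figure.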
It's fairer [as above] to show the whole length of the columns so that the variations are seen to be ripples rather than chunky irregularities.
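The arithmetic behind Wheeler's complaint is short. Bars drawn up from a baseline b show the ratio (high - b)/(low - b) rather than high/low. The sketch below uses rounded per-day averages of about 211.5 (September) and 195.4 (February), which is what summing the table above and dividing by days gives; the baseline of 190 is a hypothetical choice, not necessarily the one used in the chart.

```python
def apparent_ratio(high: float, low: float, baseline: float = 0.0) -> float:
    """Ratio of drawn bar heights when the y-axis starts at `baseline`."""
    if baseline >= low:
        raise ValueError("baseline must sit below the shorter bar")
    return (high - baseline) / (low - baseline)


if __name__ == "__main__":
    sep, feb = 211.5, 195.4  # rounded births/day, 2007-2012
    print(f"Full columns: Sep bar is {apparent_ratio(sep, feb):.2f}x the Feb bar")
    print(f"Axis from 190: Sep bar is {apparent_ratio(sep, feb, 190):.2f}x the Feb bar")
```

A difference of about 8% between the real column heights becomes a September bar nearly four times the height of February's once the bottom 190 is sliced off.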
Another way of fudging the data is to stop early. If you'd had the idea, in early 2008, of replicating the US September surge, you'd ask the CSO for the most recent complete year (2007), tally that up, adjust for month length and top-slice the histogram [L = !Tarrraa!]. That's why Brian Nosek and others insist on remaining calm when you get an interesting result . . . and repeating the experiment with a different, preferably larger, dataset. If you have Excel or similar, and the necessary curiosity, you might get equivalent data from your own country and send in a comment. One of the few things I remember from my own undergraduate career was taking part, as a cog, in a real-science experiment. I hope that there might be one student in my S&C class who will still mention this week's exercise in 2035.
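Stopping early is just as easy to reproduce. Taking only the 2007 column of the table (the most recent complete year you could have had in early 2008) and adjusting for month length, the sketch below ranks the months by births per day. The `MONTHS` list and `per_day` helper are illustrative names.

```python
import calendar

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
# 2007 column of the CSO table above, January to December.
BIRTHS_2007 = [5611, 5006, 5812, 5598, 6124, 5885,
               6251, 6520, 6355, 6239, 5960, 6028]


def per_day(births, year):
    """Divide each month's births by that month's length in the given year."""
    return [n / calendar.monthrange(year, m)[1]
            for m, n in enumerate(births, start=1)]


rates = per_day(BIRTHS_2007, 2007)
# Highest births-per-day first.
ranked = sorted(zip(MONTHS, rates), key=lambda pair: pair[1], reverse=True)
for name, rate in ranked[:3]:
    print(f"{name}: {rate:.1f} births/day")
```

For 2007 alone, September (about 211.8 per day) and August (about 210.3) come out on top, a tidy late-Summer surge; pooling all six years knocks August back into the pack.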
