Friday, 10 June 2016

Benford

Members of Gamblers Anonymous think otherwise, but the only sustainable way to make money from making bets is to leverage reliable information against the house. Sometimes this can be honest / legal, and most people think that putting one over on Las Vegas casinos, or even the local horse-and-soccer bookie, is a victimless crime. Thus if you hear confidentially from a stable-hand that Running Equine had a visit from the vet and is, after all that fetlock trouble, back at the top of his form, then it is legit to stake your shirt on the 3.30 at Leopardstown. Your info is more up-to-date than the bookie's odds-maker's. It's a bit dodgier if you bet with a five-year-old about whether it will rain on her birthday tomorrow when you have seen Met Eireann's weather forecast on the interweb and you know that she's all out of credit until her birthday. Making money on stocks and shares is supposed to be like the first case, and there are sanctions against 'insider trading', where you use privileged / confidential information to bet on the share price falling or rising. How financial traders square their information with their conscience is a mystery almost as profound as their sources of information. There is something particularly murky in betting that the share-price will fall, because there are so many ways in which that can be engineered. Betting that a horse will not win, and therefore winning yourself, could be the result of all sorts of chicanery, but it takes a miracle of converging threads for your horse to win. It's a bit like Tolstoy's "Все счастливые семьи похожи друг на друга, каждая несчастливая семья несчастлива по-своему", translated as "All happy families resemble one another, each unhappy family is unhappy in its own way."

There is a certain group of people whom you can honestly take with the following suggestion: "Go off and gather any set of numerical information about the World - populations of counties or countries; lengths of rivers; chemical molecular weights; street numbers of homes. You give me €1 for every such number that begins with a 1, 2 or 3; I'll give you €1 for every number that begins with 4, 5, 6, 7, 8 or 9". Don't do this with 5-year-old children unless you also mug pensioners outside the post-office. Try maths students, who will have sufficiently well-polished crap-detectors to smell a rat, but nevertheless may be gullible enough and greedy enough to think you've lost your marbles. The uneven distribution of these first digits is known as Benford's Law of Anomalous Numbers because it caused a stir when Frank Benford published the idea in 1938. Simon Newcomb (Canadian-American astronomer, applied mathematician and autodidactic polymath!!) had pointed out one example of the Law in 1881 - as it applied to logarithms. He was induced to investigate the phenomenon because he'd noted that the earlier pages of his book of logarithm tables were more 'worn' than the later ones. Here 'worn' means grubby, dark and impregnated with sweat, sebum and a host of supported bacteria <eeeuw!>.

Benford built on Newcomb's finding and looked at a wide range of numerical data and found that many of them fit a distribution approximated by
P(D) = log10(1 + 1/D)
showing that the probability of any leading digit D depends on the base-10 log of one plus its reciprocal. What this means is that you expect 1 [P = 0.30] to be 6-7x more common than 8 or 9 [P = 0.05]. Obviously not all natural number sets will follow the pattern. There are only a few Grandmothers who are over 100 but lots in their 60s-90s, for example. This may/should remind you of Zipf's Law as applied to English words [the, of, and "and" are super common] and to letter frequencies: in English the commonest are etaoinshrdl[u|c] - jury is out on c and u.
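To put numbers on the bet, here is a minimal Python sketch that tabulates the formula, assuming the log is base 10 and that the data really do follow Benford. The names benford_p, low and high are mine, for illustration only.

```python
import math

def benford_p(d: int) -> float:
    """Expected share of numbers whose leading digit is d (base-10 log assumed)."""
    return math.log10(1 + 1 / d)

# Probability for each possible leading digit 1..9
probs = {d: benford_p(d) for d in range(1, 10)}
for d, p in probs.items():
    print(f"{d}: {p:.3f}")

low = sum(probs[d] for d in (1, 2, 3))       # digits on which you pay me
high = sum(probs[d] for d in range(4, 10))   # digits on which I pay you
print(f"P(1-3) = {low:.3f}, P(4-9) = {high:.3f}")
print(f"expected gain per number: EUR {low - high:.3f}")
```

The three low digits add up to log10(4), about 0.60, so on Benford-like data the person offering the bet expects to pocket roughly 20 cents per number gathered.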

Another distribution that does not follow a Newcomb / Benford / Zipf pattern is birth-dates. They are uniformly distributed through the year and there is no evidence or suggestion that more people are born on the 1st of the month than on the 9th or the 31st. So it would require an exceptionally dense and ill-informed person to take you up on your 1-3 vs 4-9 offer - written as two-digit days (01-31), there are no birth-dates that begin with 4-9! I used births - of "you and all your family and significant others" - to have my Yr2 Quantitative Methods class generate a 'flat' distribution that was clearly not Normal / Gaussian / Bell-shaped. There are some subtleties in the pattern of birthdays through the year which I've investigated before.
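How the day is written matters for the first digit, so here is a rough count in Python, assuming births are spread evenly over every day of a non-leap year (2015 is an arbitrary choice). The helper first_digit_counts is hypothetical; it tallies the first character of the day-of-month with and without zero padding.

```python
from collections import Counter
from datetime import date, timedelta

def first_digit_counts(year: int, zero_pad: bool) -> Counter:
    """Count days of `year` by the first character of the day-of-month."""
    counts = Counter()
    day = date(year, 1, 1)
    while day.year == year:
        # "07" when padded, "7" when not
        label = f"{day.day:02d}" if zero_pad else str(day.day)
        counts[label[0]] += 1
        day += timedelta(days=1)
    return counts

plain = first_digit_counts(2015, zero_pad=False)
padded = first_digit_counts(2015, zero_pad=True)

for name, counts in (("unpadded", plain), ("zero-padded", padded)):
    low = sum(counts[str(d)] for d in (1, 2, 3))
    high = sum(counts[str(d)] for d in range(4, 10))
    print(f"{name}: begins 1-3 on {low} days, 4-9 on {high} days")
```

Unpadded, the 4th to the 9th of each month still begin with 4-9, but that is only 72 days out of 365; zero-padded, no date begins with 4-9 at all. Either way the 1-3 side of the bet cleans up, for reasons that have nothing to do with Benford.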

Funnily enough, although it applies so widely, for a long time there wasn't really a good explanation or proof of Benford's Law. 60 years after his 1938 paper, TP Hill published a neat explanation in American Scientist - which is behind a Sigma Xi paywall. Some of that is abstracted in a blog-post by Laura McLay; check out the comments on her post too, where there's an interesting twist.

