Monday 26 August 2019

Zippo = zero

My PhD thesis was essentially a Genetic-Geographic map of New England and the Canadian Maritimes.  I taught myself Fortran, on a room-size mainframe computer, in order to calculate Kidd and Cavalli-Sforza's Genetic Distance between each pair of populations. It turned out that there was a [positive but feeble] correlation between genetic and geographic distance; so that was a deliverable. I wanted to plot my gene frequency data on a series of maps. People more techie than me were starting to develop digital mapping technology, but the cost was beyond our [zero] budget. I ended up xeroxing a number of blank outline maps of the region and typing in the gene frequencies for each location in approximately the right place. We did speculate about making an ASCII-art map by writing a program to print a block of blank lines, some of which would have a [gene frequency] number embedded. I knew the latitude and longitude of my sample locations and could scale that information to the blank text block, but I couldn't easily print a wiggly line of dots to represent the coastline, rivers or state/provincial boundaries. I thought about that for a good bit, but eventually submission deadlines loomed and I went for the inefficient won't-scale-up manual-typewriter solution.  That was in 1982.
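For what it's worth, the ASCII-art idea is only a few lines in a modern language. What follows is a minimal sketch in Python, not the Fortran program we never wrote: the function name `ascii_map`, the grid size and the gene frequencies are all made up for illustration, and the coordinates are only approximate. It scales each latitude/longitude onto a block of blank text and drops the number in; it makes no attempt at the coastline, which was the hard part.

```python
def ascii_map(samples, rows=20, cols=60):
    """Place each (lat, lon, freq) sample on a rows x cols block of blanks.

    North is at the top and west is on the left. Later samples overwrite
    earlier ones if their labels collide.
    """
    lats = [lat for lat, _, _ in samples]
    lons = [lon for _, lon, _ in samples]
    lat_min, lat_max = min(lats), max(lats)
    lon_min, lon_max = min(lons), max(lons)
    lat_span = (lat_max - lat_min) or 1.0  # avoid dividing by zero
    lon_span = (lon_max - lon_min) or 1.0

    grid = [[" "] * cols for _ in range(rows)]
    for lat, lon, freq in samples:
        label = f"{freq:.2f}"
        # Higher latitude means nearer the top, so invert the row scale.
        row = round((lat_max - lat) / lat_span * (rows - 1))
        # Leave room so the label never runs off the right-hand edge.
        col = round((lon - lon_min) / lon_span * (cols - len(label)))
        grid[row][col:col + len(label)] = label
    return "\n".join("".join(line).rstrip() for line in grid)


# Hypothetical data: approximate coordinates, invented gene frequencies.
SAMPLES = [
    (42.36, -71.06, 0.31),  # Boston
    (43.66, -70.26, 0.28),  # Portland, Maine
    (45.27, -66.06, 0.22),  # Saint John, New Brunswick
    (44.65, -63.57, 0.25),  # Halifax
]

print(ascii_map(SAMPLES, rows=10, cols=40))
```

A plain linear scaling like this ignores map projection, which is tolerable over a region this size but would distort a bigger map.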

By 1986, we had moved to England, rented for three years, renewed my three-year contract and bought a house in Heaton, a slightly down-market suburb of Newcastle upon Tyne. That house was in a terrace of large late Victorian red-brick homes, NE6 5HR. At about the same time our mate Will got a programming job in London processing the demographic information associated with post-codes. He hacked into the company database to tell us that the housing stock on our street was one third each of owner-occupiers - council tenants - private rentals. From our own lived experience that had the ring of truth. Even back then, before Google [Larry Page and Sergey Brin were in primary school] and Amazon [Bezos had just graduated from Princeton], that data was valuable to commerce. NE6 5HR wouldn't have been the best target for a direct-mail-shot about Jacuzzis; even the double-glazing sales-people had an uphill struggle.

I've ranted about Ireland's EirCode debacle where each house has been assigned a unique ID which has been deliberately designed so that it cannot be grouped or consolidated geospatially. What is fantastically annoying is that EirCode, the company which secured the contract, is really slow in assigning Eircodes to newly built houses. For months Solas Bhride had to piggyback on the EirCode of the neighboring equestrian centre.

The British post-codes are quite a bit better in this regard:
NE = Newcastle upon Tyne and region
NE6 = One suburb, slightly downmarket from NE2 to the West
NE6 5 = one sector of that suburb
NE6 5HR = HR is a 'walk' assigned to an individual postman with his mailbag.
Our demographics [median income; single parent households; number of pensioners] on NE6 5HR were predictably similar to NE6 5HL immediately to the West. We need these data properly mapped to plan for schools, nursing homes, creches, supermarkets and pubs . . . as well as mail-shots.
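That nesting is what makes the British codes easy to group. Here is a rough Python sketch of peeling a post-code into its levels; the function name `split_postcode` is mine, the labels area / district / sector / unit are the usual names for the four levels, and the pattern is a simplification that covers codes shaped like NE6 5HR rather than every valid UK post-code.

```python
import re

# Simplified pattern: 1-2 area letters, a district, then sector digit + 2 letters.
# An illustrative assumption, not a full validator for UK post-codes.
POSTCODE = re.compile(r"^([A-Z]{1,2})(\d[A-Z\d]?)\s*(\d)([A-Z]{2})$")


def split_postcode(code):
    """Return the nested levels of a post-code, coarsest first."""
    match = POSTCODE.match(code.strip().upper())
    if match is None:
        raise ValueError(f"not a recognisable post-code: {code!r}")
    area, district, sector, unit = match.groups()
    return {
        "area": area,                                   # NE
        "district": f"{area}{district}",                # NE6
        "sector": f"{area}{district} {sector}",         # NE6 5
        "unit": f"{area}{district} {sector}{unit}",     # NE6 5HR
    }


ours = split_postcode("NE6 5HR")
next_door = split_postcode("ne65hl")
# Neighbouring walks share a sector, so their records can be pooled.
print(ours["sector"] == next_door["sector"])
```

Pooling records by the `sector` or `district` key is exactly the kind of geospatial consolidation that a deliberately unstructured code rules out.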

I've had a look at US ZIP-codes as well; not very critically. On this thread in MeFi, a load of Geospatial Wizards agreed that ZIP codes were kinda useless for any purpose beyond delivering mail, despite the coding incorporating a hierarchical granularity similar to the Brits. It's worth reading for examples where average [arithmetic mean] income was a poor predictor of the people living in that code: the average income of Sergey Brin's neighbours is really high but they still can't afford a second car.
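The arithmetic behind that is easy to show. A small Python sketch with entirely invented incomes for ten households in one imaginary ZIP code (nine ordinary earners and one tech billionaire's salary) gives a mean that describes nobody on the street, while the median stays put:

```python
import statistics

# Hypothetical annual incomes for one ZIP code; the last is the outlier.
incomes = [28_000, 31_000, 35_000, 40_000, 42_000,
           45_000, 52_000, 60_000, 75_000, 50_000_000]


def summarise(values):
    """Return (arithmetic mean, median) for a list of incomes."""
    return statistics.mean(values), statistics.median(values)


mean_income, median_income = summarise(incomes)
print(f"mean   = {mean_income:,.0f}")
print(f"median = {median_income:,.0f}")
```

One outlier drags the mean up by two orders of magnitude here, which is why demographers tend to quote the median for anything as skewed as income.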
FilthyLightThief is on my page with this comment: Does it make sense to say "15.327 people will travel from this block to that new shopping center per day when it is completed, resulting in 12.732 additional vehicles on this road every day"? Particularly when you're using aggregated data and not even a travel survey. Even if the software spits that out, the humans reporting it should then say "approximately 15 people are expected to travel from A to B, resulting in approximately 12 additional vehicles traveling on this road every day."
The original essay damning ZIP codes for geospatial studies shows a map [above R] of ZIP codes near Flint MI. The city boundary in green is the effective limit of lead-pollution in the drinking water; but only half of ZIP code 48504 is in the city. If you wanted to plan for brain-damaged children, the ZIP code would be a poor predictor.
