## Wednesday 27 February 2019

### Checksum tags

Where sheep tags meet combinatorial maths.
I'm a bit of a groupie for David Brailsford on the Computerphile channel. He's a bit older than me, so was an adult at a time when computers were just starting to gain traction as useful tools - even if they were the size of a walk-in cold-room and cost £1 million. Getting them to do your bidding was hard and you had to really understand how they worked to have them spit out anything, let alone the correct answer to your data-crunching problem. Somewhere in his noddle, Brailsford knows where all the bodies are buried and who did what to whom in the 1960s and 1970s when computer-literacy was a rare accomplishment. He knows Kernighan and Ritchie who wrote The Book. A couple of days ago, Brailsford was talking about Reed-Solomon encoding, and most of it was over my head. But Brailsford kindly pointed out that Reed-Solomon is a turbo-charged sort of Hamming code, which I do kind of understand. Although, for many years I thought Hamming codes were something invented by wireless hams to ensure correct transmission over shortwave radio. What's a belief that sounds like a bell? <Wronnnnng!> They were invented by Richard Hamming while working at the mighty Bell Labs with the likes of Claude Shannon and indeed, Ritchie and Kernighan and nine (9!) Nobel Prize-winners. Hamming codes work by transmitting a little bit more information than the bare message, so that the receiver can deduce whether the data has been transmitted correctly . . . or at least with internal consistency.
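For the curious, here is a minimal sketch of that "little bit more information" idea, using the textbook Hamming(7,4) layout: four data bits travel with three parity bits, and the pattern of failed parity checks points at the position of a single flipped bit. The function names `encode` and `decode` are my own, and this is only an illustration of the principle, not anything from Brailsford's video.

```python
def encode(data):
    """Encode 4 data bits as a 7-bit Hamming(7,4) codeword.

    Positions are numbered 1..7; parity bits sit at 1, 2 and 4,
    data bits at 3, 5, 6 and 7.
    """
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers positions 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers positions 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def decode(codeword):
    """Return (data bits, error position); position 0 means no error seen.

    Only a single flipped bit can be located and repaired.
    """
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s4  # 1-based index of the bad bit
    if position:
        c[position - 1] ^= 1  # flip it back
    return [c[2], c[4], c[5], c[6]], position


if __name__ == "__main__":
    sent = encode([1, 0, 1, 1])
    garbled = list(sent)
    garbled[4] ^= 1  # simulate one bit of static on the line
    print(sent, garbled, decode(garbled))
```

Three extra bits for every four is a steep price, which is partly why schemes like Reed-Solomon exist, but the sketch shows how redundancy lets the receiver notice, and here even repair, a single error.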

Another way of doing this is to use checksums. A number to be transmitted is run through a little program which computes (and appends) an extra digit which is fully determined by the original number. Any message to be transmitted can, of course, be converted to numbers, and indeed nowadays almost always is so converted with ASCII or Unicode schemes. On arrival, the checksum of the incoming message can be recomputed and if everything matches we can be confident that the message was sent correctly: i.e. without getting partly fuzzed out by an electrical storm or mangled by an incompetent at the keyboard. For several years in the 1990s, this sort of thing was bread-and-butter to me because I'd be dealing with DNA and protein sequences, some of ridiculous length that would be hard to verify as being free of typos. I remember downloading a very large file of DNA sequences and their annotation when the internet was new and flaky. It arrived, I started to analyse it in good faith, and it was a couple of days later that I realised that none of the sequences had names beginning with T U V W X Y or Z. The last section of the file had been dumped out on the floor of the Atlantic as a "broken pipe". Start again, lads! If those data had been checksummed I wouldn't have wasted hours of time analysing an incomplete dataset.
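To make the "extra digit" idea concrete, here is a small sketch using the Luhn scheme, which is the check digit found on payment-card numbers. I've picked it only because it is well known; I have no reason to think it is what any particular tag or sequence database uses, and the function names are my own.

```python
def luhn_check_digit(payload: str) -> int:
    """Compute the Luhn check digit for a string of decimal digits."""
    total = 0
    # Walk right to left, doubling every second digit, starting with
    # the rightmost digit of the payload.
    for i, ch in enumerate(reversed(payload)):
        digit = int(ch)
        if i % 2 == 0:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return (10 - total % 10) % 10


def append_check_digit(payload: str) -> str:
    """Return the payload with its check digit appended."""
    return payload + str(luhn_check_digit(payload))


def is_valid(number: str) -> bool:
    """Recompute the check digit on arrival and compare with the one sent."""
    return luhn_check_digit(number[:-1]) == int(number[-1])


if __name__ == "__main__":
    sent = append_check_digit("7992739871")
    print(sent, is_valid(sent))
    print(is_valid("79927398703"))  # one mistyped digit
```

A single mistyped digit always changes the Luhn check digit, so the receiver sees a mismatch. A single digit cannot catch every possible mangling, though, which is why big files tend to get longer checksums.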

Q. Ah the poor petal, what brought all that on, like a bad dream?
A. The sheep tags is what!
Friday 22 Feb 19 was the day assigned to ultrasound [multiprev] the ewes for lambs. This is a really handy thing to do because the number of lambs carried should dictate their feeding regime in the last months of pregnancy. It also helps tell you to look for an extra lamb if you wake up to find that the ewe scanned as having twins only has a singleton at foot. Sometimes the missing one is still inside, sometimes still-born and on at least one occasion really quiet in a corner. Sometimes you just have to put the MIA down to the fox. We were pretty sure that the sheep we ran up the mountain last Summer had been with the ram. While we waited for the Ultrasound guy, Paddy the Shear came to help us with the dosing [fluke, lung-worms / hoose, Cryptosporidia] and pedicure <snip, snip>.  At some stage I asked Paddy if he thought any of the ewes looked pregnant. As reflex, his hand snaked under the nearest sheep to palpate the udder. The continuing conversation was revelatory: Paddy was extremely skeptical that any ovine bonking had been happening on the mountain - it just isn't done to allow random males to be romping around up there because it screws up everybody's schedule of lambing. Lambing is a time of serious sleep deficit for shepherds and they want to limit it to a window of not more than 2 weeks . . . 5 months after the ewes are put to the desired ram. Red face - it looked like we were going to have the ultrasound guy on a wasted journey. Then again, the new intrusive Dept Ag regulations require that flock owners get their ewes scanned as an animal welfare issue - it's no fun for a sheep if she needs help with delivery and cannot get a [small] helping hand.

My job in the dosing and drenching was to record the ear-tag numbers as they went through the process and, like a good secretary, add notes like "loose teeth"; "scald - 5ml betamox"; "too frisky = to factory"; "missing ear-tag". As I did so I noted that there was some semblance of order in the numbers - 05396-07806 for example came in a batch, as "mountain-hardy", from our neighbour Martin. And twig this: Martin must have had 7806 - 5396 = 2410 sheep through his farm gates. But each number was followed by a seemingly random letter between A and J - must be a checksum I thought. Excluding I 'eye' because it might be confused with 1 'one', the nine letters A to J could be code for the numeric digits 1-9. Do you notice how 0049C is exactly nine higher than 0040C? Maybe the letters run a non-alphabetical cycle repeating after every 9th number? That's a hypothesis . . . shot down in flames because 0046 is not followed by G like 0037 is:
0037G 0040C 0041E 0043J 0044B 0045D 0046F 0047H
0048A 0049C 0050D 0055E 0133F 01495J 01497D 05396E
05901D 06152E 06356H 07802E 07804F 07805C 07806E 11637J
So there's a puzzle, and a challenge. It's like trying to decipher the Indus Valley script from fragmentary data. I have a box of unused sheep-tags that will be a larger and less fragmentary sample.
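
The repeat-every-nine hypothesis can be checked mechanically rather than by eyeball. A minimal sketch: group the tags recorded above by their number modulo 9 and see whether each group carries a single letter. The names `split_tag`, `letters_by_residue` and `conflicts` are mine, and the `period` parameter is only there so that other cycle lengths can be tried on the bigger sample later.

```python
from collections import defaultdict

# The 24 tags written down during dosing: a number plus a trailing letter.
TAGS = (
    "0037G 0040C 0041E 0043J 0044B 0045D 0046F 0047H "
    "0048A 0049C 0050D 0055E 0133F 01495J 01497D 05396E "
    "05901D 06152E 06356H 07802E 07804F 07805C 07806E 11637J"
).split()


def split_tag(tag):
    """Split a tag such as '0049C' into (49, 'C')."""
    return int(tag[:-1]), tag[-1]


def letters_by_residue(tags, period=9):
    """Group the check letters by tag number modulo `period`."""
    groups = defaultdict(set)
    for tag in tags:
        number, letter = split_tag(tag)
        groups[number % period].add(letter)
    return dict(groups)


def conflicts(tags, period=9):
    """Residues that carry more than one letter, contradicting a simple cycle."""
    return {
        residue: sorted(letters)
        for residue, letters in letters_by_residue(tags, period).items()
        if len(letters) > 1
    }


if __name__ == "__main__":
    for residue, letters in sorted(conflicts(TAGS).items()):
        print(residue, letters)
```

On this sample 0040 and 0049 do share both a residue and the letter C, but 0037, 0046 and 0055 share a residue and carry G, F and E respectively, so a plain cycle of nine cannot be the whole story.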