Tuesday 28 January 2020

Backup but Checkup

I was writing about the necessity / virtue of keeping copies of things in case one of them "makes like a wheel and rolls away". This only becomes real when you are caught with your pants down and suffer a painful sting on the ass because you neglected this chore. Last week in class I was critiquing and correcting a Literature Review for one of my students. I am a strong advocate of using keyboard shortcuts in MS-Word: far quicker than piffling about with a mouse, let alone with a mousepad on a laptop. I was on the 3rd page when Word abruptly disappeared off the screen along with all my edits. Dang! But I'd only been at it about 10 minutes and could remember most of my suggested changes. Probably my fault: working on an unfamiliar keyboard I must have typed ctrl+S, ctrl+N or ctrl+Q. It took me 8 minutes to restore the status quo ante.

Back in the noughties, I was hired to work in one of the very first labs wholly funded by the new SFI - Science Foundation Ireland. Instead of piffling about with dribs and drabs of science funding, the government decided to invest a Lot of money in cutting-edge scientific research. As a pilot study in how to spend the $pond$, SFI solicited applications for five [5!] biotech research projects and five [5!] IT ditto. Ken Wolfe secured one of these prizes, was given some millions of €€€s and told he could hire the smartest people in the world who were happy to work in Ireland. Even with an unconstrained budget it was quite hard to get the money allocated in a way that would satisfy a government audit.
  • Personnel: The salaries were pitched very high, even for post-graduate students, but there was a limit to the number of seats that could be filled. Even a walking genius can't supervise 50 people working on 10 different sub-projects and there was a physical room with a finite number of electrical sockets. Those hired were indeed all super-smart but super-nice as well and it was a really productive fizzing place to be. I never figured out why I was hired but my imposter-syndrome died away as I started to get to grips with my tasks. Those who didn't die, did really well afterwards. And I think it's fair to claim that some amazing work was carried out.
  • Kit: everyone in the group was given a brand new high-end desktop dual-boot linux/windows PC . . . and a ditto laptop, so we could be productive on the bus. After 3 years we all got newer slimmer more gutsy laptops.
  • Expenses: one of the hires worked part-time as Office Manager and dealt with all the invoices, room bookings, petty cash and biros . . . as well as churning out a couple of papers a year with the boss.
  • Common core: part of the ancient Victorian office suite was partitioned off as a machine room, where a couple of the techy hires built the lab's own massive parallel server cluster out of Intel chip boxes [no I don't really know what that means, either]. There were layers of redundancy built into it so that, if one of the components failed, no data would be lost. 
  • Infrastructure: All these planet-sized brains, generating eye-wateringly large datasets, couldn't be expected to back up the day's work on a CD or USB key. No no, all the computers were backed up every Friday on a rotating set of DAT tapes and incrementally backed-up [all new changes that week] every night.
  • SYS$OP: one of the two techies, as well as churning out a couple of papers a year with the boss, was the designated Technical Manager, fixing and advising on all the hardware issues and responsible for care, maintenance and back-ups.
Everything went along fine until two components failed in the server cluster and a chunk of data was lost. Not mine, praise De Lawd, but two of the post-grads lost weeks of data accumulation and analysis - >!poof!< gone without even a noticeable puff of smoke. But it's okay, said the Sys$op when he rocked up mid-morning [owls and larks were both tolerated], I'll restore everything from tape. But when he got there, the tape-cupboard was bare. All the tapes were there, but somehow an assumption had been made that they were all being written to each night. Indeed they had been so written when the system was set up and checked, but somewhere with the passage of time the current data had stopped being caught. I never wanted to find out the details of what went wrong: there were enough red faces without having to account to me. The corrupted cluster components were sent, at vast SFI expense, to a commercial data recovery service. But the discs were so badly corrupted that nobody wanted to trust the fragments of recovered code and those affected more or less started again from scratch. Quis custodiet ipsos custodes? - Who backs up the back-ups?
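
For the "checkup" half of the title, here is a minimal sketch of what a periodic check might look like. It is not what the lab ran (I never learned those details); it assumes the backup has been restored, or is kept, as an ordinary directory tree, and the names verify_backup and sha256_of are made up for illustration. It simply compares every file under the working directory against its copy by checksum and lists whatever is missing or different.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large datasets do not have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(source: Path, backup: Path) -> list[tuple[str, str]]:
    """Return (relative path, problem) for each source file the backup fails to match.

    Hypothetical helper: 'backup' is assumed to mirror the layout of 'source'.
    """
    problems = []
    for src_file in sorted(p for p in source.rglob("*") if p.is_file()):
        rel = src_file.relative_to(source)
        copy = backup / rel
        if not copy.is_file():
            problems.append((rel.as_posix(), "missing"))
        elif sha256_of(copy) != sha256_of(src_file):
            problems.append((rel.as_posix(), "differs"))
    return problems
```

An empty list means every current file has a matching copy. With tapes, the honest version of this test is to restore a sample to a scratch directory first and compare against that, since a tape that was never written to will not complain by itself.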
